Dynamic Round Robin CPU Scheduling Algorithm Based on K-Means Clustering Technique

Minimizing time cost in time-shared operating systems is the main aim of researchers interested in CPU scheduling. CPU scheduling is a basic job within any operating system. Scheduling criteria (e.g., waiting time, turnaround time, and number of context switches (NCS)) are used to compare CPU scheduling algorithms. Round robin (RR) is the most common preemptive scheduling policy used in time-shared operating systems. In this paper, a modified version of the RR algorithm is introduced that combines the advantage of favoring short processes with the low scheduling overhead of RR, for the sake of minimizing average waiting time, turnaround time, and NCS. The proposed work starts by grouping the processes into clusters, where each cluster contains processes that are similar in attributes (e.g., CPU service period, weight, and number of allocations to the CPU). Every process in a cluster is assigned the same time slice, depending on the weight of its cluster and its CPU service period. The authors performed a comparative study of the proposed approach and popular scheduling algorithms on nine groups of processes varying in their attributes. The evaluation was measured in terms of waiting time, turnaround time, and NCS. The experiments showed that the proposed approach gives better results.


Introduction
This section is divided into two subsections; the first subsection discusses CPU scheduling, and the second subsection discusses the clustering technique.

CPU Scheduling
The mechanism for allocating and de-allocating the CPU to a process is known as CPU scheduling [1][2][3]; the portion of the operating system that carries out these functions is called the scheduler. In multi-programming systems, the number of processes in memory is restricted by the degree of multi-programming. With several processes in memory waiting to receive service from the CPU, the scheduler chooses the next process to assign the CPU to, waits for its processing period, and then de-allocates the CPU from that process. The scheduling mechanism is the order in which processes are selected for CPU processing. The scheduling scheme may be preemptive (i.e., the CPU is assigned to a process for a certain period) or non-preemptive (i.e., once the CPU has been assigned to a process, the process keeps the CPU until it releases it, either by switching to another state or by terminating). Many different CPU scheduling algorithms have been suggested. Under First Come First Served (FCFS), a non-preemptive scheduling policy, the process that arrives first gets executed first. Shortest Job First (SJF), also non-preemptive, selects the process with the shortest burst time. The preemptive version of SJF is called Shortest Remaining Time First (SRTF): processes are placed into the ready queue as they arrive, the running process is preempted when a process with a shorter remaining time arrives, and the shorter process is executed first. Under preemptive priority scheduling, if a newly arrived process has a higher priority than the currently running process, the current process is paused and the CPU is assigned to the incoming process [4].
RR scheduling is the most common of the preemptive scheduling algorithms [5], referred to hereafter as Standard RR (SRR), used in real-time operating systems and timesharing [6,7]. In RR scheduling, the operating system is driven by a regular interrupt. Processes are selected in a fixed sequence for execution [8]. A process receiving CPU service is interrupted by the system timer after a short fixed interval called the time slice, which is usually much shorter than the CPU service period (or CPU burst) of the process [9][10][11]. After that interruption, the scheduler performs a context switch to the next process selected from the ready queue, which is treated as a circular queue [12,13]. Thus, all processes in the queue are given a chance to receive service for a short fixed period. This scheduling mechanism is basically used in timesharing systems [14][15][16]. The efficiency of the RR algorithm depends on the time slice: if the time slice is small, the overhead of more context switches occurs, and if the time slice is large, RR behaves somewhat like FCFS, with the possibility of starvation among processes. Scheduling algorithm performance is measured by the scheduling criteria of waiting time (i.e., the total period the process spends waiting in the ready queue), turnaround time (i.e., the total time between process submission and its completion), and number of context switches (NCS) [17][18][19].
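To make the time-slice trade-off concrete, the following sketch (a minimal simulation of our own, not code from the paper; all names are illustrative) measures average waiting time, average turnaround time, and NCS for processes that all arrive at t = 0:

```python
from collections import deque

def round_robin(bursts, quantum):
    """Simulate Standard Round Robin for processes that all arrive at t = 0.

    Returns (average waiting time, average turnaround time, context switches).
    A switch is counted each time the CPU moves from one process to the next.
    """
    remaining = list(bursts)
    queue = deque(range(len(bursts)))
    t, switches = 0, 0
    completion = [0] * len(bursts)
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])  # run one quantum or to completion
        t += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)               # back to the tail of the circular queue
        else:
            completion[i] = t
        if queue:                         # no switch needed after the last process
            switches += 1
    n = len(bursts)
    turnaround = completion[:]            # arrival time is 0 for every process
    waiting = [turnaround[i] - bursts[i] for i in range(n)]
    return sum(waiting) / n, sum(turnaround) / n, switches
```

With a very large quantum the schedule degenerates to FCFS order, and with a very small quantum the context-switch count grows sharply, matching the trade-off described above.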

Clustering Technique
Dividing the data into groups that are useful, meaningful, or both is known as clustering [20]; greater difference between clusters and greater homogeneity (or similarity) within a cluster lead to better clustering. Clustering is regarded as a type of classification in that it generates cluster labels for homogeneous objects [21,22]. A major concern in the clustering process is organizing the collection of patterns into reasonable groups, allowing one to find similarities and differences as well as to deduce useful and important inferences about them. Unlike classification, in which the classes are predefined and the classification procedure assigns objects to them, clustering creates the groups themselves, categorizing subjects (data points) into different groups (clusters) during the process. Such a categorizing process depends on the adopted algorithm and the characteristics of the data [23]. The type of the features determines the algorithm used for clustering: for example, conceptual algorithms are used for clustering categorical data; statistical algorithms are used for clustering numeric data; and fuzzy clustering algorithms allow a data point to belong to all clusters with a degree of membership ranging from 0 to 1, where this degree indicates the similarity of the data point to the mean of the cluster. The most commonly used traditional clustering algorithms can be divided into 9 categories, summarized in Table 1; modern clustering algorithms fall into 10 categories containing 45 algorithms [24]. K-means is the simplest and most commonly used clustering algorithm. The simplicity comes from the use of the squared error as the stopping criterion. Besides its simplicity, the time complexity of K-means is low, O(nkt), where n is the number of objects, k the number of clusters, and t the number of iterations. In addition, K-means is suitable for large-scale data [24]. It partitions the dataset into K clusters (C_1, C_2, ..., C_K), represented by their means or centers, to minimize some objective function that depends on the proximities of the subjects to the cluster centroids. Equation (1) describes the function to be minimized in weighted K-means [25].
min Σ_{k=1}^{K} Σ_{x∈C_k} π_x · dist(x, m_k)²   (1)

where K is the number of clusters set by the user, π_x is the weight of x, m_k = (Σ_{x∈C_k} π_x x) / n_k is the centroid of cluster C_k (with n_k = Σ_{x∈C_k} π_x), and the function "dist" computes the distance between object x and centroid m_k, 1 ≤ k ≤ K. Equation (2) describes the function to be minimized in standard K-means clustering [26].
J = Σ_{k=1}^{K} Σ_{x_i∈C_k} ‖x_i − u_k‖²   (2)

where u_k represents the kth center, and x_i represents the ith point in the dataset. While the selection of the distance function is optional, the squared Euclidean distance, i.e., ‖x − m‖², has been most widely used in both practice and research. The K value was set according to the given number of clusters for each dataset [27,28]. The K-means clustering method requires all data to be numerical. The pseudo-code of K-means is given in Algorithm 1:

Algorithm 1 K-Means
Input: dataset, number of clusters K
Output: K clusters
Step 1: Initialize the K cluster centers.
Step 2: Repeat:
- Assign each object to the closest cluster centroid.
- Update each cluster centroid as the mean of the objects assigned to it, µ_k = (1/N_k) Σ_{q=1}^{N_k} x_q, where µ_k is the mean of cluster k and N_k is the number of points belonging to that cluster.
Until the centroids do not change.

Determining the optimal number of clusters in a dataset is an essential issue in clustering. Many cluster evaluation techniques have been proposed, one of which is the Silhouette method. The Silhouette method measures the quality of a clustering; it determines how well each data point lies within its cluster. A high average silhouette width indicates a good clustering [29]. The Silhouette method can be summarized as follows:

1. Compute the clustering algorithm for different values of k, for instance by varying k from 1 to 10 clusters.
2. For each k, calculate the total within-cluster sum of squares (WSS).
3. Plot the curve of WSS according to the value of k.
4. The location of a knee in the curve indicates the appropriate number of clusters.

The Silhouette coefficient (S_i) of the ith data point is defined in Equation (3):

S_i = (b_i − a_i) / max(a_i, b_i)   (3)
where b_i is the average distance between the ith data point and all data points in different clusters, and a_i is the average distance between the ith data point and all other data points in the same cluster [30,31].
Motivation: Timesharing systems depend on the time slice used in the RR scheduling algorithm. The overhead of more context switches (resulting from choosing a short time slice) and starvation (resulting from choosing a long time slice) should both be avoided.
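Equation (3) can be computed directly. The sketch below is our own illustration: it takes b_i as the smallest per-other-cluster average distance (which coincides with the description above when there are only two clusters) and assumes every cluster contains at least two points:

```python
def silhouette(points, labels):
    """Per-point silhouette coefficient S_i = (b_i - a_i) / max(a_i, b_i),
    where a_i is the mean distance to the other points of the same cluster
    and b_i is the smallest mean distance to the points of another cluster.
    Assumes 1-D numeric points and clusters with at least two members."""
    def dist(p, q):
        return abs(p - q)  # 1-D Euclidean distance
    scores = []
    for i, (p, c) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, d) in enumerate(zip(points, labels))
                if d == c and j != i]
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, d in zip(points, labels) if d == other)
            / sum(1 for d in labels if d == other)
            for other in set(labels) - {c}
        )
        scores.append((b - a) / max(a, b))
    return scores
```

For two well-separated groups such as [1, 2] and [10, 11], every coefficient is close to 1, indicating a good clustering.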
Organization: The rest of this paper is divided as follows: Section 2 discusses the related work. Section 3 presents the proposed algorithm. The experimental implementation is discussed in Section 4. Section 5 concludes this research work (see Figure 1).

Related Works
For better CPU performance, the RR scheduling algorithm is widely implemented in most operating systems. Many variants of the RR algorithm have been proposed to minimize average waiting time and turnaround time. This section discusses the most common versions of RR. Table 2 shows a comparison between the known versions of SRR.
Aaron and Hong [32] proposed a dynamic version of SRR named Variable Time Round-Robin scheduling (VTRR). The time slice allocated to a process depends on the time needed by all tasks, the process's burst time, and the number of processes in the ready queue.
Tarek and Abdelkader [33] proposed a weighting technique for SRR. The authors classified the processes into five weight categories based on their burst times. The weight of a process is inversely proportional to its burst time; a process with a high weight receives a longer time slice and vice versa. Processes with burst times less than or equal to 10 tu receive 100% of the time slice defined by SRR, processes with burst times less than or equal to 25 tu receive 80% of the time slice defined by SRR, and so on.
Samih et al. [18] proposed a dynamic version of SRR named Changeable Time Quantum (CTQ). Their algorithm finds the time slice that gives the smallest average waiting time at every round: CTQ calculates the average waiting time for a specific range of time slices and picks the time slice corresponding to the smallest average waiting time. The processes in that round then execute for this time slice.
Lipika [34] proposed a dynamic version of SRR that adjusts the time slice at the beginning of each round. The time slice is calculated depending on the remaining burst times in the subsequent rounds. In addition, the author also implemented SJF [35][36][37]. In SJF, the processes located in the ready queue are sorted in increasing order of their burst times (i.e., the process with the lowest burst time is at the front of the ready queue and the process with the highest burst time is at the end).
Christoph and Jeonghw [7] presented a dynamic version of SRR named Adaptive80 RR, so named because the time slice is set equal to the burst time of the process at the 80th percentile. Like Lipika's algorithm [34], Adaptive80 RR sorts the processes in increasing order. The time slice in each round depends on the processes located in the ready queue; if a new process arrives, it is added to the ready queue and considered in the subsequent calculations.
Samir et al. [38] proposed a hybrid scheduling algorithm based on SRR and SJF named SJF and RR with dynamic quantum (SRDQ). Their algorithm divides the ready queue into two subqueues, Q1 and Q2: Q1 for short tasks (shorter than the median) and Q2 for long tasks (longer than the median). Like the Adaptive80 RR [7] and Lipika's [34] algorithms, this algorithm sorts the processes in ascending order in each subqueue. In every round, each process is assigned a time slice that depends on the median and on the burst time of the process.
Samih [19] proposed Proportional Weighted Round Robin (PWRR) as a modified version of SRR. PWRR assigns each process a time slice proportional to its burst time. Each process has a weight calculated by dividing its burst time by the sum of all burst times in the ready queue; the time slice is then calculated from this weight. Samih and Hirofumi [17] proposed a version of SRR named Adjustable Round Robin (ARR) that combines the low scheduling overhead of SRR with favoring short processes. ARR gives a short process a chance, under a predefined condition, to execute until termination, in order to minimize the average waiting time.
Uferah et al. [13] proposed a dynamic version of SRR named Amended Dynamic Round Robin (ADRR). The time slice is cyclically adjusted based on the process burst time. Like Adaptive80 RR [7], SRDQ [38] and Lipika's [34] algorithms, ADRR sorts the processes in ascending order.

The Proposed Algorithm
The processes' weights (PW) and numbers of allocations to the CPU (i.e., NCS) depend on the processes' burst times (BT), which are known, and are calculated as shown in the following subsections. We assumed that all processes arrive at the same time. The main advantage of the clustering technique in the proposed work is the ability to group similar processes into clusters; processes are similar when their values of BT, PW, and NCS are near each other, and the K-means algorithm is used for this purpose. The proposed technique consists of three stages: data preparation, data clustering, and finally dynamic time slice implementation.

Data Preparation
The data preparation stage consists of calculating PW and NCS. The weight of the ith process, PW_i, is calculated from Equation (4):

PW_i = BT_i / Σ_{j=1}^{N} BT_j   (4)

where BT_i is the burst time of the ith process, and N is the number of processes in the ready queue. The number of context switches of the ith process, NCS_i, is calculated from Equation (5):

NCS_i = ⌊BT_i / STS⌋   (5)

where STS (standard time slice) is the time slice given by SRR, and ⌊X⌋ denotes the largest integer smaller than or equal to X. If the last round contains one process, this process continues execution without switching its context [4].
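The data preparation step can be sketched as follows. Since the printed forms of Equations (4) and (5) are not fully legible in our copy, the sketch assumes PW_i = BT_i / ΣBT_j (the same weight definition as PWRR in the Related Works) and NCS_i = ⌊BT_i / STS⌋ (the floor described in the text):

```python
import math

def process_weights(bursts):
    """Assumed Eq. (4): PW_i = BT_i / sum of all burst times in the ready queue."""
    total = sum(bursts)
    return [bt / total for bt in bursts]

def context_switches(bursts, sts):
    """Assumed Eq. (5): NCS_i = floor(BT_i / STS), with STS the standard time slice."""
    return [math.floor(bt / sts) for bt in bursts]
```

For example, for bursts [15, 10, 31, 17] and STS = 10 tu, the weights sum to 1 and the NCS values are [1, 1, 3, 1].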

Data Clustering
The second stage comprises two phases. The first phase finds the optimum number of clusters using the Silhouette method; a high average Silhouette width indicates a good clustering. The second phase clusters the data into the k clusters resulting from the Silhouette method using the K-means algorithm. In the proposed work, the clustering metrics are BT, PW, and NCS.
Each point is assigned to the closest centroid, and each collection of points assigned to the same centroid forms a cluster. The assignment and updating steps are repeated until all centroids remain the same. To quantify the notion of "closest", the Euclidean (L2) distance is used as the proximity measure for data points in Euclidean space.
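The assign/update loop can be sketched as a generic Lloyd iteration (our own sketch, not the authors' implementation; it assumes no cluster becomes empty during iteration):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: assign each point to the nearest centroid
    (squared Euclidean distance), then recompute centroids, until stable."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # (n, k) matrix of squared L2 distances from every point to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

With rows of (BT, PW, NCS) features, two well-separated groups of processes end up in two different clusters.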

Dynamic Time Slice Implementation
In the third stage, dynamic time slice implementation, the weight of the lth cluster, CW_l, is calculated from Equation (6):

CW_l = Cavg_l / Σ_{j=1}^{k} Cavg_j   (6)

where Cavg_l is the average of the burst times in the lth cluster. The time slice assigned to the lth cluster, CTS_l, is calculated from Equation (7):

CTS_l = (1 − CW_l) × STS   (7)

Each process in this cluster will execute for CTS_l. In addition, a process that is close to its completion will get a chance to complete and leave the ready queue. A threshold is determined to allow a process that has a burst time greater than STS and is close to its completion to continue execution until termination. Therefore, the number of processes in the ready queue is reduced by knocking out short processes relatively quickly, in the hope of minimizing average waiting and turnaround times. The residual burst time of the ith process, RBT_i, is calculated from Equation (8):

RBT_i = BT_i − CTS_l   (8)
If a process satisfies the threshold condition, its RBT is set to zero and it leaves the queue. When a new process arrives, it is put at the tail of the queue to be scheduled in the next round. The clustering technique is applied again in the next round for the surviving processes (i.e., processes with BT or RBT greater than the time slice assigned to them in the current round) and for newly arrived processes, if any. Figure 2 shows the flowchart of the proposed algorithm.

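The cluster-level time-slice computation can be sketched as follows. Because Equations (6) and (7) are not fully legible in our copy, the sketch assumes CW_l = Cavg_l / Σ_j Cavg_j and CTS_l = (1 − CW_l) × STS; these forms are consistent with the weights (0.12271, 0.87729) and time slices (8.77287 tu, 1.227123 tu) reported for STS = 10 tu in Example 1:

```python
def cluster_time_slices(cluster_bursts, sts):
    """Assumed Eq. (6)/(7): CW_l = Cavg_l / sum of cluster averages,
    CTS_l = (1 - CW_l) * STS.  A heavy (long-burst) cluster thus gets a
    short time slice, favoring short processes.
    cluster_bursts: list of burst-time lists, one list per cluster."""
    avgs = [sum(c) / len(c) for c in cluster_bursts]
    total = sum(avgs)
    weights = [a / total for a in avgs]          # cluster weights sum to 1
    return [(1 - w) * sts for w in weights]      # time slice per cluster
```

Note the inverse relation: with two clusters, the two time slices always sum to STS, and the cluster with the longer average burst receives the shorter slice.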


Illustrative Examples
The following examples provide a more in depth understanding of the proposed technique.

Example 1
From the benchmark datasets used in the experiments, the first dataset, which contains 10 processes (see Table 3), will be used in this example. PW and NCS are calculated from Equations (4) and (5), respectively. The Silhouette method is used to find the optimum value of k; the location of a knee in the curve (see Figure 3) indicates the optimal number of clusters, and from the curve the optimal value of k is 2.
Figure 3. Finding optimal number of clusters.
Now, k = 2 is used by the K-means algorithm to cluster the data points. The 7th and 8th processes are grouped into cluster 1 and the others are grouped into cluster 0 (Table 4). The weight of cluster 0 equals 0.12271, and the weight of cluster 1 equals 0.87729. The time slice assigned to cluster 0 is 8.77287 tu, and the time slice assigned to cluster 1 is 1.227123 tu. The 7th and 8th processes will therefore be assigned 1.227123 tu, and the other processes will be assigned 8.77287 tu. The burst time of the third process is smaller than its cluster's time slice, therefore it terminates and leaves. The number of processes, burst times, and weights are updated for the next iteration, and so forth until the ready queue becomes empty.

Example 2
Giving a short process more CPU time decreases the waiting time of this process more than it increases the waiting time of the long process; consequently, the average waiting time decreases. To illustrate this concept, assume the following set of processes (Table 5), which arrive at the same time, each with its burst time, and let the STS be 10 tu.

Table 5. Four processes with the length of the burst time.

Process ID  BT
P1          15
P2          10
P3          31
P4          17

Using SRR scheduling, we would schedule these processes according to the following Gantt chart:

| P1 | P2 | P3 | P4 | P1 | P3 | P4 | P3 |
0    10   20   30   40   45   55   62   73

The waiting time is 30 tu for process P1, 10 tu for process P2, 42 tu for process P3, and 45 tu for process P4. Thus the average waiting time is 31.75 tu, and the average turnaround time is 50 tu. On the other hand, suppose that each process is assigned a percentile of STS equal to ((STS / BT) × STS), as in Table 6.

Table 6. Assigning time slice for the running processes in each round.

                   Round 1     Round 2                 Round 3                 Round 4
Process ID   BT    TS   RBT    TS     RBT              TS   RBT                TS
P1           15    7    8      12.5   terminates       -    -                  -
                                      after 8 tu
P2           10    10   terminates    -                -    -                  -
P3           31    4    27     4      23               4    19                 19
P4           17    6    11     9      2                50   terminates         -
                                                            after 2 tu

The results will be as shown in the following Gantt chart:

| P1 | P2 | P3 | P4 | P1 | P3 | P4 | P3 | P4 | P3 |
0    7    17   21   27   35   39   48   52   54   73

The waiting time is 20 tu for process P1, 7 tu for process P2, 42 tu for process P3, and 37 tu for process P4. Thus the average waiting time is 26.5 tu, and the average turnaround time is 44.75 tu.
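The averages in both schedules can be checked directly from the completion times read off the two Gantt charts (all processes arrive at t = 0, so turnaround time equals completion time and waiting time is completion time minus burst time):

```python
def averages(bursts, completions):
    """Average waiting and turnaround times when all processes arrive at t = 0."""
    n = len(bursts)
    waiting = [c - b for b, c in zip(bursts, completions)]
    return sum(waiting) / n, sum(completions) / n

bursts = [15, 10, 31, 17]                           # P1..P4 from Table 5
srr = averages(bursts, [45, 20, 73, 62])            # completions under SRR
proportional = averages(bursts, [35, 17, 73, 54])   # completions under proportional slices
```

The first call yields (31.75, 50.0) and the second (26.5, 44.75), matching the figures above: the proportional assignment lowers both averages.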

Experimental Implementation
The experiments were carried out using a computer with the following specification: Intel core i5-2400 (3.10 GHz) processor, 16


Benchmark Datasets
Nine synthetic datasets are used to test the performance of the algorithms used in the comparison. Each dataset contains a number of processes used for numerical simulation. For each process of each dataset, the burst time is randomly generated. The datasets on hand vary in the number of processes and in the processes' burst times, weights, and numbers of context switches. Detailed information on the datasets is presented in Table 7. Weight and NCS depend on the process's burst time, which means that the most important variant among the benchmark datasets is the burst times of their processes.
Table 7. Datasets specifications. The first column presents the dataset ID, the second column the number of processes in each dataset, the third column the number of attributes (i.e., BT, PW, and NCS), and the fourth column the standard deviations.

Performance Evaluation
The proposed algorithm was compared with five common algorithms (PWRR, VTRR, BRR, SRR, and ADRR) on nine different combinations of numbers of processes and burst times. The authors implemented all these algorithms using the benchmark datasets. The average waiting and turnaround times depend on the number of processes in the ready queue: as the number of processes increases, the time cost increases. In addition, long process burst times increase the time cost. To emphasize the efficiency of the proposed algorithm, benchmark datasets varying in the number and burst times of processes are used. The time taken in clustering the dataset is trivial; compared with the waiting and turnaround times, the clustering time has no effect and can be ignored. Table A1 compares the running times of the proposed algorithm (including the time consumed in clustering) against the other methods. Table A2 shows a comparison of the time cost between the proposed algorithm and the other algorithms in terms of average waiting time, turnaround time, and NCS. Table A3, Figure 4, and Figure 5 show the superiority of the proposed algorithm over the compared algorithms on all the datasets, where the time cost of the proposed algorithm is the smallest (average waiting time and average turnaround time are 979.14 tu and 1061.36 tu, respectively). Figure 6 shows how much improvement is achieved by the proposed algorithm. Unlike PWRR, BRR, and SRR, whose time slices are less than or equal to the time slice defined by SRR, VTRR may behave somewhat like FCFS, as it may give a process a very long time slice. Therefore, the time slices calculated in the VTRR algorithm are restricted to be less than or equal to the time slice defined by SRR (i.e., 10 tu).


Conclusions and Future Work
In this paper, a dynamic SRR-based CPU scheduling algorithm has been proposed that could be used in timesharing systems, in which it is important to reduce the time cost of scheduling. Unlike the SRR algorithm, which uses a fixed time slice for the processes in the ready queue in all rounds, the proposed algorithm uses a dynamic time slice. The proposed algorithm benefits from the clustering technique by grouping processes that resemble each other in their features (i.e., burst times, weights, and NCS). Each cluster is assigned a time slice proportional to its weight, and every process in the cluster receives this amount of time for its execution in the current round. The features are updated in the subsequent rounds. In addition, the proposed algorithm gives processes that are close to completion a chance to complete execution and leave; this in turn helps in reducing the number of processes in the ready queue and reducing NCS. The proposed algorithm was compared with five common algorithms in terms of average waiting time, turnaround time, and NCS. The proposed algorithm outperformed all the others; however, it behaves similarly to ADRR in terms of NCS.
In cloud-computing systems, achieving optimality in scheduling processes over computing nodes is an important aim for all researchers interested in both cloud computing and scheduling, since the available resources must be scheduled by an efficient CPU scheduler before being assigned to clients. The relation between CSPs (cloud service providers) and CSCs (cloud service consumers) is formalized through a service level agreement (SLA); CSPs must achieve the best performance by minimizing time cost. Because of the superiority of the proposed algorithm over the compared algorithms on all the datasets, the proposed algorithm is promising for cloud computing systems.