Clustering and Dispatching Rule Selection Framework for Batch Scheduling

Abstract: In this study, a batch scheduling problem with job grouping and batch sequencing is considered. A clustering algorithm and a dispatching rule selection model are developed to minimize total tardiness. The algorithm and model are based on the constrained k-means algorithm and a neural network, respectively. We also develop a method to generate a training dataset from historical data to train the neural network. We use numerical examples to demonstrate that the proposed algorithm and model efficiently and effectively solve batch scheduling problems.


Introduction
The aim of this study is to develop a framework that schedules batch jobs on identical machines to minimize total tardiness. The objectives of this framework include solving the problem in a short time, solving various batch scheduling problems, and improving performance as more problems are solved. We design a framework based on clustering and classification models because the machine learning approach fits well with the aforementioned goals.
Typical approaches for batch scheduling problems are categorized into heuristic, meta-heuristic, and dispatching rule approaches. The heuristic approach develops an algorithm that is problem-specific. The meta-heuristic approach modifies a meta-heuristic algorithm to solve problems. The dispatching rule approach calculates the priority of each job or batch of jobs via a set of predetermined dispatching rules. The jobs are then processed in order of descending priority.
Heuristic and meta-heuristic approaches are usually dependent on the types of problems that are considered [1,2]. Since they are designed for a specific problem, they may yield relatively poor results when applied to different problems, or a substantial modification to the model is necessary before application. Additionally, a relatively long run time is sometimes necessary to yield a schedule using these approaches.
The dispatching rule approach is suitable for the aforementioned goals because it is relatively robust in terms of problem types, and thus can be applied to various problems without substantial modification to yield a schedule quickly. However, this approach encounters difficulty in selecting the dispatching rule because no single rule outperforms the others in every scheduling situation [3]. Thus, an optimal universal dispatching rule is absent. Additionally, the optimal rule can change as successive jobs are completed, which implies that dynamically selecting the dispatching rule can work better. Several studies developed classification models to select dispatching rules dynamically. For example, Mouelhi-Chibani and Pierreval [3] trained a neural network with dozens of queue state variables to select dispatching rules for flow shop scheduling. Similarly, Shiue [4] developed a support vector machine-based dispatching rule selection model for shop floor control system scheduling.
Mathematics 2020, 8
Meanwhile, a few studies developed machine learning models to determine scheduling parameters. Mönch et al. [5] trained a neural network and an inductive decision tree to estimate the weight of the ATC (apparent tardiness cost) rule, which consists of two dispatching rules: the weighted shortest processing time rule and the least slack rule. However, there is a paucity of extant studies on developing a dispatching rule selection model for the batch scheduling problem, which requires grouping jobs into batches before applying a dispatching rule. We design a clustering algorithm to group similar jobs under machine capacity constraints and add it to the framework.
The major contents and contributions of this study are as follows. First, we develop a clustering algorithm based on the constrained k-means algorithm to group jobs into batches. The algorithm uses a set of features impacting total tardiness and yields clusters in which the sum of job sizes does not exceed the machine load capacity. Second, we develop a step-by-step method based on the Markov chain Monte Carlo (MCMC) approach to create a training dataset for the dispatching rule selection model. Each sample in the dataset represents a virtual queue state via queue state variables (e.g., the number of batches in the queue and the maximum processing time of the batches). Subsequently, each sample is labeled with the optimal dispatching rule for that particular state. Third, we design a dispatching rule selection model based on neural networks for batch sequencing. Whenever a machine is available, the queue state variable values for the current queue state enter the model to yield the optimal dispatching rule.
Section 2 introduces the batch scheduling problem, including major assumptions, notations, and an overview of the framework. Section 3 describes methods to generate the training dataset, to transform the variables into queue state variables, and to design the dispatching rule selection model. Section 4 discusses the clustering algorithm and application procedure and explains the model deployment process. Section 5 provides illustrative examples to show the application of the proposed model. Section 6 summarizes our conclusions and proposes future research directions.

Problem Statement, Notations, and Framework Overview
There are $n$ jobs to be processed by one of $m$ identical machines. Each job is characterized by its processing time ($p$), size ($s$), due date ($d$), and release time ($r$). Jobs are grouped into batches, and a process sequence is determined via a dispatching rule (Figure 1). The former and latter stages are termed grouping jobs and sequencing batches, respectively [6]. The sum of job sizes in each batch must be equal to or less than the load capacity of a machine. Each batch is processed by a machine as soon as it is available.
The batch processing time denotes the longest processing time among all jobs in the batch [7]. The objective of scheduling is to minimize total tardiness as follows:
$$\min \sum_{j=1}^{n} T_j, \qquad T_j = \max(0,\; c_j - d_j),$$
where $T_j$, $c_j$, and $d_j$ denote the tardiness, completion time, and due date of job $j$, respectively. The major assumptions are as follows:
• All job characteristics, $p_j$, $s_j$, $d_j$, $r_j$, of job $j$ are known in advance.
• Every job is ready for processing at its release time.
• Neither setup nor rework is assumed.
• Each machine only processes one batch at a time.
• The characteristics data of several historical jobs are available and are sufficient to create a training dataset for the dispatching rule selection model.
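As a small illustration of the objective, the following Python sketch computes the batch processing time and the total tardiness of a fixed batch sequence. It is only a sketch under simplifying assumptions: a single machine is used, a batch starts once all of its jobs are released, and the function names and sample data are hypothetical rather than taken from this study.

```python
def batch_processing_time(batch):
    """Longest processing time among the jobs of a batch."""
    return max(job["p"] for job in batch)


def total_tardiness(batches):
    """Sum of T_j = max(0, c_j - d_j) for batches run back to back on one machine."""
    clock = 0
    total = 0
    for batch in batches:
        # Assumption: a batch cannot start before all of its jobs are released.
        start = max(clock, max(job["r"] for job in batch))
        clock = start + batch_processing_time(batch)
        # Every job in the batch completes when the batch completes.
        total += sum(max(0, clock - job["d"]) for job in batch)
    return total


# Hypothetical jobs: p = processing time, d = due date, r = release time.
jobs_a = [{"p": 2, "d": 3, "r": 0}, {"p": 3, "d": 2, "r": 0}]
jobs_b = [{"p": 1, "d": 3, "r": 0}]
print(total_tardiness([jobs_a, jobs_b]), total_tardiness([jobs_b, jobs_a]))
```

Swapping the two batches changes the total tardiness, which is why the batch sequence matters for the objective.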
Notations used in this study are as follows.
$k$: Training sample index ($k = 1, 2, \cdots, K$)
Job characteristics
$p_j$: Processing time of job $j$
$s_j$: Size of job $j$
$d_j$: Due date of job $j$
$r_j$: Release time of job $j$
$\xi_j$: Slack time of job $j$, i.e., the remaining time until the due date after finishing the process, at its release time, $\xi_j = d_j - r_j - p_j$
ξ t
Notations for clustering
$\delta$: Load capacity of a machine
$\pi_j$: Vector of job $j$'s characteristics for grouping purposes, $\pi_j = (p_j, d_j, r_j)$
$C_i$: Centroid of batch $i$
$E_{i,j}$: Distance between $C_i$ and $\pi_j$
Notations for dispatching rule selection
$y_k$: Dispatching rule for sample $k$ ($k = 1, 2, \cdots, K$)
$W_k$: Number of batches waiting in the queue in sample $k$, $W_k \leq W_{\max}$ for all $k$
$t_k$: System's clock time with respect to sample $k$
$f_c$: Probability mass function of $c$, where $c$ corresponds to processing time ($p$), size ($s$), slack time ($\xi$), or release time ($r$)
$\Psi_q$: Candidate queue state variable ($q = 1, 2, \cdots, Q'$)
$S_q$: Selected queue state variable ($q = 1, 2, \cdots, Q$), $Q \leq Q'$
Figure 2 shows the step-by-step process that is used to obtain the set of job batches and the batch sequence for the machines to process. The framework consists of two tracks. Track I, the model training track, builds a dispatching rule selection model from a training dataset of batches that is artificially created from historical data. Track II, the scheduling track, groups the actual jobs into batches and determines the processing sequence using the model provided by Track I.
Figure 2. Step-by-step process.
A more detailed explanation of Track I is presented in Section 3, and Section 4 introduces Track II. The numbers inside the circles at the upper-left corner of the blocks denote the corresponding steps and the subsections where the corresponding descriptions are provided.

Generating the Training Dataset
In this subsection, we introduce the step-by-step method used to create a dataset from historical data of job characteristics. The probability distribution of each job characteristic is estimated, and samples are then created to form a virtual batch dataset. This dataset is then transformed into a format more suitable for training.
Step 3.0: Prepare the probability distributions of job characteristics. First, the pmfs (probability mass functions) $f_p$, $f_s$, $f_\xi$, and $f_r$ of processing time ($p$), size ($s$), slack time ($\xi$), and release time ($r$), respectively, are estimated from the historical data based on the relative frequencies as follows:
$$f_c(\hat{c}) = \frac{N(c = \hat{c})}{\sum_{c'} N(c = c')},$$
where $c \in \{p, s, \xi, r\}$ and $N(c = \hat{c})$ denotes the number of jobs that satisfy $c = \hat{c}$.
Step 3.1: Create batches in the queue dataset.
With the pmfs, a total of $K$ samples are generated, where sample $k$ ($k = 1, \cdots, K$) contains $W_k$ batches and each batch consists of one or more jobs characterized by job characteristics ($p$, $s$, $d$, $r$). Specifically, $W_k$ is arbitrarily picked in the interval $[1, W_{\max}]$. Next, the system's clock time $t_k$ of sample $k$ should exceed every release time of the jobs in the queue ($\max_{j \in \text{Sample } k} r_j$). It must also be smaller than the maximum remaining processing time, which is obtained as follows. There are currently $W_k$ batches that remain in the queue, and thus it is inferred that $W_{\max} - W_k$ batches have been processed, and the mean processing time of the existing batches in the queue corresponds to $\sum_{i \in \text{Sample } k} p_{(i)} / W_k$. Therefore, $t_k$ is arbitrarily picked between $\max_{j \in \text{Sample } k} r_j$ and this upper bound. Figure 3 shows a typical example of a set of $K$ samples generated from the estimated pmfs. The set of artificial samples is termed the queue dataset because the batches in each sample wait in the queue to be processed.
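The relative-frequency estimate of Step 3.0 and the sampling used in Step 3.1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the helper names `estimate_pmf` and `sample_from_pmf` and the historical values are hypothetical.

```python
import random
from collections import Counter


def estimate_pmf(values):
    """Relative-frequency pmf: f(v) = N(value == v) / number of observations."""
    counts = Counter(values)
    n = len(values)
    return {v: counts[v] / n for v in sorted(counts)}


def sample_from_pmf(pmf, rng):
    """Draw one value according to the estimated pmf."""
    support = list(pmf)
    return rng.choices(support, weights=[pmf[v] for v in support], k=1)[0]


# Hypothetical historical processing times.
history_p = [1, 1, 2, 3, 3, 3, 2, 1]
f_p = estimate_pmf(history_p)

rng = random.Random(0)  # fixed seed so that the draws are reproducible
draws = [sample_from_pmf(f_p, rng) for _ in range(5)]
print(f_p, draws)
```

The same two helpers can be reused for size, slack time, and release time, one pmf per characteristic.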

Note: It is necessary to exercise caution when more than one job is created in a batch. When a job is sampled, the next job should be "similar" in terms of its job characteristics ($p$, $s$, $d$, $r$) to the one previously sampled because batches contain "similar" jobs. This is achieved by using the following conditional pmfs:
$$\Pr(c = \hat{c} \mid c_{pre} = \hat{c}_{pre}) = \frac{N(\hat{c} \text{ and } \hat{c}_{pre} \text{ in clusters})}{N(\hat{c}_{pre} \text{ in clusters})},$$
where $c$ denotes one of the variables $p$, $\xi$, $r$; $c_{pre}$ denotes the previously generated job's characteristic; $N(\hat{c} \text{ and } \hat{c}_{pre} \text{ in clusters})$ denotes the number of clusters containing jobs satisfying $c = \hat{c}$ and $c = \hat{c}_{pre}$; and $N(\hat{c}_{pre} \text{ in clusters})$ denotes the number of clusters that contain at least one job wherein $c$ corresponds to $\hat{c}_{pre}$. Algorithm 1 shows the generation process of the queue dataset explained above.
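The cluster-count ratio described in the note can be sketched as below. The function names and the toy clusters are hypothetical. Because the ratios, read literally, count clusters that contain both values, they need not sum to one, so this sketch normalises them before they are used for sampling; that normalisation is an assumption of the sketch and is not stated in the text.

```python
def conditional_ratios(clusters, c_pre):
    """N(v and c_pre in clusters) / N(c_pre in clusters) for every value v."""
    # Keep only the clusters that contain the previously sampled value.
    with_pre = [set(cluster) for cluster in clusters if c_pre in cluster]
    if not with_pre:
        return {}
    values = sorted(set().union(*with_pre))
    return {v: sum(v in cluster for cluster in with_pre) / len(with_pre) for v in values}


def normalise(ratios):
    """Rescale the ratios so that they sum to one (assumption of this sketch)."""
    total = sum(ratios.values())
    return {v: r / total for v, r in ratios.items()}


# Hypothetical historical clusters, each listing the processing times of its jobs.
clusters = [[1, 2], [1, 1], [2, 3]]
ratios = conditional_ratios(clusters, c_pre=1)
cpmf = normalise(ratios)
print(ratios, cpmf)
```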
Step 3.2: Transform the variables. Each sample of the queue dataset contains the values of system time ($t$) and job characteristics ($p$, $d$, $s$, $r$). It is necessary to transform them into values representing the status of the batches in the queue, as this facilitates training of the machine learning model. We introduce queue state variables, $\Psi_q$ ($q = 1, \cdots, Q'$), with which the current status of the queue is well defined and which contribute to determining the optimal dispatching rule [3]. The queue state variables are defined in extant studies [3,4,8] and are listed in Table 1. Variables $\Psi_1$ to $\Psi_9$ represent the batch state (e.g., number of batches and processing times of the batches), while $\Psi_{10}$ to $\Psi_{21}$ describe the job state (e.g., slack time of jobs). Thus, the transformed queue dataset consists of 21 queue state variables along with system time.
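The transformation in Step 3.2 can be illustrated with a short Python sketch that maps one queue sample to a handful of queue state values. The five variables below are illustrative stand-ins in the spirit of the examples named in the text (number of batches, batch processing times, slack time); they are not the 21 variables of Table 1, and the remaining slack at time $t$ is assumed here to be $d_j - t - p_j$.

```python
def queue_state(batches, t):
    """Summarise a queue sample (list of batches, clock time t) as state variables."""
    # Batch state: the processing time of a batch is its longest job.
    batch_p = [max(job["p"] for job in batch) for batch in batches]
    # Job state: remaining slack of every job at time t (assumed definition).
    slacks = [job["d"] - t - job["p"] for batch in batches for job in batch]
    return {
        "num_batches": len(batches),
        "max_batch_p": max(batch_p),
        "mean_batch_p": sum(batch_p) / len(batch_p),
        "min_slack": min(slacks),
        "mean_slack": sum(slacks) / len(slacks),
    }


# Hypothetical sample with two batches at clock time t = 1.
sample = [[{"p": 2, "d": 6}, {"p": 3, "d": 5}], [{"p": 1, "d": 4}]]
state = queue_state(sample, t=1)
print(state)
```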
Step 3.3: Assign the optimal dispatching rule to each training sample. This step determines the optimal dispatching rule $y_k$ for sample $k$ ($k = 1, 2, \cdots, K$) of the batches in the queue dataset. The dispatching rules in Table 2 are considered.
Table 2. Dispatching rules and the batch with the highest priority under each rule (first entry: shortest processing time first (SPT)).
We apply each rule to the batches in each queue dataset sample and then determine the rule that minimizes total tardiness. This rule is appended to each queue dataset sample and is used as the class label for the dispatching rule selection model.
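The labelling in Step 3.3 can be sketched as follows: each candidate rule orders the batches of a sample, the resulting total tardiness is computed, and the rule with the smallest value becomes the class label. The sketch makes several assumptions that are not stated in the text: a single machine, all batches already released at clock time `t`, ties broken by rule order, and priority keys that reflect one plausible reading of the rule names.

```python
# Priority keys: the batch with the smallest key is processed first.
RULES = {
    "SPT": lambda b: max(j["p"] for j in b),     # shortest processing time first
    "LPT": lambda b: -max(j["p"] for j in b),    # longest processing time first
    "EMIDD": lambda b: min(j["d"] for j in b),   # earliest minimum due date first
    "EMADD": lambda b: max(j["d"] for j in b),   # earliest maximum due date first
    "EMIST": lambda b: min(j["r"] for j in b),   # earliest minimum release time first
    "EMAST": lambda b: max(j["r"] for j in b),   # earliest maximum release time first
}


def tardiness_under_rule(batches, t, key):
    """Total tardiness when the batches are sequenced by `key`, starting at time t."""
    clock, total = t, 0
    for batch in sorted(batches, key=key):
        clock += max(j["p"] for j in batch)
        total += sum(max(0, clock - j["d"]) for j in batch)
    return total


def best_rule(batches, t):
    """Return the label (rule with minimum total tardiness) and all scores."""
    scores = {name: tardiness_under_rule(batches, t, key) for name, key in RULES.items()}
    return min(scores, key=scores.get), scores


# Hypothetical sample with three single-job batches.
sample = [[{"p": 3, "d": 3, "r": 0}], [{"p": 1, "d": 2, "r": 0}], [{"p": 2, "d": 10, "r": 0}]]
label, scores = best_rule(sample, t=0)
print(label, scores)
```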
Step 3.4: Select relevant queue state variables. This step involves examining the queue state variables and removing those that are not relevant to the dispatching rule, $y_k$. The relevance of each queue state variable is calculated, and variables with high relevance are selected as inputs for training. We adopt ANOVA's (analysis of variance) F-statistic for class relevance, as suggested by [9]:
$$F = \frac{\text{between-group variance}}{\text{within-group variance}},$$
where a group denotes a set of samples with the same optimal dispatching rule. Table 3 shows the structure of the queue dataset that is used as the training dataset for the dispatching rule selection model, where $\{S_1, S_2, \cdots, S_Q\} \subset \{\Psi_1, \Psi_2, \cdots, \Psi_{21}\}$ denotes the set of (selected) relevant queue state variables.
Table 3. Structure of the training dataset for the dispatching rule selection model.
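A one-way ANOVA F-statistic of the kind used in Step 3.4 can be computed as in the sketch below, where the groups are the samples sharing the same rule label. The function names, the toy data, and the choice of keeping the top `q` variables are illustrative assumptions; the sketch also assumes that the within-group variance is non-zero.

```python
def f_statistic(values, labels):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
                     for vs in groups.values())
    ss_within = sum((v - sum(vs) / len(vs)) ** 2
                    for vs in groups.values() for v in vs)
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def select_relevant(features, labels, q):
    """Keep the q variables with the largest F-statistic."""
    scores = {name: f_statistic(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:q]


# Hypothetical queue state values for six samples and their rule labels.
features = {
    "num_batches": [1, 2, 3, 7, 8, 9],   # separates the two labels well
    "max_batch_p": [1, 2, 3, 1, 2, 3],   # carries no information about the label
}
labels = ["SPT"] * 3 + ["EMIDD"] * 3
print(select_relevant(features, labels, q=1))
```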

Model Training
We develop a machine learning model that uses the values of the queue state variables to determine the dispatching rule, from which the processing order of actual job batches is derived. We select the artificial neural network because it is widely used and has yielded reliable results in previous research [10]. It should be noted that any type of classification model can be used in place of the neural network model.
Recall that, according to the steps of Section 3.1, the values of the queue state variables $\Psi_i$ ($i = 1, 2, \cdots, 21$) are calculated from the generated dataset, and the optimal dispatching rule $y$, which gives the minimum tardiness, is derived. We use ANOVA to select the relevant queue state variables $S_1, S_2, \cdots, S_Q$, and then $(S_1, S_2, \cdots, S_Q, y)$ composes one sample of the training dataset that is input to the neural network. Therefore, the neural network is trained to predict the dispatching rule for given values of the queue state variables.
The neural network in our model has an input layer, an output layer, and two hidden layers. Each node in the input layer corresponds to one queue state variable, and each node in the output layer corresponds to one candidate dispatching rule. The numbers of nodes in the hidden layers are parameters that should be tuned. We test 45 models with nodes in each hidden layer $(h_1, h_2) = (5, 1), (5, 2), \ldots, (5, 5), (6, 1), \ldots, (10, 10)$, where $h_1$ and $h_2$ denote the number of nodes in the first and second hidden layers, respectively, and we select the parameters exhibiting the best performance under five-fold cross validation.
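The selection of hidden-layer sizes by five-fold cross validation can be sketched generically as below. The sketch does not train a network: `evaluate` is a hypothetical callable standing in for "train a network with sizes $(h_1, h_2)$ on the training folds and return its validation accuracy", and the short candidate list is only an example, not the full grid used in the study.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, validation) index lists for k-fold cross validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def select_architecture(candidates, evaluate, n_samples, k=5):
    """Return the (h1, h2) pair with the best mean validation score."""
    def mean_score(cfg):
        scores = [evaluate(cfg[0], cfg[1], train, validation)
                  for train, validation in k_fold_indices(n_samples, k)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_score)


# Hypothetical scoring function used only to exercise the selection loop.
def toy_evaluate(h1, h2, train, validation):
    return -abs(h1 - 8) - abs(h2 - 9)


candidates = [(5, 1), (5, 2), (8, 9), (10, 10)]
best = select_architecture(candidates, toy_evaluate, n_samples=10)
print(best)
```

In practice `evaluate` would wrap whichever neural network library is used; any classifier with a fit and score step fits this loop.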

Clustering Model to Group Jobs
Similar jobs are grouped together to form batches when a new set of jobs to be processed arrives. Jobs in each batch are processed simultaneously.
When jobs are processed in a batch and not individually, the completion times of the jobs are negatively impacted by the wasted processing time, $\sum_{j \in J_{(i)}} (p_{(i)} - p_j)$, and the residual machine capacity of batch $i$, $\delta - \sum_{j \in J_{(i)}} s_j$, where $J_{(i)}$ denotes the set of jobs in batch $i$ and $p_{(i)}$ denotes its processing time. Therefore, minimizing both wasted processing time and residual machine capacity decreases the tardiness of the jobs [11]. Additionally, variance in the jobs' due dates increases the tardiness of the jobs. Thus, it is necessary to use the processing time ($p$), due date ($d$), and release time ($r$) while grouping the jobs into batches. Specifically, $\pi_j = (p_j, d_j, r_j)$ denotes the vector of the job characteristics used for clustering and is termed the clustering characteristics. Job size $s_j$ is considered as the clustering constraint.
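The two quantities named above can be computed directly from a batch, as in this short Python sketch. The function names and the example batch are hypothetical.

```python
def wasted_processing_time(batch):
    """Sum over the jobs of (batch processing time - job processing time)."""
    p_batch = max(job["p"] for job in batch)
    return sum(p_batch - job["p"] for job in batch)


def residual_capacity(batch, capacity):
    """Machine capacity left unused by the batch."""
    return capacity - sum(job["s"] for job in batch)


# Hypothetical batch: p = processing time, s = size; machine capacity 3.
batch = [{"p": 3, "s": 1}, {"p": 1, "s": 1}]
print(wasted_processing_time(batch), residual_capacity(batch, capacity=3))
```

Grouping jobs with similar processing times lowers the first quantity, and filling each batch close to capacity lowers the second.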

Formations of Batches
Grouping jobs is based on the constrained k-means clustering algorithm. It consists of three steps, namely the centroid initialization step, job assignment step, and centroid update step, as follows:
(i) Centroid initialization step: The initial number of batches is obtained by dividing the total size of the jobs by the machine capacity (i.e., $b = \sum_{j=1}^{n} s_j / \delta$), and the centroids $C_i$ for $i = 1, 2, \cdots, b$ are randomly initialized.
(ii) Job assignment step: Each job is assigned to the nearest cluster. If a cluster does not satisfy the machine capacity constraint, then one or more jobs in that cluster should be re-assigned. The algorithm identifies the job $j^*$ that is the farthest from the centroid of the cluster to which it belongs (e.g., cluster $i$), albeit the closest to the next nearest cluster's centroid. It assigns job $j^*$ to the nearest cluster $i^*$ ($i^* \neq i$) among those satisfying the capacity constraint.
(iii) Centroid update step: The centroid of each cluster is updated.
The job assignment step and centroid update step are repeated until the clustering converges. If the clustering does not converge within a predetermined number of iterations, then we increase the number of batches by 1, and the procedure is repeated until there are enough batches to accommodate all jobs. Algorithm 2 summarizes the clustering algorithm described above.
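The three steps above can be sketched in Python as follows. This is a simplified illustration and not the authors' Algorithm 2: the initial number of batches is rounded up to an integer, initial centroids are drawn from the job vectors, the job to be re-assigned is chosen only by its distance to its own centroid, and all function names are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between a centroid and a job's feature vector."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def repair(assign, features, sizes, capacity, centroids):
    """Move jobs out of overloaded clusters; return False if that is impossible."""
    b = len(centroids)
    load = [0] * b
    for j, i in enumerate(assign):
        load[i] += sizes[j]
    for i in range(b):
        while load[i] > capacity:
            members = [j for j in range(len(assign)) if assign[j] == i]
            # Try the job farthest from its own centroid first.
            members.sort(key=lambda j: distance(centroids[i], features[j]), reverse=True)
            for j in members:
                targets = [t for t in range(b)
                           if t != i and load[t] + sizes[j] <= capacity]
                if targets:
                    best = min(targets, key=lambda t: distance(centroids[t], features[j]))
                    assign[j] = best
                    load[i] -= sizes[j]
                    load[best] += sizes[j]
                    break
            else:
                return False  # no job of this cluster fits anywhere else
    return True


def cluster_jobs(features, sizes, capacity, max_iter=50, seed=0):
    """Group jobs into capacity-feasible batches (lists of job indices)."""
    rng = random.Random(seed)
    n = len(features)
    b = math.ceil(sum(sizes) / capacity)  # (i) initial batch count, rounded up
    while b <= n:
        centroids = [list(features[j]) for j in rng.sample(range(n), b)]
        previous = None
        for _ in range(max_iter):
            # (ii) assign each job to its nearest centroid, then repair overloads
            assign = [min(range(b), key=lambda i: distance(centroids[i], features[j]))
                      for j in range(n)]
            feasible = repair(assign, features, sizes, capacity, centroids)
            if feasible and assign == previous:
                batches = [[j for j in range(n) if assign[j] == i] for i in range(b)]
                return [batch for batch in batches if batch]
            previous = assign
            # (iii) update the centroid of every non-empty cluster
            for i in range(b):
                members = [features[j] for j in range(n) if assign[j] == i]
                if members:
                    centroids[i] = [sum(col) / len(members) for col in zip(*members)]
        b += 1  # no convergence: allow one more batch
    raise ValueError("no capacity-feasible grouping found")


# Hypothetical jobs: feature vectors (p, d, r) and sizes; machine capacity 3.
features = [(1, 5, 0), (1, 6, 0), (3, 9, 0), (3, 10, 0), (2, 7, 0)]
sizes = [1, 2, 2, 1, 3]
batches = cluster_jobs(features, sizes, capacity=3)
print(batches)
```

Whatever grouping is returned, every batch respects the capacity, because the function only returns after a successful repair step.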

Algorithm 2. Clustering algorithm for grouping jobs. Output: job clusters (batches).

Model Deployment
After the jobs are grouped via the proposed clustering model, the rule selection model determines the batch to be processed whenever a machine becomes idle. First, the current time is set to the time at which the machine becomes idle, and the values of the selected queue state variables are calculated from the batches in the queue. Next, the rule selection model selects the proper dispatching rule via the trained neural network. Finally, the selected rule calculates the priority of each batch, and the batch with the highest priority is processed on the idle machine. Note that the selected rule may differ from those selected by human experts because the scheduling decisions of humans and software models can differ, and thus it is recommended not to rely only on the selected rule but also to consider a human expert's experience for practical usage, as suggested in [12].
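The deployment loop described above can be sketched as an event-driven simulation in Python. The sketch is illustrative only: `select_rule` is a hypothetical stand-in for the trained rule selection model, only two rules are included, and all batches are assumed to be released at time zero.

```python
import heapq

# Priority keys: the batch with the smallest key has the highest priority.
RULE_KEYS = {
    "SPT": lambda batch: max(job["p"] for job in batch),
    "EMIDD": lambda batch: min(job["d"] for job in batch),
}


def dispatch(batches, n_machines, select_rule):
    """Process all batches; return (total tardiness, schedule)."""
    queue = list(batches)
    machines = [(0, m) for m in range(n_machines)]  # (time the machine is idle, id)
    heapq.heapify(machines)
    total_tardiness = 0
    schedule = []
    while queue:
        now, m = heapq.heappop(machines)    # next machine to become idle
        rule = select_rule(queue, now)      # stand-in for the trained classifier
        idx = min(range(len(queue)), key=lambda i: RULE_KEYS[rule](queue[i]))
        batch = queue.pop(idx)              # highest-priority batch under that rule
        finish = now + max(job["p"] for job in batch)
        total_tardiness += sum(max(0, finish - job["d"]) for job in batch)
        schedule.append((m, now, finish, batch))
        heapq.heappush(machines, (finish, m))
    return total_tardiness, schedule


# Hypothetical batches and a selector that always answers "EMIDD".
demo = [[{"p": 3, "d": 3}], [{"p": 1, "d": 2}], [{"p": 2, "d": 10}]]
tardiness, plan = dispatch(demo, n_machines=2, select_rule=lambda queue, now: "EMIDD")
print(tardiness, plan)
```

A real selector would compute the selected queue state variables from `queue` and `now` and pass them to the trained network.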

Experiment
This section illustrates the application procedure of the proposed framework to batch scheduling problems to minimize total tardiness where every job is released on the first day. We consider a test case with a small number of jobs and verify the proposed framework's effectiveness by comparing it with the optimal schedule obtained via exhaustive search. Please refer to [13] for the algorithm to find the optimal solution of small batch scheduling problems.
Additionally, we show the validity of the proposed framework by comparing it with the schedules from single dispatching rules to solve large-scale realistic problems. Based on the experiment, we suggest that the proposed framework determines an optimal schedule for small problems and a better schedule than those obtained via a single dispatching rule for large problems.

Parameters
We assume that the pmfs and conditional pmfs (cpmfs) of job characteristics are estimated from the historical data listed in Tables 4 and 5, respectively. Each cell in Table 5 indicates $\Pr(c = \hat{c} \mid c_0 = \hat{c}_0)$. For example, the value 0.076 in Table 5 denotes the probability $\Pr(p = 2 \mid p_0 = 1) = 0.076$, i.e., the probability that the processing time of a newly generated job is 2 when the processing time of the previously generated job is 1. It should be noted that we do not consider $\Pr(r = \hat{r} \mid r_0 = \hat{r}_0)$ because every job is assumed to be released at once. For the purpose of training, we first generated a training dataset with 100,000 samples as explained in Section 3.1. The best neural network obtained with this training dataset has eight nodes in the first hidden layer and nine nodes in the second hidden layer, and its accuracy is 88.45%. Table 6 lists the characteristics of 10 jobs to be processed by one of two identical machines, each with a capacity of 3.

Comparison with Exhaustive Search
We applied Algorithm 2 (Clustering algorithm for grouping jobs) to the jobs, and this resulted in five batches. A batch schedule via the trained neural network was established, and the total tardiness of this schedule was found to be 1.
Subsequently, this result was compared with that obtained by the exhaustive search. All possible schedules were generated by considering the processing order of batches as well as the grouping of jobs. This results in a total of 1,300,561,920 schedules. Each schedule's total tardiness and maximum tardiness were calculated, and Figure 5 plots their distributions over all schedules.
1  1  2  2
2  3  1  5
3  3  2  4
4  1  2  3
5  3  3  6
6  2  1  4
7  3  2  5
8  3  2  4
9  2
There are 55,296 schedules (0.0042% of all possible schedules) with a total tardiness of 1, and it is confirmed that the schedule obtained from the trained neural network is included in these 55,296 schedules.
We performed an additional experiment with problem sizes similar to those mentioned above to obtain the schedule with the minimum total tardiness via the proposed approach and then compared it with the optimal schedule. We randomly generated 1000 datasets that included 2, 3, and 4 machines with capacities of 3, 4, 5, and 6, and the numbers of jobs were 8, 9, 10, 11, 12, 13, 14, and 15. In the experiment, job characteristics were randomly generated based on Table 4. The results indicate that the proposed approach obtains the optimal schedule in 948 of the 1000 datasets (94.8%). Based on this result, we conclude that the proposed approach can establish the optimal schedule for small problems in most cases.

Comparison with Single Dispatching Rule
In this subsection, we compare the proposed framework with single dispatching rules when there are hundreds of jobs, for which finding the optimal batch schedule is computationally intractable. The experiment demonstrates that the proposed method can be applied to a real-sized problem and performs well.
We generated 100 test datasets via Tables 4 and 5, where each dataset contained 300 jobs and there were four identical machines, each with a capacity of 6. Subsequently, we applied the clustering algorithm to each dataset. The dispatching rule selection model was then applied to the batches, and the average total tardiness was calculated. This was compared with the cases wherein single dispatching rules (shortest processing time first (SPT), longest processing time first (LPT), earliest minimum due date first (EMIDD), earliest maximum due date first (EMADD), earliest minimum release time first (EMIST), and earliest maximum release time first (EMAST)) were applied to the same datasets. Figure 6 shows the average total tardiness over all the datasets. As shown in Figure 6, the proposed model yields the lowest average total tardiness, approximately 4-40% less than that of the single dispatching rules. Therefore, it is concluded that the proposed model yields a more efficient schedule than a single dispatching rule, even for a real-sized batch scheduling problem. The proposed model shows only a slight improvement over EMIST. This is because EMIST appears to suit the test datasets and the proposed model selects the most appropriate dispatching rule among the pre-defined rules (i.e., SPT, LPT, EMIDD, EMADD, EMIST, and EMAST). This implies that the proposed model relies on the pre-defined rules, and therefore, it is important to include a variety of rules in the pre-defined set for practical usage.

Conclusions
The batch scheduling problem is one of the most important scheduling problems in manufacturing and consists of two subproblems: grouping jobs into batches and determining the sequence of batches. The heuristic approach is a typical way to solve these problems because they are NP-hard (non-deterministic polynomial-time hard). However, a heuristic algorithm has the evident limitation that it needs to be designed differently for each specific problem. In addition, much previous research has addressed only one of the two subproblems.
Motivated by these issues, in this study, we developed a batch scheduling model that is not problem-specific and can solve both subproblems. The model consists of a clustering model and a dispatching rule selection model to solve the problems of grouping jobs and sequencing batches, respectively. The clustering model is based on the constrained k-means algorithm, uses a set of features impacting total tardiness (e.g., due date, processing time, and release time), and yields clusters in which the sum of job sizes does not exceed the machine load capacity. The dispatching rule selection model is a supervised model that uses queue state variables (e.g., the number of batches in the queue and the maximum processing time of the batches) as features and the most appropriate dispatching rule as the class label. The results of the experiment indicated that the framework solves batch scheduling problems and yields a good batch schedule.
With respect to future work, the proposed framework can be extended to solve diverse batch scheduling problems, for example, by modifying the clustering algorithm to handle machines with non-identical capacities or by introducing different types of objective functions.
