Learning Dispatching Rules for Scheduling: A Synergistic View Comprising Decision Trees, Tabu Search and Simulation

Abstract: A promising approach for effective shop scheduling that synergizes the benefits of combinatorial optimization, supervised learning and discrete-event simulation is presented. Though dispatching rules are widely used by shop scheduling practitioners, only rules of ordinary performance are known; hence, dynamic generation of dispatching rules is desired to make them more effective under changing shop conditions. Meta-heuristics are able to perform quite well and carry more knowledge of the problem domain, but at the cost of prohibitive computational effort in real time. The primary purpose of this research is the offline extraction of this domain knowledge using decision trees to generate simple if-then rules that subsequently act as dispatching rules for scheduling in an online manner. We used a similarity index to identify parametric and structural similarity in problem instances in order to implicitly support the learning algorithm for effective rule generation, and a quality index for relative ranking of the dispatching decisions. Maximum lateness is used as the scheduling objective in a job shop scheduling environment.


Introduction
Machine scheduling is of principal concern in the planning phase as well as in the operation of manufacturing systems. It aims at efficiently allocating the available machines to jobs, or to operations within jobs, and at the subsequent time phasing of these jobs on individual machines [1]. Simulation procedures, analytical models and heuristics are among the traditional solution methodologies for the scheduling problem that are still widely used in industry. However, the increasing complexity of manufacturing systems necessitates a much deeper investigation of the problem domain.
Scheduling problems in the static environment have a known number of jobs whose ready times are fixed before the actual schedule execution, in contrast to problems in the dynamic environment, in which jobs are continually revealed during the execution process [2]. Dynamic scheduling uses priority dispatching rules (PDRs) to prioritize jobs waiting for processing at a resource [3]. In this approach, a score is associated dynamically with each possible assignment of a task to a particular resource. The objective is to select the task with the minimum or maximum assigned score (as the case may be) for the chosen resource [4]. Due to their ease of implementation and substantially reduced computational requirements, PDRs remain a very popular technique despite their poor performance in the long run [5,6].
The major drawbacks of PDRs include the dependence of their performance on the state of the system and the non-existence of any single rule that is superior to all the others for all possible states the system might be in [7]. Meta-heuristics (e.g., simulated annealing, Tabu search, etc.) have an advantage over PDRs in terms of solution quality and robustness; however, they are usually quite difficult to implement and tune, and computationally too complex to be used in a real-time system.
The robust and better-quality solutions provided by meta-heuristics contain useful but implicit knowledge about the problem domain and the solution space explored. Such a set of solutions represents a wealth of scheduling knowledge that can be transformed into the usable form of a decision tree or a rule-set. In this paper, we propose an approach to exploit this scheduling knowledge in this way. We seek this hidden scheduling knowledge through a data mining module that identifies a rule-set by exploring the patterns in the solution set obtained by an optimization module based on Tabu search, a very efficient meta-heuristic for the Job-Shop Scheduling Problem (JSSP) [8,9]. This rule-set approximates the output of the optimization module when incorporated in a simulation model of the system. It is subsequently used to make dispatching decisions in an online manner.
The rest of the paper is organized as follows. First, we present a concise review of the closely related literature on the use of learning in scheduling. Subsequently, the general structure of the proposed methodology is outlined, followed by the details of each module with its working, significance and coordination with the other modules. Section 3 provides a brief description of the experimental setup. The results of the experiments are then presented with a comparison to standard dispatching rules for a set of benchmark job shop scheduling problems. Finally, the paper concludes in the light of the findings with a brief note on future research directions.
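The score-based selection described above can be sketched as follows. This is a minimal illustration, not a rule set taken from the paper: the `Job` fields, the three rule definitions (SPT, EDD, minimum slack) and the sample queue are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    processing_time: float
    due_date: float


# Each rule maps (job, current time) to a score; the lowest score is dispatched.
# The rule names and formulas are common textbook choices, assumed here.
RULES = {
    "SPT": lambda job, now: job.processing_time,                       # shortest processing time
    "EDD": lambda job, now: job.due_date,                              # earliest due date
    "MS": lambda job, now: job.due_date - now - job.processing_time,   # minimum slack
}


def dispatch(queue, rule, now=0.0):
    """Return the queued job with the minimum score under the given rule."""
    score = RULES[rule]
    return min(queue, key=lambda job: score(job, now))


# Hypothetical queue of jobs waiting at one resource.
queue = [Job("J1", 4, 20), Job("J2", 2, 30), Job("J3", 6, 12)]
for rule in RULES:
    print(rule, "->", dispatch(queue, rule).name)
```

Different rules select different jobs from the same queue, which is the state dependence that motivates learning the rule rather than fixing it in advance.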

Learning in Shop Scheduling: A Concise Review
Generally, the approach of learning in shop scheduling stems from the idea of dynamic selection of priority dispatching rules, owing to the inability of traditional PDR-based job dispatching to adapt to the changing conditions of the shop floor. Extending the idea of dynamic selection using simulation towards dynamic modification, and subsequently towards the dynamic generation of new rules using machine learning techniques, resulted in a significant increase of research interest in this area.
In the context of solving various types of shop scheduling problems, inductive learning [10–12], artificial neural networks [13], case-based reasoning [14], support vector machines [15], reinforcement learning [16], fuzzy logic [17], evolutionary learning, genetic programming [18], analytical network processes [19] and artificial immune networks [20,21] are among the major learning-based approaches, with a large number of variants, that have been employed by researchers. Table 1 lists a few selected works that employed learning in scheduling. For detailed reviews on the subject, refer to [14,22,23] and an updated review by [24].

Proposed Methodology
The framework we propose consists of four modules, namely the control module, the simulation module, the optimization module and the learning module. In addition, there is a set of three databases. The objective of the proposed framework is to generate a set of rules for making dispatching decisions. Our focus is on the job shop scheduling environment. Figure 1 illustrates the workflow of the proposed approach. Initially, a set of problem instances is generated by the control module under pre-specified settings. These problem instances are categorized by the control module on the basis of a similarity index (SI) in order to better reflect their individual contribution. Each problem instance, tagged with a vector of similarity indices, is subsequently stored in an instance database.
The optimization module generates solutions in the form of a sequence ($\pi$) for a subset of these instances to start with. The collection of these job shop instances and the corresponding solutions forms the basis for the initial training dataset. In effect, it is an implicit collection of good scheduling decisions made by the optimization module. Each decision must be identified explicitly and assigned an index of quality. However, the downstream decisions significantly affect the computed index of quality. A more relevant quality index ($\tau_{\delta_k}$) is obtained by taking this factor into account as well as the processing times of the operations involved in the dispatching decision. The simulation module generates this quality index and associates it with each decision taken by the optimization module.
The entire collection of decisions, along with the assigned quality indices, is used as relevant scheduling knowledge. This scheduling knowledge is kept in a scheduling database in the form of Predictor-Value-Index (PVI) trios to be used subsequently by a learning process. As the characteristics of each solution instance are implicitly linked with the complexity and the structure of the problem, the matrix of similarity indices is also used in the learning process. Based on this scheduling knowledge, a decision tree is generated by the learning process.
This decision tree is then used to dispatch the jobs awaiting service in an online manner. The decision tree is dynamically updated, whenever necessary, through the control module. The control module also transmits the knowledge of good scheduling decisions to the optimization module to improve its performance at subsequent levels. In the following subsections, the structure and working of each module are described in detail.

Control Module
The functions of the control module include the generation of the relevant scheduling problem instances, the classification of these problem instances based upon their similarity and relative complexity, and the synchronization of the scheduling decisions with all three databases.
The generation of the problem instances is a quite simple process. It includes controlled random processes for generating a set of operations with positive processing times, assigning these operations to jobs and assigning a due date to each job. The problem instances are in fact acyclic graphs with directed and undirected arcs. This is the disjunctive graph representation proposed by [44], which can effectively represent a Job Shop Scheduling Problem (JSSP). An example of a disjunctive graph representation of a small JSSP with four machines and three jobs is shown in Figure 2. The three jobs have ready times $r_1$, $r_2$ and $r_3$, respectively. The node $O_{ik}$ represents the operation of the $i$th job to be processed on machine $k$ with a processing time of $p_{ik}$ (labeled on the edges, or conjunctive arcs). The first and the last nodes are dummy nodes. Operations to be performed on the same machine are connected through disjunctive arcs (dotted edges with arrows on both sides). The control module creates a matrix of similarity index values for such disjunctive graphs for all problem instances. The similarity index value is an estimate of how closely a problem instance resembles any other problem instance. Rajendran et al. [45] provide a survey of the various methods used to find graph similarity. In our experiments, the computation of the similarity indices is based on the loopy belief propagation algorithm proposed by [46]. For this purpose, it takes into account the structure of the sub-graphs over the entire set of nodes and the normalized Euclidean distance among these sub-graphs.
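The instance generation and disjunctive graph construction described above can be sketched as below. This is an illustrative sketch under stated assumptions: every job is assumed to visit every machine exactly once in a random order, all ready times are taken as zero, and the function names `generate_instance` and `disjunctive_graph` are hypothetical. Due dates and the similarity index computation are not covered.

```python
import random
from itertools import combinations


def generate_instance(n_jobs, n_machines, seed=0, p_low=1, p_high=10):
    """Return a list of jobs; each job is a route of (machine, processing_time).

    Assumption: each job visits every machine once, in a random order.
    """
    rng = random.Random(seed)
    jobs = []
    for _ in range(n_jobs):
        route = list(range(n_machines))
        rng.shuffle(route)
        jobs.append([(m, rng.randint(p_low, p_high)) for m in route])
    return jobs


def disjunctive_graph(jobs):
    """Build conjunctive (directed) and disjunctive (undirected) arcs.

    Nodes are (job, machine) pairs plus the dummy nodes 'source' and 'sink'.
    A conjunctive arc (u, v, w) carries the processing time w of its tail u.
    """
    conjunctive = []
    by_machine = {}
    for i, ops in enumerate(jobs):
        prev, prev_p = "source", 0  # ready times assumed to be zero
        for m, p in ops:
            node = (i, m)
            conjunctive.append((prev, node, prev_p))
            by_machine.setdefault(m, []).append(node)
            prev, prev_p = node, p
        conjunctive.append((prev, "sink", prev_p))
    # One undirected arc per pair of operations sharing a machine.
    disjunctive = [pair for nodes in by_machine.values()
                   for pair in combinations(nodes, 2)]
    return conjunctive, disjunctive


jobs = generate_instance(n_jobs=3, n_machines=4, seed=1)
conj, disj = disjunctive_graph(jobs)
print(len(conj), "conjunctive arcs,", len(disj), "disjunctive arcs")
```

For three jobs and four machines this gives 15 conjunctive arcs (five per job, including the arcs from and to the dummy nodes) and 12 disjunctive arcs (three job pairs on each of four machines).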

Optimization Module
The optimization module has a pivotal role in the proposed methodology. As the learning algorithm relies on the quality of the scheduling decisions taken by the optimization module, it is very important that the solutions it provides are of good quality. Moreover, some means of quantifying the quality of these solutions is desired. The optimization module is based on the Tabu search (TS) algorithm proposed by [47]. Jain et al. [9] found TS algorithms, in general, to be very effective for finding solutions to the JSSP. This is mainly attributed to the powerful memory function of TS [48] in coordination with neighborhood structures and flexible move evaluation strategies [5]. This contrasts with the limited capability of PDRs, which stems from their myopic nature.
The optimization module finds solutions to a selected set of problem instances; these are initially referred to as efficient solutions in this methodology, on the assumption that they are obtained through rigorous and intelligent moves made by the algorithm. The optimization module works in an offline manner; however, it continuously provides more and more solutions to the problem instances generated by the control module.
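The role of the tabu memory and the move evaluation can be illustrated with a deliberately small sketch. It is not the algorithm of [47]: as a simplifying assumption it minimizes maximum lateness on a single machine with an adjacent-swap neighborhood, a fixed tabu tenure and a simple aspiration criterion. The names `max_lateness` and `tabu_search` and the sample data are hypothetical.

```python
def max_lateness(seq, p, d):
    """Maximum lateness of a single-machine sequence (jobs start at time 0)."""
    t, worst = 0, float("-inf")
    for j in seq:
        t += p[j]
        worst = max(worst, t - d[j])
    return worst


def tabu_search(p, d, iterations=50, tenure=2):
    """Adjacent-swap tabu search; returns the best sequence found and its value."""
    current = list(range(len(p)))
    best, best_val = current[:], max_lateness(current, p, d)
    tabu_until = {}  # swapped job pair -> last iteration at which it is forbidden
    for it in range(iterations):
        candidates = []
        for k in range(len(current) - 1):
            neighbour = current[:]
            neighbour[k], neighbour[k + 1] = neighbour[k + 1], neighbour[k]
            move = frozenset((current[k], current[k + 1]))
            value = max_lateness(neighbour, p, d)
            # Aspiration: a tabu move is accepted if it improves on the best value.
            if tabu_until.get(move, -1) < it or value < best_val:
                candidates.append((value, k, move, neighbour))
        if not candidates:
            break
        # Take the best admissible neighbour, even if it is worse than current.
        value, _, move, current = min(candidates, key=lambda c: (c[0], c[1]))
        tabu_until[move] = it + tenure
        if value < best_val:
            best, best_val = current[:], value
    return best, best_val


p = [4, 2, 6, 3]   # hypothetical processing times
d = [20, 6, 9, 5]  # hypothetical due dates
print(tabu_search(p, d))
```

Accepting the best admissible neighbour even when it is worse, while forbidding recently reversed swaps, is what lets the search leave local optima; a PDR, by contrast, commits to each dispatching decision once.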

Simulation Module
The simulation module is a multi-purpose module that is tightly integrated with all the other modules. One of its key functions is to transform the solution provided by the optimization module into a set of dispatching decisions. Each such decision is simply a yes/no value for a job A to be dispatched before some other job B. At each instant when a decision is to be taken for the next job to be dispatched, all the jobs competing for a particular resource are alternately tested (the alternate decision) as per the sequencing order provided by the optimization module.
The simulation module is also in charge of generating values for a set of pre-selected attributes, referred to as predictors, to be used by the learning module. This is done at each instant at which a dispatching decision is made. It is worth mentioning that these decision points (the time instants when a decision is made) depend on the solution provided by the optimization module as well as on the problem structure and parameters. This means that two solutions of the same quality but with different sequencing orders may have the same or different decision points, and hence the same or different values for the selected attributes. Despite this, a particular decision may be the same for both solutions. This is illustrated by the example of a 3 × 3 problem instance given in Table 2. The schedules illustrated in Figures 3 and 4 differ only in the sequencing order at machine 2, i.e., job 3 and job 2 have interchanged their order. This does not change $C_{\max}$.
The simulation module assigns a quality index (QI) to each solution generated by the optimization module, based on a bound that is also computed by the simulation module using a multi-pass simulator. It is computed as $\mathrm{QI} = f^{*}/\bar{f}$, where $f^{*}$ and $\bar{f}$ are, respectively, the values of the objective function computed by the optimization module using Tabu search and by the simulation module using the best dispatching rule. This quality index is used to categorize the generated solutions for subsequent use by the learning algorithm. Each solution is included in the learning set based upon certain characteristics it possesses. These characteristics are the implicit relations among operations that govern the particular sequencing order of an efficient solution. It is not a trivial task to unfold these relations; however, a certain set of guidelines can be attained. The multi-pass simulator incorporated in the simulation module simulates the problem instances solved by the optimization module and finds a set of schedules using a pre-defined set of PDRs. The schedule $\sigma_b$ with the best objective value in this set is used as a reference schedule. Its objective value is set as a bound, $f_b$ or $\bar{f}$, on the solution set for that problem instance. This bound may be used as a characteristic for the set of dispatching decisions $\{\delta_k\}$ taken in the schedule $\sigma^{*}$ generated by the optimization module.
By generating an alternative to a dispatching decision $\delta_k$, that is, by swapping the sequence of two jobs $i$ and $j$ with processing times $p_i$ and $p_j$, respectively, on a machine $h$ in the schedule $\sigma^{*}$, the simulation module computes and assigns a quality index $\tau_{\delta_k}$ to each dispatching decision $\delta_k$. This is done by generating a set of active schedules, namely $\{\sigma^{*}_{k}\}$. Hence, we have as many alternate schedules as there are original dispatching decisions, $z$, made in the optimal schedule. It is worth mentioning that the computed quality index depends on the downstream operations. The value of the quality index $\tau_{\delta_k}$ for the $k$th dispatching decision $\delta_k$ is computed from the following quantities: $f(\sigma^{*}, \delta_k)$ represents the value of the performance measure obtained by incorporating the dispatching decision $\delta_k$ in the original schedule $\sigma^{*}$, $\omega_1$ and $\omega_2$ are the weighting factors, and $p$ is the remaining processing time for the jobs, including $p_i$ and $p_j$. Note that $f(\sigma^{*}, \phi)$ represents the value of the performance measure for the original schedule $\sigma^{*}$ without incorporating any change (in short, also written as $f(\sigma^{*})$ or $f^{*}$). This notation is used for consistency. It is worth mentioning here that this quality index $\tau_{\delta_k}$ is different from the quality index (QI) associated with the schedule. Each dispatching decision $\delta_k$ has its own quality index $\tau_{\delta_k}$, in contrast to the value of the bound $f_b$, which is the same for all dispatching decisions. For example, for the schedule shown in Figure 5 for the 6 × 6 problem instance of Table 3, if the sequence of operations for job 4 and job 3 on machine 2 is swapped to get a new active schedule as shown in Figure 6, we can assign a quality index $\tau_{\delta_5}$ to the dispatching decision $\delta_5$ (i.e., job 3 to be processed before job 4 on machine 2) using $f(\sigma^{*}, \phi) = L_{\max}(\sigma^{*}) = 24$, $f(\sigma^{*}, \delta_k) = L_{\max}(\sigma^{*}, \delta_k) = 29$, $k = 5$ and $z = 20$.
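The idea of generating one alternative schedule per dispatching decision can be sketched as follows. The sketch only evaluates the change in the performance measure, $f(\sigma^{*}, \delta_k) - f(\sigma^{*}, \phi)$, caused by each swap; it does not reproduce the weighted quality index $\tau_{\delta_k}$. As a simplifying assumption it uses a single machine, where downstream effects are limited to later jobs in the same sequence, and the function names and data are hypothetical.

```python
def max_lateness(seq, p, d):
    """Maximum lateness of a single-machine sequence (jobs start at time 0)."""
    t, worst = 0, float("-inf")
    for j in seq:
        t += p[j]
        worst = max(worst, t - d[j])
    return worst


def swap_effects(reference, p, d):
    """For each adjacent pair in the reference sequence, swap it and report
    the change in maximum lateness relative to the unchanged sequence."""
    base = max_lateness(reference, p, d)
    effects = []
    for k in range(len(reference) - 1):
        alternative = reference[:]
        alternative[k], alternative[k + 1] = alternative[k + 1], alternative[k]
        decision = (reference[k], reference[k + 1])  # "first job before second"
        effects.append((decision, max_lateness(alternative, p, d) - base))
    return base, effects


p = [4, 2, 6, 3]           # hypothetical processing times
d = [20, 6, 9, 5]          # hypothetical due dates
reference = [3, 1, 2, 0]   # hypothetical reference sequence
base, effects = swap_effects(reference, p, d)
print("reference L_max:", base)
for decision, delta in effects:
    print(decision, "swap changes L_max by", delta)
```

A swap that leaves the measure unchanged marks a decision of little significance, whereas a large increase marks a decision the learner should weight heavily. In the example from the text the corresponding change is 29 − 24 = 5.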

Learning Module
The learning module uses the induction of decision trees as its learning mechanism. The conceptual process of classifying the dispatching decisions stored in the schedule database is relatively simple when represented as a decision tree. This simplicity and transparency are the major motivation for adopting decision tree based learning in the proposed approach. In addition, decision tree based learning helps to bridge the gap between the purely reactive nature of PDRs and the predictive centralized approaches that are generally prohibitive in online scheduling due to delayed response [49]. The learning module uses the C4.5 algorithm for mining the implicit information in the dispatching decisions.
The dispatching decisions made by the optimization module and transformed by the simulation module are stored in a schedule database for the subsequent learning process. It is worth mentioning that all the decisions, including the alternate decisions generated by the simulation module, are in the same database along with the corresponding quality index. Learning accuracy is assumed to improve even with decisions that are not generated by the optimization module, because the quality index attached to each decision helps to identify the areas of greater significance in the search space.
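C4.5 chooses the attribute to split on by the gain ratio, i.e., the information gain normalized by the entropy of the split itself. The following sketch computes this criterion for categorical attributes only; it is not a full C4.5 implementation (no continuous thresholds, pruning or missing-value handling), and the attribute names and rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, labels, attribute):
    """Information gain of splitting on `attribute`, divided by split information."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical pairwise dispatching examples: does job u precede job v (1) or not (0)?
rows = [
    {"u_less_slack": "yes", "u_shorter": "yes"},
    {"u_less_slack": "yes", "u_shorter": "no"},
    {"u_less_slack": "no", "u_shorter": "yes"},
    {"u_less_slack": "no", "u_shorter": "no"},
]
labels = [1, 1, 0, 0]
for attribute in ("u_less_slack", "u_shorter"):
    print(attribute, gain_ratio(rows, labels, attribute))
```

In this toy data the slack attribute separates the classes perfectly while the processing-time attribute carries no information, so a tree grown on it would split on slack first.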
Construction of a relevant training dataset is a crucial point in the entire knowledge discovery in databases (KDD) process. The scheduling database, in the form of Predictor-Value-Index (PVI) trios, forms the basis for the training dataset. This information forms a number of cluster-sets for each attribute based on their values. From the data mining perspective in the JSSP, the target concept to be learned is which job should be dispatched first within a set of jobs that are ready to be scheduled on the same machine at a particular instant. Extracting this knowledge from the training dataset would allow us to dispatch the next job at any given time and thereafter to create dispatching lists for any set of jobs. $[x_k\; y_k]$ forms a collection of positive training examples from a single problem instance with $k$ dispatching decisions, where $x_k = [x_{1k}\; x_{2k} \ldots x_{lk}]$ represents the values of the $l$ selected predictors (Table 4) and $y_k \in \{0, 1\}$ is the value of the target concept for the $k$th decision. The target concept $y_k$ ($y_k := \mathrm{precedes}_{u,v,q}$) is a binary variable representing the processing order of two jobs $u$, $v$ to be processed on machine $q$, i.e., $y_k = 1$ represents $u \to v$ in lexical order on machine $q$, and vice versa. For example, consider the schedule shown in Figure 5. At t = 0, the jobs $J_3$, $J_4$ and $J_6$ compete for the machine $M_1$. To make a dispatching decision for position 1 on $M_1$ and to generate the relevant data at t = 0, three comparisons are made, namely (3,1) vs. (4,1), (3,1) vs. (6,1), and (4,1) vs. (6,1), where (3,1) signifies operation number 1 of $J_3$, and so on. Since the optimization module selected $J_6$ before $J_4$, as shown in Figure 5, the target concept has a value of 0. Similarly, 0 is returned for the other two comparisons. At t = 5, we again have three comparisons, all resulting in a value of 0 for the target concept, and then at t = 10, we have four comparisons; the comparison (1,2) vs. (5,4), for instance, returns a value of 1 for the target concept. The corresponding values of the predictors are computed at the time the decision is made. Hence, we have as many rows of positive learning examples as there are comparisons made at the various decision points on each machine, as per the solution generated by the optimization module.
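The construction of pairwise training rows at a decision point can be sketched as below. The function name `pairwise_examples`, the single predictor `rem_diff` and the numeric values are hypothetical; the machine order 6, 4, 3 is an assumption consistent with the three zero-valued comparisons reported for t = 0, since Figure 5 is not reproduced here.

```python
from itertools import combinations


def pairwise_examples(competing, machine_sequence, predictors):
    """Build one row per job pair (u, v), u < v, among the competing jobs.

    y = 1 if u precedes v in the optimizer's sequence on this machine, else 0.
    `predictors` is a callable returning a dict of predictor values for (u, v).
    """
    position = {job: idx for idx, job in enumerate(machine_sequence)}
    rows = []
    for u, v in combinations(sorted(competing), 2):
        y = 1 if position[u] < position[v] else 0
        rows.append({"u": u, "v": v, **predictors(u, v), "y": y})
    return rows


# Hypothetical remaining processing times, evaluated at the decision point.
remaining = {3: 20, 4: 25, 6: 31}
rows = pairwise_examples(
    competing={3, 4, 6},
    machine_sequence=[6, 4, 3],  # assumed order chosen by the optimization module
    predictors=lambda u, v: {"rem_diff": remaining[u] - remaining[v]},
)
for row in rows:
    print(row)
```

Each decision point with n competing jobs therefore contributes n(n − 1)/2 rows, with the predictor values computed at the time of the decision.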
Finally, the decision tree induced by the learning algorithm can be applied directly to the same JSSP to validate the explored knowledge and as a predictive model to predict the target concept. A set of scheduling problem instances chosen from the database in accordance with their similarity indices is used as a test dataset for the scheduling knowledge discovered. The overall sequence of operations obtained by these rules is translated into a schedule using a schedule generator. Thus, given any two jobs, the tree predicts which job should be dispatched first, and it can be thought of as a new, previously unknown rule. In addition to prediction, decision trees and decision rules reveal insightful structural knowledge that can be used to further enhance the scheduling decisions.
Proper selection of the relevant predictors and creation of new predictors that are more pertinent to the desired target concept play a key role in the learning process. The entire learning process suffers significantly from poor selection and inappropriate creation of predictors. The task is to find the most reasonable subset of predictors for a classifier, seeking fewer predictors and maximum class separability [35]. This process is also critical for the effectiveness of the subsequent model induction, as it eliminates redundant and irrelevant predictors. Effective use of cluster-sets in tandem with the selected attribute-set helps in generating a better-quality decision tree.
Both the creation of new predictors and the selection of predictors (we call the two processes combined attribute extraction) are primarily linked with the objectives of the JSSP. Tardiness-based objectives require different predictors from flow-time-based objectives. For instance, deadline-related statistics and counters are better suited to tardiness-based objectives [50].
There exists a strong relation among the sequencing of operations due to precedence constraints; however, this dependency effect is reduced by comparing, at any instant, only two (operations of the) jobs to be processed by the same machine among the schedulable jobs (those whose predecessors, if any, are already dispatched). Proper attribute extraction also plays an important role in reducing this dependency.
Arithmetic combinations of primitive predictors can also be used to generate new useful predictors. However, a large set of predictors is not desirable, as the predictors are generally not independent of each other, which makes the process computationally impractical. Several heuristics, such as the backward stepwise heuristic and the forward stepwise group heuristic, have been proposed to limit the selected subset of predictors while maintaining a certain performance level [51]. Each simulation scenario is treated as a static control rule, and the necessary predictors are collected at data collection points and saved in a corresponding file.

Experimental Setup
Two sets of similarly sized 6 × 6 instances of a static job shop problem, generated with different seed values, are used as training and test data. All jobs are available simultaneously at time zero. A discrete uniform distribution between 1 and 10 is used to generate the operation processing times. The job due dates are determined using two parameters $\tau$ and $\rho$, where $\tau$ determines the expected number of tardy jobs (and hence the average tightness of the due dates) and $\rho$ specifies the due date range. Once these parameters have been specified, the job due dates are generated from a discrete uniform distribution whose mean due date is $\mu = (1 - \tau)\,E[C_{\max}]$. Here $E[C_{\max}]$ denotes the expected makespan for the problem instance; it is calculated under the assumption of no idle time on machines and hence is an optimistic estimate of $C_{\max}$. We consider $\tau = 0.3$ and $\rho = 0.5$, with $L_{\max}$ (maximum lateness) used as the scheduling objective. Table 5 lists the parameters used in the experimental setup.
Selection of the relevant predictors has a key role in obtaining an appropriate performance level. High dimensionality poses a challenge to learning tasks. In the presence of irrelevant predictors, classification algorithms tend to over-fit the training data, which degrades their generalization ability [52,53]. The selected predictors have the following characteristics: they are related to tardiness-based performance measures; they are preferably defined in relative rather than absolute values; and those with high variation are discretized. Table 4 lists the predictors used in the experiments. The selection of these predictors is generally based on earlier research (see, for example, [35,54,55]). It is acknowledged, though, that a simple attribute-selection model cannot generate or guarantee even a near-optimal subset of predictors. A more rigorous approach to combinatorial attribute selection may be used at the cost of extra computational complexity (see [56] for a review and the limitations of other approaches).
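A due-date generator consistent with the description above might look as follows. Only the mean $\mu = (1 - \tau)E[C_{\max}]$ and the no-idle-time assumption come from the text. The estimator of $E[C_{\max}]$ (total work divided by the number of machines) and the sampling interval $[\mu(1 - \rho/2), \mu(1 + \rho/2)]$ are assumptions borrowed from common benchmark-generation practice, and the function names are hypothetical.

```python
import random


def expected_makespan(jobs, n_machines):
    """Assumed estimator: total work spread evenly over machines, no idle time."""
    total_work = sum(p for ops in jobs for _, p in ops)
    return total_work / n_machines


def generate_due_dates(jobs, n_machines, tau=0.3, rho=0.5, seed=0):
    """Sample one integer due date per job around mu = (1 - tau) * E[C_max]."""
    rng = random.Random(seed)
    mu = (1 - tau) * expected_makespan(jobs, n_machines)
    low = round(mu * (1 - rho / 2))   # assumed lower end of the due-date range
    high = round(mu * (1 + rho / 2))  # assumed upper end of the due-date range
    return [rng.randint(low, high) for _ in jobs]


# Hypothetical 6 x 6 instance: every operation takes 5 time units.
jobs = [[(m, 5) for m in range(6)] for _ in range(6)]
print(generate_due_dates(jobs, n_machines=6))
```

With these values the total work is 180, the makespan estimate is 30 and the mean due date is 21, so due dates fall between 16 and 26. A larger $\tau$ shifts the whole range earlier and makes more jobs tardy, while a larger $\rho$ widens the range.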
The predictors $p_{\max} - \vartheta$ and $\vartheta$ use the remaining processing time for direct comparison of two jobs, as an average measure and as a relative value among all the jobs in the system. It is important to note that it is the combined effect of these predictors along with others that strengthens their relation with the target concept. For example, $\vartheta$ in relation with $n_t$ and $p_{\max} - \vartheta$ affects $n_p / n_t$. Hence, it is not a trivial task to identify the relationship of a predictor with the target concept. The bound on the value of the performance measure is also used as a predictor; however, it is again not independent of the other selected predictors.
For this study, we used the binning method, an unsupervised discretization method, to establish banded categories for the predictor values. Based on the values of the mean and standard deviation of the distribution of the specified predictor(s), we generate a field with banded categories. Figure 7 illustrates a ±3 standard deviation based discretization for a predictor to generate seven bins. However, creating banded categories based on standard deviations may result in some bins being defined outside the actual data range and even outside the range of possible data values, affecting the contribution of the predictor in the learning process.
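Standard-deviation-based binning can be sketched as below. The exact cut points behind Figure 7 are not stated in the text, so the multipliers used here (six cut points at the mean ± 0.5, 1.5 and 2.5 standard deviations, giving seven bins) are an assumption, as are the function name `sd_bins` and the sample values.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def sd_bins(values, multipliers=(-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)):
    """Assign each value a bin index 0..len(multipliers) using cut points
    placed at mean + k * (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    cuts = [mu + k * sigma for k in multipliers]
    return [bisect_right(cuts, v) for v in values], cuts


values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical predictor values
bins, cuts = sd_bins(values)
print("cut points:", cuts)
print("bin indices:", bins)
```

For this sample the mean is 5 and the standard deviation is 2, so the cut points are 0, 2, 4, 6, 8 and 10. Bins 0, 1 and 6 receive no values, which illustrates the caveat that standard-deviation bands can lie outside the actual data range.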

Results and Discussion
A decision tree with a maximum of 30 splits is grown, with cross-validation, using the experimental setup described above and a training dataset consisting of 100 instances. A set of 49 rules with a class labeling error of 28.06% is obtained. It is observed that allowing more splits brings no significant improvement in the class error.
The rule-set is applied to the instances of the test set. Figure 8 shows the box plot of $L_{\max}$ values for the set of PDRs given in Table 6, the Tabu search used in the optimization module and the rule-set obtained by the decision tree based learning algorithm. The validation accuracy of the rule-set is found to be 60.5%. Figure 9 shows the confusion matrix for the decision tree grown. It is worth mentioning that another decision tree, grown with the quality index $\tau_{\delta_k}$ as an additional predictor, gives a significant boost to the validation accuracy with a set of 37 rules while all other parameters remain the same. This is illustrated in Figure 10 using the confusion matrix plot of that decision tree. It requires computing the quality index of each decision, which is an expensive process. The tree we obtained by including the quality index predictor requires these computations for the test instances a priori, based on the optimal decisions made by the optimization algorithm. However, a procedure to estimate the value of the quality index would result in better performance. A comparison of the two confusion matrix plots reveals that the original decision tree was unable to effectively predict the class with label 0, i.e., the operation not selected as the first operation, resulting in a specificity of 3.9%. This value increases significantly, to 77.3%, with the inclusion of $\tau_{\delta_k}$ in the predictor-set.
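The accuracy and specificity figures quoted above are standard quantities derived from a binary confusion matrix, with label 1 treated as the positive class and label 0 as the negative class. The sketch below shows the calculation; the counts are hypothetical and are not the values behind Figures 9 and 10.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from binary confusion-matrix counts.

    tp/fn: label-1 examples predicted as 1/0; fp/tn: label-0 examples predicted as 1/0.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of label-1 examples recovered
        "specificity": tn / (tn + fp),   # share of label-0 examples recovered
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fn=10, fp=40, tn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A classifier can reach a moderate accuracy while almost never predicting label 0, which is why the specificity is reported separately from the validation accuracy.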

Conclusions
A synergy of optimization, simulation and learning is used to better address the problem of shop scheduling. The cooperative interaction of these areas is desirable in scheduling because each of them has proven effective separately. Optimization provides efficient schedules, but its computational complexity prohibits its use in an online manner. Dispatching rules are quick but lack robustness and adaptability. Simulation enables comparison of the effectiveness of dispatching rules, analysis of the behavior of scheduling strategies and understanding of the problem domain. Learning makes use of the implicit knowledge contained in the problem domain and in the efficient solution domain to approximate the behavior of the efficient solutions identified by the optimization.
A detailed description of the arrangement and functionality of the different modules of the proposed methodology is provided. For a set of similar-size instances of the job shop scheduling problem, the results on maximum lateness using the proposed methodology are presented. For most JSSPs of real-world size, optimal solutions are not obtainable or implementable due to the complex dynamic nature of the problem. However, through this approach several alternative solutions can be proposed that are not only sufficiently efficient but also as easy to implement as traditional dispatching rules. The underlying assumption for an effective implementation of the methodology is that the optimization process is able to capture the inherent problem structure with regard to the scheduling objectives. A natural extension of the proposed methodology in this context would be to reuse this knowledge in order to effectively solve problem instances with a matching similarity index.
By making use of alternative dispatching decisions, we associated a quality index with each dispatching decision. This value estimates the significance of a particular dispatching decision. In combination with a group of similar decisions arising from different or similar problem instances, the learning algorithm is able to build a density map of the dispatching decisions. This helps in analyzing the efficient schedules in comparison with relatively less efficient schedules.
One of the objectives of the proposed approach is to capture the effect of disjunctive constraints on the dispatching decision during the different phases of schedule generation. This is partially achieved through a fairly improved prediction accuracy; however, the approach lacks an implementation scheme for the test instances. Moreover, an improved metric for the quality index may be devised for superior performance.
It is not unusual to have a different set of selected predictors for each scheduling objective. In fact, this is a key factor for the successful implementation of the proposed framework. Predictor selection for different objectives and their combinations has to be rigorously explored to obtain a compact and efficient rule set.

Figure 1. Workflow of the proposed approach.

Figure 2. An example of a disjunctive graph for a Job Shop Scheduling Problem (JSSP).

Figure 4. Schedule with an alternate sequence.

Figure 5. A schedule for the problem instance of Table 3.

Figure 6. An alternative schedule obtained by swapping a dispatching decision at machine 2 from Figure 5.
For each dispatching decision, we have a set of alternate dispatching decisions as negative examples with associated quality index values. For a complete set of training instances, we have a collection of $[x_k\; y_k]$ as positive training examples and associated negative examples for each instance of the training set.

Figure 8. Box plot of Maximum Lateness values for the set of Priority Dispatching Rules (PDRs), Tabu search (TS) and the rule-set.

Figure 9. Confusion Matrix plot for the original grown decision tree.

Figure 10. Confusion Matrix plot with inclusion of the quality index in the predictor-set.

Table 1. List of selected references for learning in shop scheduling.

Table 3. A 6 × 6 problem instance with due dates.
$\bar{f}$: Bound on the value of $f$, where $f = L_{\max}$. QI: Quality Index of the best solution among solutions provided by PDRs, $= f^{*}/\bar{f}$.

Table 5. Summary of parameters for experimental setup.

Table 6. Definitions of benchmark priority dispatching rules.
Partition into three queues, late queue, operationally late queue and ahead-of-schedule queue, with SI as selection criterion within queues.Shifting of job to other queues is not allowed.