Linking Scheduling Criteria to Shop Floor Performance in Permutation Flowshops

The goal of manufacturing scheduling is to allocate a set of jobs to the machines in the shop so that these jobs are processed according to a given criterion (or set of criteria). Such criteria are based on properties of the jobs to be scheduled (e.g., their completion times or due dates), so it is not clear how these (short-term) criteria affect (long-term) shop floor performance measures. In this paper, we analyse the connection between the usual scheduling criteria employed as objectives in flowshop scheduling (e.g., makespan or idle time) and customary shop floor performance measures (e.g., work in process and throughput). Two of these linkages can be predicted theoretically (namely, makespan and throughput, and total completion time and average cycle time), while the other relationships have to be discovered on a numerical/empirical basis. To do so, we set up an experimental analysis consisting of finding optimal (or good) schedules under several scheduling criteria, and then computing how these schedules perform in terms of the different shop floor performance measures for several instance sizes and for different structures of processing times. Results indicate that makespan only performs well with respect to throughput, and that one formulation of idle times obtains nearly as good results as makespan in terms of throughput, while outperforming it in terms of average cycle time and work in process. Similarly, minimisation of total completion time seems to be quite balanced in terms of shop floor performance although, contrary to what some literature suggests, it does not exactly aim at work-in-process minimisation. Finally, the experiments show that some of the existing scheduling criteria are poorly related to the shop floor performance measures under consideration.
These results may help to better understand the impact of scheduling on flowshop performance, so that scheduling research can be more geared towards shop floor performance; the lack of such orientation is sometimes suggested as a cause of the limited applicability of some scheduling models in manufacturing.


Introduction
To handle the complexity of manufacturing decisions, these have traditionally been addressed in a hierarchical manner, in which the overall problem is decomposed into a number of sub-problems or decision levels [1]. At each decision level, the pertinent decisions are taken according to specific local criteria. Clearly, for this scheme to work efficiently, the decisions at the different levels should be aligned so as to contribute to the performance of the whole system. Among the different decisions involved in manufacturing, here we focus on scheduling decisions. Scheduling (some authors use the term "detailed scheduling") is usually addressed after medium-term production planning decisions have been made, since production planning decision models do not usually distinguish between products within a family and do not take into account sequence-dependent costs or detailed machine capacity [2]. A short-term detailed scheduling model usually assumes that there are several jobs, each one with its own characteristics, that have to be scheduled so that one or more scheduling criteria are minimised. The schedule is then released to the shop floor, so the events in the shop floor are executed according to the sequence and timing suggested by the schedule [3]. Therefore, the chosen scheduling criteria have a clear impact on (medium/long-term) shop floor performance, which is eventually reflected in shop floor performance measures such as the throughput of the system (number of jobs dispatched per time unit), cycle time (average time that the jobs spend in the manufacturing system), or work in process.
As these performance measures can be linked to key aspects of the competitiveness of the company (e.g., throughput is related to capacity and resource utilisation, while cycle time and work in process are related to lead times and inventory holding costs), the chosen scheduling criterion may have an important impact on the performance of the company, so it is important to assess the impact of different scheduling criteria on shop floor performance measures. However, perhaps for historical reasons, the connection between shop floor performance measures and scheduling criteria has been neglected by the literature: to the best of our knowledge, there are no contributions addressing this topic. In general, the lack of understanding and quantification of these connections has led to a number of interrelated issues:
• Some widely employed scheduling criteria have been the subject of criticism due to their apparent lack of applicability to real-world situations (see, e.g., the early comments in [4] on Johnson's famous paper, or [5] and [1] on the lack of real-life application of makespan minimisation algorithms), which suggests a poor alignment of these criteria with the companies' goals.
• Some justifications for using specific scheduling criteria are given without a formal proof. For instance, it is usual in the scheduling literature to state that minimising the total completion time in a flowshop leads to minimising work in process, whereas this statement (as we discuss in Section 2.2) is not correct from a theoretical point of view.
• Some scheduling criteria employed in manufacturing have been borrowed from other areas. For instance, the minimisation of the completion time variance is taken from the computer scheduling context; therefore, its potential advantages in manufacturing have yet to be tested.
• There are different formulations for some scheduling criteria intuitively linked to shop floor performance. While machine idle time minimisation can be seen, at least approximately, as related to increasing the utilisation of the system, there are alternative, non-equivalent ways to formulate idle time. Therefore, it remains an open question which formulation is actually better at increasing the utilisation of the system.
• Finally, since different, conflicting goals customarily have to be balanced in the shop floor (such as work in process and throughput), it would be interesting to know the contribution of the different scheduling criteria to shop floor performance in order to properly balance them.
Note that, in two cases, the linkages between scheduling criteria and shop floor performance measures can be established theoretically. More specifically, it can be formally proved that makespan minimisation implies throughput maximisation, and that total completion time minimisation implies average cycle time minimisation. However, for the remaining cases such relationships cannot be proved theoretically, so they have to be tested experimentally. To do so, in this paper we carry out an extensive computational study under a variety of scheduling criteria, shop floor performance measures, and instance parameters.
Since the mathematical expression of the scheduling criteria is layout-dependent, we have to focus on a particular production environment. More specifically, in this paper we assume a flowshop layout where individual jobs are not committed to a specific due date. The main reason for this choice is that flow line environments are probably the most common setting in repetitive manufacturing. Regarding not considering individual due dates, it should be mentioned that both scheduling criteria and shop floor performance measures differ greatly between due-date-related and non-due-date-related settings; therefore, this aspect must be the subject of a separate analysis. Finally, we also assume that all jobs to be scheduled are known in advance.
The results of the experiments carried out in this paper show that:
1. There are several scheduling criteria (most notably the completion time variance and one definition of idle time) which are poorly related to any of the indicators considered for shop floor performance.
2. Makespan minimisation is heavily oriented towards increasing throughput, but it yields poor results in terms of average completion time and work in process. This confines its suitability to manufacturing scenarios with very high utilisation costs as compared to those associated with cycle time and inventory.
3. Minimisation of one definition of idle times results in sequences with only a marginal worsening in terms of throughput, but a substantial improvement in terms of cycle time and inventory. Therefore, this criterion emerges as an interesting one when alignment with shop floor performance is sought.
4. Minimisation of total completion time also provides quite balanced schedules in terms of shop floor performance measures; note, however, that it does not lead to the minimisation of WIP, contrary to what is recurrently stated in the literature.
The rest of the paper is organised as follows. In the next section, the scheduling criteria and shop floor performance measures employed in the experimentation are discussed, together with the theoretically provable linkages among them. The methodology adopted in the computational experiments is presented in Section 3. The results are discussed in Section 4. Finally, Section 5 outlines the main conclusions and highlights areas for future research.

Background and Related Work
In this section, we first present the scheduling criteria usually employed in the literature (Section 2.1), while in Section 2.2 we discuss the usual shop floor performance measures, together with the relationships with the scheduling criteria that can be formally proved. For the sake of brevity, we keep the explanations of both criteria and performance measures to a minimum and refer the interested reader to the references given for formal definitions.

Scheduling Criteria
Undoubtedly, the most widely employed scheduling criterion is makespan minimisation (the makespan is usually denoted as $C_{\max}$), also referred to as maximum flow time (see, e.g., [6] for a recent review on research in flowshop sequencing with the makespan objective). Another important measure is the total (or average) completion time, denoted $\sum C_j$. Although less employed in scheduling research than makespan, total completion time has also received a lot of attention, particularly in recent years; to mention just a few recent papers, we note the contributions in [7,8].
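Both $C_{\max}$ and $\sum C_j$ are read off the completion times of the jobs on the last machine. The sketch below is a minimal illustration, assuming that all jobs are available at time zero and that processing times are stored as a jobs × machines list `p[j][i]` (an indexing convention chosen here, not taken from the paper). It applies the standard permutation flowshop recursion $C_{k,i} = \max(C_{k,i-1}, C_{k-1,i}) + p_{\pi(k),i}$, where $\pi(k)$ is the job in position $k$; the instance in the usage lines is hypothetical.

```python
from typing import List, Sequence


def completion_times(p: List[List[int]], seq: Sequence[int]) -> List[List[int]]:
    """C[k][i]: completion time of the k-th job in `seq` on machine i.

    p[j][i] is the processing time of job j on machine i. Jobs are assumed
    available at time zero and all machines follow the same job order.
    """
    m = len(p[0])
    C: List[List[int]] = []
    for k, j in enumerate(seq):
        row: List[int] = []
        for i in range(m):
            ready_job = row[i - 1] if i > 0 else 0        # job j has left machine i-1
            ready_machine = C[k - 1][i] if k > 0 else 0   # machine i finished the previous job
            row.append(max(ready_job, ready_machine) + p[j][i])
        C.append(row)
    return C


def makespan(p: List[List[int]], seq: Sequence[int]) -> int:
    """C_max: completion time of the last job on the last machine."""
    return completion_times(p, seq)[-1][-1]


def total_completion_time(p: List[List[int]], seq: Sequence[int]) -> int:
    """Sum of C_j, where C_j is the completion time of job j on the last machine."""
    return sum(row[-1] for row in completion_times(p, seq))


if __name__ == "__main__":
    p = [[1, 10], [10, 1]]  # hypothetical 2-job, 2-machine instance
    print(makespan(p, [0, 1]), total_completion_time(p, [0, 1]))  # 12 23
    print(makespan(p, [1, 0]), total_completion_time(p, [1, 0]))  # 21 32
```

In this toy instance the sequence (0, 1) happens to be better for both criteria; in general, the two criteria lead to different sequences.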
An objective also considered in the literature is the minimisation of machine idle time, which can be defined in (at least) three different ways [9]:

• The head and the tail of every single machine, i.e., the time before the first job starts on that machine (while the schedule has already started on the first machine) and the time after the last job has finished on that machine (while the schedule has not yet finished on the last machine), may or may not be included in the idle time. In a static environment, including all heads and tails means that idle time minimisation is equivalent to makespan minimisation (see, e.g., [4]), since the total idle time then equals $m \cdot C_{\max}$ minus the sum of all processing times, which does not depend on the sequence. This case therefore does not have to be considered further.

• Excluding heads and tails gives an idle time within the schedule, implicitly assuming that the machines can be used for other tasks/jobs outside the current problem before and after the current schedule passes the machine. This definition of idle time is also known as "core idle time" (see, e.g., [10-12]), and it has been used by [13], and by [14] in the context of a multicriteria problem. We denote this definition of idle time as $\sum IT_j$.
• Including machine heads in the idle time computation while excluding the tails means that the machines are reserved for the schedule before the first job of the schedule arrives, but are released for other jobs outside the schedule as soon as the last job has left the machine. In the following, we denote this definition as $\sum ITH_j$. This definition was first encountered in [15] and [16], and it has recently been used as a secondary criterion for developing tie-breaking rules in makespan minimisation algorithms (see, e.g., [17,18]).

Figure 1 illustrates these differences in idle time computation for an example of two jobs on three machines. The light grey time periods (IT and Head) are included in the $\sum ITH_j$ definition, whereas the Tail is not. In the literature, heads and tails are also referred to as Front Delay and Back Delay, respectively (see [19] or [9]).

Finally, the last criterion under consideration is the Completion Time Variance (CTV). CTV was originally introduced by [20] in the computer scheduling context, where it is desirable to organise the data files in on-line computing systems so that the file access times are as uniform as possible. It has subsequently been applied in manufacturing scheduling, as it is stated to be an appropriate objective for just-in-time production systems, or for any other situation where a uniform treatment of the jobs is desirable (see, e.g., [21-24]). In the flowshop/job shop scheduling context, it has been employed by [25-32].
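The three idle time definitions differ only in which portions of each machine's timeline are counted. The following sketch, under the same hypothetical `p[j][i]` convention and zero release times, computes core idle time ($\sum IT_j$), idle time with heads ($\sum ITH_j$), and idle time with heads and tails, together with CTV, taken here as the population variance of the completion times on the last machine (an assumption about the normalisation). The usage lines check that the heads-and-tails variant equals $m \cdot C_{\max}$ minus the sum of all processing times, which is why that variant is equivalent to makespan minimisation.

```python
import random
from statistics import pvariance
from typing import Dict, List, Sequence


def completion_times(p: List[List[int]], seq: Sequence[int]) -> List[List[int]]:
    """Permutation flowshop recursion; p[j][i] = time of job j on machine i."""
    C: List[List[int]] = []
    for k, j in enumerate(seq):
        row: List[int] = []
        for i in range(len(p[0])):
            ready = max(row[i - 1] if i > 0 else 0, C[k - 1][i] if k > 0 else 0)
            row.append(ready + p[j][i])
        C.append(row)
    return C


def idle_times(p: List[List[int]], seq: Sequence[int]) -> Dict[str, int]:
    """Total idle time over all machines under the three definitions."""
    C = completion_times(p, seq)
    cmax = C[-1][-1]
    core = heads = tails = 0
    for i in range(len(p[0])):
        busy = sum(p[j][i] for j in seq)
        first_start = C[0][i] - p[seq[0]][i]          # head: wait before the first job starts
        last_finish = C[-1][i]
        core += (last_finish - first_start) - busy    # gaps between jobs on machine i
        heads += first_start
        tails += cmax - last_finish                   # tail: time after the last job
    return {"IT": core, "ITH": core + heads, "IT_heads_tails": core + heads + tails}


def ctv(p: List[List[int]], seq: Sequence[int]) -> float:
    """Completion time variance (population variance of C_j on the last machine)."""
    return pvariance([row[-1] for row in completion_times(p, seq)])


if __name__ == "__main__":
    rng = random.Random(42)
    p = [[rng.randint(1, 99) for _ in range(4)] for _ in range(6)]  # 6 jobs, 4 machines
    seq = list(range(6))
    it = idle_times(p, seq)
    m, cmax = len(p[0]), completion_times(p, seq)[-1][-1]
    # With heads and tails, total idle time = m * C_max - sum of all processing times.
    print(it["IT_heads_tails"] == m * cmax - sum(map(sum, p)))  # True
    print(it, ctv(p, seq))
```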

Shop Floor Performance Measures
Shop floor performance is usually measured using different indicators. Among classical texts, Goldratt [33] mentions throughput, inventory, and operating expenses as key manufacturing performance measures. Nahmias [34] mentions the following manufacturing objectives: meet due dates, minimise WIP, minimise cycle time, and achieve high resource utilisation. Wiendahl [35] identifies four main objectives in the production process: short lead times, low schedule deviation, low inventories, and high utilisation. Hopp and Spearman [1] list the following manufacturing objectives: high throughput, low inventory, high utilisation, short cycle times, and high product variety. Li et al. [36] cite utilisation and work in process as the two main managerial concerns in manufacturing systems. Finally, throughput and lateness are identified by several authors (e.g., [37,38]) as the main performance indicators in manufacturing.
Although these objectives have remained the same for decades [39], their relative importance has changed over time [40], and it also depends on the specific manufacturing sector (for instance, in the semiconductor industry, average cycle time is regarded as the most important objective; see, e.g., [41] or [42]). According to the references reviewed above, we consider three shop floor performance indicators: Throughput (TH), Work-In-Process (WIP), and Average Cycle Time (ACT). With respect to the other indicators mentioned in the reviewed references, note that one of them is not relevant in the deterministic environment to which this analysis is restricted (low schedule deviation), while another is not specifically related to shop floor operation (high product variety). Furthermore, as our study does not assume individual due dates for jobs, we exclude due-date-related measures, although we note that short cycle times are quite often employed as an indicator of due date adherence [38,43]. Finally, we show below that utilisation and throughput are directly related, so utilisation does not need to be considered in addition to throughput.
Regarding the relationship of the shop floor performance measures with the scheduling criteria, it is easy to check that the throughput $TH(S)$ can be defined in terms of the makespan $C_{\max}(S)$ of a sequence $S$ of $n$ jobs, i.e.,

$$TH(S) = \frac{n}{C_{\max}(S)} \tag{1}$$

As a result, throughput is inversely proportional to makespan. Note that the utilisation $U(S)$ can be defined as (see, e.g., [36]):

$$U(S) = \frac{\sum_i \sum_j p_{ij}}{C_{\max}(S)} \tag{2}$$

Therefore, it is clear that $U(S) = \frac{\sum_i \sum_j p_{ij}}{n} \cdot TH(S)$ and, as $\frac{\sum_i \sum_j p_{ij}}{n}$ is constant for a given instance, the two indicators are fully related.
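A quick numerical check of the throughput and utilisation relations above: for every permutation of a small hypothetical instance, the ratio $U(S)/TH(S)$ is the same constant, and the sequence with the highest throughput is a makespan-optimal one. The utilisation is computed here as the sum of all processing times divided by $C_{\max}(S)$, the form implied by $U(S) = \frac{\sum_i \sum_j p_{ij}}{n} \cdot TH(S)$; treat this as an assumption about the exact definition used in [36].

```python
from itertools import permutations
from typing import List, Sequence


def makespan(p: List[List[int]], seq: Sequence[int]) -> int:
    """C_max of a permutation flowshop; p[j][i] = time of job j on machine i."""
    finish = [0] * len(p[0])       # completion time of the last scheduled job per machine
    for j in seq:
        for i in range(len(finish)):
            ready = finish[i - 1] if i > 0 else 0   # job j has just left machine i-1
            finish[i] = max(ready, finish[i]) + p[j][i]
    return finish[-1]


def throughput(p: List[List[int]], seq: Sequence[int]) -> float:
    """TH(S) = n / C_max(S)."""
    return len(seq) / makespan(p, seq)


def utilisation(p: List[List[int]], seq: Sequence[int]) -> float:
    """Assumed form U(S) = (sum of all p_ij) / C_max(S)."""
    return sum(map(sum, p)) / makespan(p, seq)


if __name__ == "__main__":
    p = [[4, 2, 7], [3, 6, 1], [5, 5, 2], [2, 8, 4]]  # hypothetical 4-job, 3-machine instance
    perms = list(permutations(range(4)))
    # U / TH is the same for every sequence: sum(p_ij) / n = 49 / 4
    print({round(utilisation(p, s) / throughput(p, s), 9) for s in perms})  # {12.25}
    best_th = max(perms, key=lambda s: throughput(p, s))
    print(makespan(p, best_th) == min(makespan(p, s) for s in perms))  # True
```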
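Analogously, the average cycle time ACT can be expressed in terms of the completion times (see, e.g., [44]):

$$ACT(S) = \frac{1}{n} \sum_{j=1}^{n} C_j(S) \tag{3}$$

It follows that the total completion time is proportional to ACT. Since TH, ACT and WIP are linked through Little's law, the following equation holds:

$$WIP(S) = TH(S) \cdot ACT(S) = \frac{\sum_{j=1}^{n} C_j(S)}{C_{\max}(S)} \tag{4}$$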
From Equation (4), it can be seen that total completion time minimisation and WIP minimisation are not exactly equivalent, although their equivalence is a common statement in the flowshop scheduling literature. The two criteria are equivalent in the single-machine case, since there $C_{\max}(S)$ equals the sum of all processing times regardless of the sequence, but this does not necessarily hold in the flowshop case, where $C_{\max}(S)$ depends on the sequence.
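The gap between total completion time and WIP minimisation already appears on a tiny instance. The sketch below uses hypothetical two-job, two-machine data with zero release times and evaluates TH, ACT and WIP = TH · ACT for both sequences: the sequence minimising $\sum C_j$ also has the shorter makespan, so it ends up with the larger WIP. On a single machine, $C_{\max}$ is the same for every sequence and the two criteria coincide.

```python
from itertools import permutations
from typing import Dict, List, Sequence


def last_machine_completions(p: List[List[int]], seq: Sequence[int]) -> List[int]:
    """C_j on the last machine for each job of `seq` (permutation flowshop, p[j][i])."""
    finish = [0] * len(p[0])
    out = []
    for j in seq:
        for i in range(len(finish)):
            finish[i] = max(finish[i - 1] if i > 0 else 0, finish[i]) + p[j][i]
        out.append(finish[-1])
    return out


def shop_floor_measures(p: List[List[int]], seq: Sequence[int]) -> Dict[str, float]:
    c = last_machine_completions(p, seq)
    n, cmax = len(c), c[-1]      # the last job finishes last in a permutation flowshop
    th = n / cmax                # throughput
    act = sum(c) / n             # average cycle time
    return {"SumC": sum(c), "TH": th, "ACT": act, "WIP": th * act}  # Little's law


if __name__ == "__main__":
    p = [[1, 10], [10, 1]]  # hypothetical 2-job, 2-machine instance
    for seq in permutations(range(2)):
        print(seq, shop_floor_measures(p, seq))
    # (0, 1): SumC = 23, WIP = 23/12 (about 1.917)  -> minimises total completion time
    # (1, 0): SumC = 32, WIP = 32/21 (about 1.524)  -> minimises WIP
```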
As, apart from the two theoretical equivalences discussed above, there is no straightforward relationship between the scheduling criteria and the shop floor performance measures, such relationships have to be discovered empirically over a large number of problem instances. This computational study must take into account that the results may be influenced by the instance sizes and the processing times employed. The methodology used to carry out the experimentation is described in the next section.

Computational Experience
The following approach is adopted to assess how the minimisation of a certain scheduling criterion affects the different shop floor indicators:
1. Build a number of scheduling instances of different sizes and with different mechanisms for generating the processing times. The procedure to build these testbeds is described in Section 3.1.
2. For each of these instances, find the sequences optimising each of the scheduling criteria under consideration. For small instances, the optimal solutions can be found, while for the larger instances a good solution found by a heuristic approach is employed. The procedure for this step is described in Section 3.2.
3. For each of these five optimal (or good) sequences, compute the corresponding values of TH, WIP, and ACT. This can be done in a straightforward manner according to Equations (1)-(4).
4. Analyse the results obtained. This is carried out in Section 4.
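Steps 2 and 3 can be sketched for a small instance as follows. This is only an illustration under stated assumptions: it uses exhaustive search over all permutations (practical only for very small n), takes CTV as a population variance, breaks ties by the first permutation found, and works on a randomly generated hypothetical instance rather than on the actual testbeds. The names `CRITERIA`, `shop_floor` and `evaluate_instance` are hypothetical.

```python
import random
from itertools import permutations
from statistics import pvariance
from typing import Callable, Dict, List, Sequence

Matrix = List[List[int]]


def completion_times(p: Matrix, seq: Sequence[int]) -> Matrix:
    """Permutation flowshop recursion; p[j][i] = time of job j on machine i."""
    C: Matrix = []
    for k, j in enumerate(seq):
        row: List[int] = []
        for i in range(len(p[0])):
            row.append(max(row[i - 1] if i else 0, C[k - 1][i] if k else 0) + p[j][i])
        C.append(row)
    return C


def idle(p: Matrix, seq: Sequence[int], C: Matrix, with_heads: bool) -> int:
    """Core idle time (IT), plus the machine heads if with_heads (ITH)."""
    total = 0
    for i in range(len(p[0])):
        start = C[0][i] - p[seq[0]][i]
        total += C[-1][i] - start - sum(p[j][i] for j in seq)
        if with_heads:
            total += start
    return total


# The five criteria; each receives the instance, the sequence and its completion times.
CRITERIA: Dict[str, Callable[[Matrix, Sequence[int], Matrix], float]] = {
    "Cmax": lambda p, s, C: C[-1][-1],
    "SumC": lambda p, s, C: sum(r[-1] for r in C),
    "ITH": lambda p, s, C: idle(p, s, C, with_heads=True),
    "IT": lambda p, s, C: idle(p, s, C, with_heads=False),
    "CTV": lambda p, s, C: pvariance([r[-1] for r in C]),
}


def shop_floor(p: Matrix, seq: Sequence[int]) -> Dict[str, float]:
    """Step 3: TH, ACT and WIP of one sequence."""
    C = completion_times(p, seq)
    n, cmax = len(seq), C[-1][-1]
    th, act = n / cmax, sum(r[-1] for r in C) / n
    return {"TH": th, "ACT": act, "WIP": th * act}


def evaluate_instance(p: Matrix) -> Dict[str, Dict[str, float]]:
    """Steps 2-3 for one small instance: exhaustive search per criterion, then SF measures."""
    seqs = list(permutations(range(len(p))))
    results = {}
    for name, f in CRITERIA.items():
        best = min(seqs, key=lambda s: f(p, s, completion_times(p, s)))
        results[name] = shop_floor(p, best)
    return results


if __name__ == "__main__":
    rng = random.Random(7)
    p = [[rng.randint(1, 99) for _ in range(3)] for _ in range(6)]  # 6 jobs, 3 machines
    for crit, sf in evaluate_instance(p).items():
        print(crit, {k: round(v, 3) for k, v in sf.items()})
```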

Testbed Setting
Although, in principle, the flowshop instances for our research could be extracted from real-life settings, this option poses a number of difficulties. First, obtaining such data in a representative number is complicated: only a few references publish real data in the literature (see [45,46]), so it may be necessary to obtain such data from primary sources, which may be a research project in itself. Second, processing time data are highly industry-dependent, and a sector-by-sector analysis would likely be required, which in turn makes the analysis even more complicated and increases the need for additional data. Finally, extracting these data from industry would make processing times external (independent) variables in the analysis.
Therefore, we generate these data according to test-bed generation methods available in the literature. For the flowshop layout in our research, this means establishing the problem size (number of jobs and machines) and processing times of each job on each machine.
With respect to the number of jobs n and the number of machines m, we have chosen the following values: n ∈ {20, 50, 100, 200} and m ∈ {10, 20, 50}. For each problem size, 30 instances have been generated; this number has been chosen so that the results have relatively high statistical significance.
Regarding the generation of the processing times, the available methods can be classified into random and correlated methods. In random methods, processing times are assumed to be independent of the jobs and the machines, i.e., they are generated by sampling from a uniform distribution on an interval [a, b]. The most usual interval is [1,99] (see, e.g., [47,48]), while in some other cases even wider intervals are employed (e.g., [49] uses [1,200]). Random methods are intended to produce difficult problem instances, as it is known that, at least with respect to certain scheduling criteria, this generation method yields the most difficult problems [50,51]. As might be expected, random processing times are not found in practice [52]. Instead, real-life manufacturing environments exhibit a mixture of job correlation and machine correlation in the processing times, as some surveys suggest (e.g., [53]). To model this correlation, several methods have been proposed, such as those of [54-56] or [57]. Among these, the latest method [57] synthesises the others. This method allows obtaining problem instances with mixed correlation between jobs and machines. The amplitude of the interval from which the distribution means of the processing times are uniformly sampled depends on a parameter α ∈ [0, 1]. For low values of α, differences among the processing times on the machines are small, while the opposite occurs for large values of α. For a detailed description of the implementation, the reader is referred to [57].
Finally, it should be noted that several works claim that the Erlang distribution better captures the distribution of processing times (e.g., [4,19], or [58]), yet they do not specify whether this has been confirmed in real-life settings. Therefore, we discard this approach.
In [57], the processing times $p_{ij}$ for each job on each machine are generated according to the following steps.
1. Set the lower and upper bounds of the processing times, $Dur_{LB}$ and $Dur_{UB}$, respectively, and a factor α controlling the correlation of the processing times.
2. Obtain the value $Interval_{st}$ by drawing a uniform sample from the interval [$Dur_{LB}$, …
3. … $j$ is uniformly sampled from the interval [1, 5].
4. For each job i, a real value $rank_i$ is uniformly sampled from the interval [0, 1]. Then, the processing times $p_{ij}$ are obtained in the following manner: … where η is a 'noise factor' obtained by uniformly sampling from the interval [−2, 2].
5. The $p_{ij}$ are ensured to be within the lower and upper bounds, i.e., if $p_{ij} < Dur_{LB}$, then $p_{ij} = Dur_{LB}$, and if $p_{ij} > Dur_{UB}$, then $p_{ij} = Dur_{UB}$.
The parameter α controls the degree of correlation, so for α = 0.0 there is no correlation among jobs and machines. In our research, we consider four different ways to generate processing times:
• LC (Low Correlation): processing times are drawn according to the procedure described above with α = 0.1.
• MC (Medium Correlation): processing times are drawn according to the procedure described above with α = 0.5.
• HC (High Correlation): processing times are drawn according to the procedure described above with α = 0.9.
• NC (No Correlation): processing times are drawn from a uniform distribution [1,99]. This represents the "classical" non-correlated assumption in many scheduling papers.
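The experimental grid and the NC testbed are easy to reproduce. The mixed-correlation procedure of [57] is not sketched here, because its generating formula is not given above. The code assumes integer processing times (`random.Random.randint` is inclusive at both ends) and an arbitrary seed, and it includes the bound clamping of step 5 as a stand-alone helper; `nc_instance`, `clamp` and `nc_testbed` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

SIZES_N = (20, 50, 100, 200)     # numbers of jobs n
SIZES_M = (10, 20, 50)           # numbers of machines m
INSTANCES_PER_SIZE = 30

Instance = List[List[int]]       # p[j][i]: processing time of job j on machine i


def nc_instance(n: int, m: int, rng: random.Random, low: int = 1, high: int = 99) -> Instance:
    """NC testbed: integer processing times drawn independently from U[low, high]."""
    return [[rng.randint(low, high) for _ in range(m)] for _ in range(n)]


def clamp(value: float, dur_lb: float, dur_ub: float) -> float:
    """Step 5 of the correlated procedure: force a value into [Dur_LB, Dur_UB]."""
    return min(max(value, dur_lb), dur_ub)


def nc_testbed(seed: int = 0) -> Dict[Tuple[int, int], List[Instance]]:
    """30 NC instances for each (n, m) combination, reproducible through `seed`."""
    rng = random.Random(seed)
    return {(n, m): [nc_instance(n, m, rng) for _ in range(INSTANCES_PER_SIZE)]
            for n in SIZES_N for m in SIZES_M}


if __name__ == "__main__":
    testbed = nc_testbed(seed=0)
    print(len(testbed), sum(len(v) for v in testbed.values()))  # 12 360
```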

Optimisation of Scheduling Criteria
For each of the problem instances, the sequences minimising each of the considered scheduling criteria are obtained. For small problem sizes (i.e., n ∈ {5, 10}), this has been done by exhaustive search. For larger problem sizes, exhaustive search or any other exact method is not feasible in view of the NP-hardness of these problems, so we have obtained the best sequence (with respect to each of the scheduling criteria considered) using an efficient metaheuristic that is allowed a long CPU time. More specifically, we have built a tabu search algorithm (see, e.g., [59]), whose basic outline is as follows.
• The neighbourhood is the union of the general pairwise interchange and insertion neighbourhoods. Both neighbourhood definitions are widely used in the literature.

• The size L of the tabu list has been set to the maximum of the number of jobs and the number of machines, i.e., L = max(n, m). As the tabu list is used to avoid getting trapped in local optima, the idea is to keep the list size related to the size of the neighbourhood.

• As the stopping criterion, the algorithm terminates after a number of iterations without improvement. This number has been set as the minimum of 10 · n. This ensures a large minimum number of iterations, while increasing the number of iterations with the problem size.
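A compact tabu search along the lines of the outline above might look like this. Several details are assumptions made for the sketch and are not specified in the text: the tabu list stores recently visited sequences rather than move attributes, a tabu neighbour is accepted only if it improves on the best solution found (aspiration), and the no-improvement limit is simply 10 · n unless another value is passed. `makespan` is included only as an example objective; any of the five criteria could be plugged in.

```python
import random
from collections import deque
from itertools import combinations, permutations
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Seq = Tuple[int, ...]


def neighbours(seq: Seq) -> Iterator[Seq]:
    """Union of the pairwise-interchange and insertion neighbourhoods."""
    n = len(seq)
    for a, b in combinations(range(n), 2):      # swap the jobs at positions a and b
        s = list(seq)
        s[a], s[b] = s[b], s[a]
        yield tuple(s)
    for a in range(n):                          # move the job at position a to position b
        for b in range(n):
            if a != b:
                s = list(seq)
                s.insert(b, s.pop(a))
                yield tuple(s)


def tabu_search(objective: Callable[[Seq], float], initial: Sequence[int],
                n_machines: int, max_no_improve: Optional[int] = None) -> Tuple[Seq, float]:
    n = len(initial)
    tabu: deque = deque(maxlen=max(n, n_machines))      # L = max(n, m)
    limit = 10 * n if max_no_improve is None else max_no_improve
    current = best = tuple(initial)
    best_val = objective(best)
    tabu.append(current)
    no_improve = 0
    while no_improve < limit:
        ranked = sorted((objective(s), s) for s in set(neighbours(current)))
        # Best non-tabu neighbour; a tabu one is allowed only if it beats the best (aspiration).
        move = next(((v, s) for v, s in ranked if s not in tabu or v < best_val), None)
        if move is None:
            break
        val, current = move
        tabu.append(current)
        if val < best_val:
            best, best_val, no_improve = current, val, 0
        else:
            no_improve += 1
    return best, best_val


def makespan(p: List[List[int]], seq: Sequence[int]) -> int:
    """Example objective: C_max of a permutation flowshop, p[j][i] = job j on machine i."""
    finish = [0] * len(p[0])
    for j in seq:
        for i in range(len(finish)):
            finish[i] = max(finish[i - 1] if i else 0, finish[i]) + p[j][i]
    return finish[-1]


if __name__ == "__main__":
    rng = random.Random(3)
    p = [[rng.randint(1, 99) for _ in range(4)] for _ in range(7)]  # hypothetical 7 x 4 instance
    best, val = tabu_search(lambda s: makespan(p, s), range(7), n_machines=4)
    exact = min(makespan(p, s) for s in permutations(range(7)))
    print(best, val, exact)
```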

Dominance Relationships among Scheduling Criteria
A first goal of the experiments is to establish which scheduling criterion is more closely related to each of the shop floor (SF) performance measures. To check the statistical significance of the results, we test a number of hypotheses using a one-sided test for the difference of means of paired samples (see, e.g., [60]) for every combination of m and n. More specifically, for each pair of scheduling criteria (A, B) and a shop floor performance measure ζ, we would like to know whether the sequence resulting from the minimisation of criterion A yields a better value for ζ, denoted as ζ(A), than the sequence resulting from the minimisation of criterion B. To this end, we test the null hypothesis H0: ζ(A) is not better than ζ(B) against the alternative H1: ζ(A) is better than ζ(B); rejecting H0 indicates that criterion A is more aligned with SF indicator ζ than criterion B. Note that "better than" expresses different ordinal relations depending on the performance measure: a higher TH is better, whereas lower ACT and WIP are better. Therefore, we specifically test the following three hypotheses for every combination of scheduling criteria A and B:

$$H_0: TH(A) \le TH(B) \quad \text{vs.} \quad H_1: TH(A) > TH(B)$$
$$H_0: ACT(A) \ge ACT(B) \quad \text{vs.} \quad H_1: ACT(A) < ACT(B)$$
$$H_0: WIP(A) \ge WIP(B) \quad \text{vs.} \quad H_1: WIP(A) < WIP(B)$$

The resulting p-values are shown in Tables 1-10 for the different testbeds. Each p-value is the smallest level of significance at which H0 can be rejected by the t-test, i.e., H0 is rejected for every level of significance α ≥ p, whereas it is not rejected for α < p. A low p therefore indicates that H0 can be rejected at a demanding level of significance, and that H1 can be accepted. Informally, a value close to zero in the column corresponding to performance measure ζ in the table comparing the pair of scheduling criteria (A, B) indicates that minimising criterion A leads to better values of ζ than minimising criterion B, whereas a value close to one indicates the opposite.

As an example of how to interpret these tables, let us take the TH column for any of the testbeds in Table 1 (all zeros). This column shows the p-values obtained when testing the null hypothesis that makespan minimisation does not produce solutions with higher throughput than flowtime minimisation. Since these p-values are zero for all problem sizes, the null hypothesis is rejected. As a consequence, we can be quite confident (statistically speaking) that makespan minimisation is more aligned with throughput increase than total completion time minimisation.
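For reference, a one-sided paired test of this kind can be written with the Python standard library alone. The sketch orients the differences so that a small p-value supports "minimising A yields a better ζ than minimising B", which is how the tables are read; the p-value is obtained by numerically integrating the Student t density, whereas in practice `scipy.stats.t.sf` (or `scipy.stats.ttest_rel` with its `alternative` argument) would normally be used. The function names and the data in the usage lines are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def t_upper_tail(t: float, df: int, steps: int = 10_000) -> float:
    """P(T > t) for Student's t with `df` degrees of freedom.

    Uses Simpson's rule on the density over [0, |t|]; `steps` must be even.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    area *= h / 3                      # integral of the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_one_sided_p(zeta_a: Sequence[float], zeta_b: Sequence[float],
                       higher_is_better: bool) -> float:
    """p-value for H0: 'A is not better than B' against H1: 'A is better than B'.

    Differences are oriented so that a positive value means the sequence obtained
    by minimising criterion A scores better on the shop floor measure zeta.
    """
    d = [(a - b) if higher_is_better else (b - a) for a, b in zip(zeta_a, zeta_b)]
    n, d_bar, s = len(d), mean(d), stdev(d)
    if s == 0:                         # degenerate case: all differences identical
        return 0.0 if d_bar > 0 else 1.0
    t = d_bar / (s / math.sqrt(n))
    return t_upper_tail(t, n - 1)


if __name__ == "__main__":
    th_a = [10, 12, 11, 13, 12]   # hypothetical TH values obtained by minimising criterion A
    th_b = [8, 9, 9, 10, 9]       # hypothetical TH values obtained by minimising criterion B
    print(paired_one_sided_p(th_a, th_b, higher_is_better=True))  # close to 0
```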
In view of the results in the tables, the following comments can be made.
• Regarding Table 1, it is clear that makespan outperforms total completion time regarding throughput, and that total completion time outperforms makespan regarding average cycle time. These results are known from theory and, although they could have been omitted, we include them for symmetry. The table also shows that total completion time outperforms makespan with respect to work in process, a result that cannot be theoretically predicted. This result is obtained for all instance sizes and for all methods used to generate the processing times. As a consequence, if shop floor performance is measured primarily by one indicator, $C_{\max}$ would be the objective most aligned with throughput, whereas $\sum C_j$ would be the most aligned with cycle time and work in process.

• From Table 2, it can be seen that makespan outperforms $\sum ITH_j$ with respect to throughput and, in general, with respect to ACT (with the exception of small problem instances for certain processing time generation methods). Finally, regarding work in process, makespan in general outperforms $\sum ITH_j$ if n > m, whereas the opposite occurs if m ≥ n.
• Tables 3 and 4 show an interesting result: regardless of the problem size and the distribution of the processing times, makespan outperforms both $\sum IT_j$ and CTV for all three shop floor performance measures considered. This result reveals that the minimisation of CTV or $\sum IT_j$ is poorly linked to shop floor performance, at least compared to makespan minimisation.
• Table 5 shows that, regardless of the generation of processing times and the problem size, total completion time performs worse than $\sum ITH_j$ for throughput, whereas it outperforms it in terms of average cycle time and work in process.
• Table 6 shows that, with few exceptions, total completion time outperforms $\sum IT_j$ for all three SF indicators.

• In Table 7, a peculiar pattern can be observed: while it can be seen that $\sum C_j$ dominates CTV with respect to the three SF indicators, this is not the case for the random processing times, as in this case the makespan values obtained by CTV are higher than those observed for the total completion time.

• In Tables 8 and 9, it can be seen that $\sum ITH_j$ outperforms both $\sum IT_j$ and CTV for all instance sizes and all processing time generation methods. Regarding whether or not to include the heads in the idle time function, this result makes clear that idle time minimisation including the heads is better with respect to all shop floor performance measures considered.
• Finally, in Table 10 it can be seen that the relative performance of $\sum IT_j$ and CTV with respect to the indicators depends on the type of testbed and on the problem instance size. However, in view of the scarce alignment of both scheduling criteria with any SF indicator, already detected in Tables 3, 4, 8 and 9, these results do not seem relevant for the purpose of our analysis.

• If a trade-off between two shop floor performance measures is sought, for each pair of indicators it is possible to represent the set of efficient scheduling criteria in a multi-objective manner, i.e., the criteria for which no other criterion in the set obtains better results with respect to both indicators considered. This set is represented in Table 11, and it can be seen that total completion time minimisation is the only efficient criterion for minimising both WIP and ACT. In contrast, if TH is involved in the trade-off, a better value for TH (and worse values for ACT and WIP) can be obtained by minimising $\sum ITH_j$, and an even better value for TH (at the expense of worsening ACT and WIP) would be obtained by minimising $C_{\max}$.

Table 11. Efficient criteria for each pair of SF indicators.
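The efficient set of Table 11 is a standard Pareto filter over the criteria. The sketch below takes, for each criterion, a tuple of indicator values (for instance, mean TH and mean ACT over a testbed) and returns the non-dominated criteria. The function name is hypothetical, and the numbers in the usage example are made up for illustration; they are not values from the tables.

```python
from typing import Dict, List, Tuple


def efficient_criteria(scores: Dict[str, Tuple[float, ...]],
                       higher_is_better: Tuple[bool, ...]) -> List[str]:
    """Criteria not dominated by any other criterion on the given SF indicators."""

    def at_least_as_good(x: Tuple[float, ...], y: Tuple[float, ...]) -> bool:
        return all((a >= b) if hb else (a <= b) for a, b, hb in zip(x, y, higher_is_better))

    def dominates(x: Tuple[float, ...], y: Tuple[float, ...]) -> bool:
        # As good on every indicator and strictly better on at least one.
        return at_least_as_good(x, y) and x != y

    return sorted(c for c, v in scores.items()
                  if not any(dominates(w, v) for d, w in scores.items() if d != c))


if __name__ == "__main__":
    # Hypothetical (TH, ACT) values, for illustration only.
    toy = {"Cmax": (1.00, 5.0), "ITH": (0.99, 4.6), "SumC": (0.95, 4.0), "CTV": (0.90, 4.8)}
    print(efficient_criteria(toy, (True, False)))  # ['Cmax', 'ITH', 'SumC']
```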

Ranking of Scheduling Criteria
In this section, we further explore the trade-off among the different criteria by answering the following question: once a scheduling criterion has been chosen according to the ranking above, what gains (or losses) in the different shop floor performance measures can be expected when switching from one scheduling criterion to another? More formally, we intend to quantify the difference between picking one scheduling criterion or another for a given shop floor performance measure. To address this issue, we define $RD_{PM}$, the Relative Deviation with respect to a given performance measure PM, in the following manner.

RD(A) PM
where $PM(S_A)$ is the value of PM obtained for the sequence $S_A$ which minimises scheduling criterion A. Analogously, $S_{A^+}$ is the sequence obtained by minimising scheduling criterion $A^+$, $A^+$ being the scheduling criterion ranking immediately behind A for the performance measure PM. When A is the scheduling criterion ranking last for PM, RD is set to zero. Note that this definition of RD conveys more information than the mere rank of the scheduling criteria. For instance, consider scheduling criteria A, B, and C, which rank with respect to performance measure PM in the order B, C, A (from best to worst). This information (already obtained in Section 4.1) simply states that B (C) is more aligned than C (A) with respect to PM, but it does not convey whether there are substantial differences among the three criteria for PM. This information can be obtained from the corresponding RD: if RD(B) is zero or close to zero, B and C yield similar values for PM, so there is little difference (with respect to PM) between minimising B and minimising C. In contrast, a high value of RD(C) indicates a great benefit (with respect to PM) when switching from minimising A to minimising C.
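The exact RD formula is not legible in the extracted text, so the following sketch assumes a common relative-deviation form, $RD(A) = (PM(S_{A^+}) - PM(S_A)) / PM(S_A)$, with the sign flipped for throughput so that a positive RD always means that A is better than the next-ranked criterion $A^+$. The ranking is taken as an input (e.g., from the dominance tests), and the Average Relative Deviation (ARD) is the mean of the per-instance RDs. Treat both functions, and the values in the usage lines, as hypothetical.

```python
from typing import Dict, List, Sequence


def relative_deviations(pm: Dict[str, float], ranking: Sequence[str],
                        higher_is_better: bool) -> Dict[str, float]:
    """RD of each criterion for one instance and one performance measure PM.

    pm[A] is PM(S_A); `ranking` lists the criteria from best to worst for PM.
    ASSUMED form: RD(A) = sign * (PM(S_A+) - PM(S_A)) / PM(S_A), with the sign
    chosen so that a positive RD means A is better than the next criterion A+.
    The last-ranked criterion gets RD = 0, as in the text.
    """
    sign = -1.0 if higher_is_better else 1.0
    rd: Dict[str, float] = {}
    for pos, a in enumerate(ranking):
        if pos == len(ranking) - 1:
            rd[a] = 0.0
        else:
            nxt = ranking[pos + 1]
            rd[a] = sign * (pm[nxt] - pm[a]) / pm[a]
    return rd


def average_rd(rds: List[Dict[str, float]]) -> Dict[str, float]:
    """ARD: mean of the per-instance RDs of each criterion."""
    return {c: sum(r[c] for r in rds) / len(rds) for c in rds[0]}


if __name__ == "__main__":
    # Hypothetical throughput values of one instance, ranked Cmax > ITH > SumC.
    th = {"Cmax": 0.10, "ITH": 0.099, "SumC": 0.095}
    print(relative_deviations(th, ["Cmax", "ITH", "SumC"], higher_is_better=True))
```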
Since RD is defined for a specific instance, we use the Average Relative Deviation (ARD), i.e., the average of the RDs over the instances, for comparison across the test-bed. The results of the experiments for the different test-beds with respect to ARD are shown in Tables 12-15, together with the rank of each criterion for each problem size. In addition, the cumulative ARD of the scheduling criteria for each shop floor performance measure is shown in Figure 2 for the different test-beds. In view of the results, we make the following comments.

Table 12. Average Relative Deviation (ARD) and ranks (in parentheses) of the scheduling criteria for the random test-bed.

•
∑IT^H_j emerges as an interesting criterion, as its performance is only marginally worse than C_max with respect to TH (particularly in the NC test-bed; see Figure 2a), while it obtains better values for ACT and WIP. Similarly, although it performs worse than ∑C_j for ACT and WIP, it performs better in terms of throughput.

•
The differences in ARD for throughput are, in general, smaller than those for ACT and WIP. For the correlated test-beds (LC to HC), the differences never reach 1%. This suggests that, if throughput maximisation is sought, it makes little difference which of the scheduling criteria is minimised. The highest differences are encountered for the random test-bed (~6%).

•
The differences in all measures are smaller for the structured instances than for the random test-bed. For instance, makespan ranks first for TH (as theoretically predicted), and its maximum ARD for a given problem size is 6.04% in the random test-bed, but only 0.52% for LC and 0.16% for HC. Analogously, the maximum difference between completion time (ranking first for ACT) and the next criterion rises to 23.84% for the random test-bed, while dropping to 1.27% for HC. This indicates that structured problems are easier than random problems, because the distribution of the processing times flattens the objective functions, at least with respect to the considered shop floor performance measures.
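The averaging step behind the ARD values can be sketched as below, assuming the per-instance RD values of one test-bed and problem size are already available as one dictionary per instance, keyed by criterion, and that every instance contains the same criteria. The function name and the sample numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def average_relative_deviation(
    per_instance_rd: List[Dict[str, float]],
) -> Dict[str, float]:
    """ARD per criterion: mean of its RD values (in %) over all instances."""
    if not per_instance_rd:
        raise ValueError("at least one instance is required")
    criteria = per_instance_rd[0].keys()
    return {c: mean(rd[c] for rd in per_instance_rd) for c in criteria}


# Hypothetical RD values of three criteria on two instances.
rds = [
    {"B": 1.0, "C": 20.0, "A": 0.0},
    {"B": 3.0, "C": 10.0, "A": 0.0},
]
print(average_relative_deviation(rds))  # {'B': 2.0, 'C': 15.0, 'A': 0.0}
```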

Conclusions and Further Research
An extensive computational study has been carried out to analyse the links between several scheduling criteria in a flowshop and well-known shop floor performance measures. The results provide some insights into the nature of these links, which can be summarised as follows.

•
Roughly speaking, the considered scheduling criteria can be divided into two broad categories: those tightly related to at least one shop floor performance measure, and those poorly related to SF performance. Among the latter, we may classify CTV and ∑IT_j. This does not mean that these criteria are not useful; however, from a shop floor performance perspective, it may be interesting to investigate whether they relate to other performance measures. Extending the analysis to a due date scenario might yield a positive answer.

•
Makespan matches throughput maximisation (as theoretically predicted) better than any other criterion considered. However, the differences with respect to throughput between minimising makespan and minimising the other criteria turn out to be very small. Additionally, given the relatively poor performance of makespan with respect to ACT, one might ask whether makespan minimisation pays off in terms of shop floor performance in many manufacturing scenarios, as compared, e.g., to completion time or ∑IT^H_j minimisation. A positive answer seems to be confined to those scenarios where costs associated with cycle time are almost irrelevant compared to costs related to machine utilisation. The fact that this situation is uncommon in many manufacturing scenarios may account for the lack of practical descriptions of the application of this criterion, already discussed in [4].

•
Completion time minimisation matches both work-in-process and average cycle time minimisation extremely well (the latter being theoretically predictable), better than any other criterion, whereas the remaining scheduling criteria perform much worse on these measures. Therefore, completion time minimisation emerges as a major criterion when it comes to increasing shop floor performance. This empirical evidence supports research on completion time minimisation rather than on other criteria, at least within the flowshop scheduling context.

•
The minimisation of idle time including the heads (∑IT^H_j) performs better than completion time with respect to throughput. However, its performance is substantially worse than that of completion time regarding ACT and WIP. Hence, it seems to be an interesting criterion when throughput maximisation is the most important performance measure but work-in-process costs are not completely irrelevant.

•
With respect to the influence of the test-bed design on the results, there are noticeable differences between the overall results obtained in the correlated test-beds (LC-HC) and those obtained in the random test-bed. In general, the introduction of structured processing times seems to reduce the differences between the scheduling criteria. At first glance, this means that random processing times make it more difficult to achieve good shop floor performance by applying a specific scheduling criterion. It is widely known that random processing times yield difficult instances, in the sense that there are large differences between bad and good schedules (with respect to a given scheduling criterion), at least for the makespan criterion. In view of the results of the experiments, we can assert that these differences also carry over to shop floor performance measures.
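To make the distinction between the two idle time criteria concrete, the sketch below computes completion times with the usual permutation flowshop recursion (all jobs available at time zero) and then totals machine idle time with and without the heads. We assume here that the head of a machine is the time before its first job starts, that ∑IT^H_j counts all idle time on each machine up to its last completion, and that ∑IT_j excludes the heads; these definitions, like the function names, are our assumptions rather than the paper's formal definitions.

```python
from typing import List


def completion_times(p: List[List[float]], seq: List[int]) -> List[List[float]]:
    """c[i][k]: completion time on machine i of the k-th job in `seq`.

    p[i][j] is the processing time of job j on machine i; all jobs are
    assumed to be available at t = 0 (permutation flowshop recursion).
    """
    m, n = len(p), len(seq)
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for k, job in enumerate(seq):
            machine_free = c[i][k - 1] if k > 0 else 0.0  # previous job on i done
            job_ready = c[i - 1][k] if i > 0 else 0.0     # job done upstream
            c[i][k] = max(machine_free, job_ready) + p[i][job]
    return c


def total_idle_time(p: List[List[float]], seq: List[int], include_heads: bool) -> float:
    """Idle time summed over machines, with or without the initial heads."""
    c = completion_times(p, seq)
    total = 0.0
    for i in range(len(p)):
        busy = sum(p[i][job] for job in seq)
        idle = c[i][-1] - busy  # all idle time on machine i before its last completion
        if not include_heads:
            head = c[i][0] - p[i][seq[0]]  # start time of the first job on machine i
            idle -= head
        total += idle
    return total


# Hypothetical 2-machine, 2-job instance (rows: machines, columns: jobs).
p = [[1, 5], [4, 1]]
print(total_idle_time(p, [0, 1], include_heads=True),
      total_idle_time(p, [0, 1], include_heads=False))  # 2.0 1.0
```

In this toy instance the second machine waits one time unit for its first job (the head) and one more unit between the two jobs, so only the second unit is counted when the heads are excluded.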
From these results, some aspects warrant future research:

•
∑IT^H_j emerges as an interesting scheduling criterion, with virtues in between those of makespan and completion time. For most of the problem settings, it compares favourably with makespan in terms of cycle time, and it outperforms total completion time in terms of throughput. In view of these results, it may be worth devoting more effort to flowshop scheduling with this criterion as the objective, since so far it has been used only as a secondary tie-breaking rule. Interestingly, the results in this paper suggest that its excellent performance as a tie-breaking rule may be explained by its alignment with shop floor performance.

•
While it is possible to match the shop floor objectives of throughput and average cycle time perfectly with scheduling criteria (makespan and completion time, respectively), WIP cannot be linked to a scheduling criterion in a straightforward manner. Although completion time minimisation achieves the best results for WIP minimisation among the tested criteria, "true" work-in-process optimisation is not the same as completion time minimisation. Here, the quotient between total completion time and makespan emerges as a "combined" scheduling criterion that may be worth researching, as it matches an important shop floor performance measure, namely work-in-process minimisation.

•
The results of the present study are limited by the shop layout (i.e., the permutation flowshop) and by the scheduling criteria considered (i.e., criteria not related to due dates). Therefore, an obvious extension of this study is to analyse other environments and scheduling measures. In particular, the inclusion of due-date-related criteria could provide additional insights into the linkage between these criteria and the shop floor performance measures, as well as between due date and non-due date scheduling criteria.
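The links between scheduling criteria and shop floor measures discussed above can be checked with a small worked computation. Assuming that all n jobs are released at time zero, that a job counts as work in process from time zero until its completion time C_j on the last machine, and that the horizon ends at the makespan, throughput is n/C_max, average cycle time is ∑C_j/n and time-average WIP is ∑C_j/C_max, so that WIP = TH × ACT (Little's law). Under these assumptions, this is why the quotient ∑C_j/C_max corresponds to WIP. The function below is a hypothetical sketch of this arithmetic, not the paper's evaluation code.

```python
from typing import Dict, List


def shop_floor_measures(completion: List[float]) -> Dict[str, float]:
    """TH, ACT and WIP from the completion times C_j on the last machine.

    Assumes all jobs are released at t = 0, each job is counted as work in
    process from t = 0 until C_j, and the horizon ends at the makespan.
    """
    n = len(completion)
    cmax = max(completion)     # makespan C_max
    total_c = sum(completion)  # total completion time sum(C_j)
    return {
        "TH": n / cmax,          # throughput: jobs per time unit
        "ACT": total_c / n,      # average cycle time
        "WIP": total_c / cmax,   # time-average number of jobs in the shop
    }


# Hypothetical completion times of three jobs on the last machine.
print(shop_floor_measures([5, 7, 12]))  # {'TH': 0.25, 'ACT': 8.0, 'WIP': 2.0}
```

Note that minimising ∑C_j alone lowers ACT directly, whereas WIP also depends on C_max through the denominator, which is consistent with the observation that completion time minimisation is not exactly the same as work-in-process minimisation.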