Hybrid Flow Shop with Unrelated Machines , Setup Time , and Work in Progress Buffers for Bi-Objective Optimization of Tortilla Manufacturing

Victor Hugo Yaurima-Basaldua 1, Andrei Tchernykh 2,3,*, Francisco Villalobos-Rodríguez 1 and Ricardo Salomon-Torres 1 ID 1 Software Engineering, Sonora State University, San Luis Rio Colorado, Sonora 83455, Mexico; victor.yaurima@ues.mx (V.H.Y.-B.); fco.vr1@gmail.com (F.V.-R.); ricardo.salomon@uabc.edu.mx (R.S.-T.) 2 Computer Science Department, CICESE Research Center, Ensenada 22860, Mexico 3 School of Electrical Engineering and Computer Science, South Ural State University, Chelyabinsk 454080, Russia * Correspondence: chernykh@cicese.mx; Tel.: +521-646-178-6994


Introduction
Tortillas are a very popular food as a favorite snack and meal option in various cultures and countries.There are two types of tortillas: wheat-based and corn-based.Their overall consumption is growing all over the world.Originally, tortillas were made by hand: grinding corn into flour, mixing the dough, and pressing to flatten it.In fact, many small firms today still use processes that are more labor-intensive, requiring people to perform a majority of the tasks such as packaging, loading, baking, and distributing.This trend is expected to change over the next years.Due to its considerable practical significance, optimization of tortilla production is important.To improve the production timing parameters (setup, changeover, waiting), operational cost (energy consumption, repairing, service), throughput, etc., careful analysis of the process and of advance scheduling approaches is needed.
This industry is a typical case of a hybrid flow shop, which is a complex combinatorial optimization problem that arises in many manufacturing systems.In the classical flow shop, a set of jobs has to pass through various stages of production.Each stage can have several machines.There is also the flexibility of incorporating different-capacity machines, turning this design into hybrid flow shop scheduling (HFS).
The production decisions schedule work to machines in each stage to determine the processing order according to one or more criteria.Hybrid flow shop scheduling problems are commonly encountered in manufacturing environments.A large number of heuristics for different HFS configurations considering realistic problems have been proposed [1][2][3][4][5][6].
In spite of significant research efforts, some aspects of the problem have not yet been sufficiently explored.For example, previous works deal mostly with improving the efficiency of production, where objective functions are usually used to minimize the total production time (makespan), the total tardiness, etc. [7].Although the production efficiency is essential, it is not the only factor to consider in manufacturing operations.One of the main motivations of this work is to reduce energy consumption.In recent years, it has been recognized that high energy consumption can increase operational cost and cause negative environmental impacts [8].
In this paper, we propose a genetic algorithm that contributes to the solution of a bi-objective HFS problem of a real tortilla production environment, with unrelated machines, setup time, and work in progress buffers taking into account the total completion time and energy consumption of the machines.
The rest of the paper is organized as follows.After presenting the description and characteristics of the production line and jobs in Section 2, we discuss related work in Section 3. We describe the model of the problem in Section 4. We present the details of the proposed solution in Section 5 and the algorithm calibration in Section 6.We discuss computational experiments and results in Section 7. Finally, we conclude with a summary and an outlook in Section 8.
Algorithms 2018, 11, x FOR PEER REVIEW 2 of 22 is also the flexibility of incorporating different-capacity machines, turning this design into hybrid flow shop scheduling (HFS).
The production decisions schedule work to machines in each stage to determine the processing order according to one or more criteria.Hybrid flow shop scheduling problems are commonly encountered in manufacturing environments.A large number of heuristics for different HFS configurations considering realistic problems have been proposed [1][2][3][4][5][6].
In spite of significant research efforts, some aspects of the problem have not yet been sufficiently explored.For example, previous works deal mostly with improving the efficiency of production, where objective functions are usually used to minimize the total production time (makespan), the total tardiness, etc. [7].Although the production efficiency is essential, it is not the only factor to consider in manufacturing operations.One of the main motivations of this work is to reduce energy consumption.In recent years, it has been recognized that high energy consumption can increase operational cost and cause negative environmental impacts [8].
In this paper, we propose a genetic algorithm that contributes to the solution of a bi-objective HFS problem of a real tortilla production environment, with unrelated machines, setup time, and work in progress buffers taking into account the total completion time and energy consumption of the machines.
The rest of the paper is organized as follows.After presenting the description and characteristics of the production line and jobs in Section 2, we discuss related work in Section 3. We describe the model of the problem in Section 4. We present the details of the proposed solution in Section 5 and the algorithm calibration in Section 6.We discuss computational experiments and results in Section 7. Finally, we conclude with a summary and an outlook in Section 8.

Production Process
Figure 1 shows the eight main stages of the production process: (1) Preparation, (2) Mixing, (3) Division and Rounding, (4) Repose, (5) Press, (6) Bake, (7) Cooling, (8) Stacking and Packing.In this paper, we consider a hybrid flow shop with six stages (from 2 to 7), with different numbers of machines in the stages 2, 4, and 6.In the production of wheat flour tortillas, the raw ingredients prepared in Stage 1 are transferred to a large mixer (Stage 2), where they are combined to create the dough.
Corn tortilla production starts with a process called nixtamalization, in which the corn is soaked in an alkali solution of lime (calcium hydroxide) and hot water.The dough is divided into fragments with round shapes (Stage 3).These fragments repose in a controlled environment to obtain a certain consistency as to the temperature and humidity (Stage 4).Next, they are pressed to take the shape of a tortilla (Stage 5).Rows of tortillas are heated in stoves (Stage 6).The next steps are to cool (Stage 7), stack, and pack them (Stage 8).In this paper, we consider a hybrid flow shop with six stages (from 2 to 7), with different numbers of machines in the stages 2, 4, and 6.In the production of wheat flour tortillas, the raw ingredients prepared in Stage 1 are transferred to a large mixer (Stage 2), where they are combined to create the dough.
Corn tortilla production starts with a process called nixtamalization, in which the corn is soaked in an alkali solution of lime (calcium hydroxide) and hot water.The dough is divided into fragments with round shapes (Stage 3).These fragments repose in a controlled environment to obtain a certain consistency as to the temperature and humidity (Stage 4).Next, they are pressed to take the shape of a tortilla (Stage 5).Rows of tortillas are heated in stoves (Stage 6).The next steps are to cool (Stage 7), stack, and pack them (Stage 8).
The process is distributed among flow lines with several machines per stage (Figure 2).It is operated in a semi-automated way controlled by operators.The process is distributed among flow lines with several machines per stage (Figure 2).It is operated in a semi-automated way controlled by operators.Tables 1 and 2 present the capacities of the machines in each stage.These are represented by the amount of work (tortillas in kilograms) that can be processed on each machine.Table 2 shows that Stages 3, 4, 6, and 7 have identical machines, and Stages 2 and 5 have unrelated machines.Table 3 shows the power of the machines in each stage in kW per minute.The consumption is independent of the type of job to be processed.Machines in Stage 4 do not consume energy.The equipment is cabinets, where dough balls remain for a given period.Machine 1 0.09 0.03 0 0.62 0.04 0.12 2 0.09 0.03 0 0.62 0.04 0.12 3 0.11 0.03 0 0.62 0.04 0.12 4 0.14 0.03 0 0.9 0.04 0.12 5 0.23 0.03 0 0.9 0.04 0.12 6 0.23 0.03 0 0.9 0.04 0.12 Tables 1 and 2 present the capacities of the machines in each stage.These are represented by the amount of work (tortillas in kilograms) that can be processed on each machine.Table 2 shows that Stages 3, 4, 6, and 7 have identical machines, and Stages 2 and 5 have unrelated machines.Table 3 shows the power of the machines in each stage in kW per minute.The consumption is independent of the type of job to be processed.Machines in Stage 4 do not consume energy.The equipment is cabinets, where dough balls remain for a given period.4 shows characteristics of the jobs.We consider a workload with five jobs for production of the five types of tortillas most demanded by industry: Regular, Integral, Class 18 pcs, Taco 50, and MegaBurro.Each job is characterized by its type, group, and batches required for production.It has several assigned batches.This quantity must be within a certain range that is obtained from the standard deviations of the average monthly production in kilograms of tortillas for a year.The batch indicates the number of 10 kg tasks that have to be processed in the job.A job with one batch indicates processing 10 kg, and with five batches indicates processing 50 kg.
The group indicates the compatibility of the job tasks.Job tasks belonging to the same group may be processed simultaneously.For instance, they can be processed on all unrelated machines of Stage 2, if their sizes are less than or equal to the capacity in kilograms of the machines (Table 1).The job tasks of the same type of dough can be combined, forming compatibility groups.In this case, the jobs of Types 1, 3, 4, and 5 belong to Group 1, and jobs of Type 2 belong to Group 2.
Table 5 shows processing times in minutes for every 10 kg of tortilla for a job type.The 10 kg represents a task.We see that in Stages 2 and 4, all machines process jobs for the same durations of 1 and 45 min, respectively, due to the fact that the time does not depend on the job type.Table 6 shows the setup time of machines at each stage.At stages from 3 to 7, the setup time is independent of the task sequence, job type, and weight.At Stage 2, the setup time is calculated as follows: where t w is the total weight in kilograms of the tasks to be processed by machines in the same time period.These tasks have to belong to jobs of the same group.

Energy-Aware Flow Shop Scheduling
Investment in new equipment and hardware can certainly contribute to energy savings [9].The use of "soft" techniques to achieve the same objective is also an effective option [10].
The advance schedule could play an important role in reducing the energy consumption of the manufacturing processes.Meta-heuristics, e.g., genetic algorithms (GA) [11], particle swarm optimization [12], and simulated annealing [13] are greatly popularity for use in the design of production systems.
However, studies of flow shop scheduling problems for energy saving appear to be limited [14,15].In fact, most production systems that allow changing of the speed of the machines belong to the mechanical engineering industry and usually take the configuration of a work shop instead of a flow shop [16].
One of the first attempts to reduce energy consumption through production scheduling can be found in the work of Mouzon et al. [17].The authors collected operational statistics for four machines in a flow shop and found that nonbottleneck machines consume a considerable amount of energy when left idle.As a result, they propose a framework for scheduling power on and off events to control machines for achieving a reduction of total energy consumption.Dai et al. [18] applied this on/off strategy in a flexible flow shop, obtaining satisfactory solutions to minimize the total energy consumption and makespan.
Zhang and Chiong [16] proposed a multiobjective genetic algorithm incorporated into two strategies for local improvements to specific problems, to minimize power consumption, based on machine speed scaling.
Mansouri et al. [19] analyzed the balance between minimizing the makespan, a measure of the level of service and energy consumption, in a permutation flow shop with a sequence of two machines.The authors developed a linear model of mixed-integer multiobjective optimization to find the Pareto frontier composed of energy consumption and total makespan.
Hecker et al. [20] used evolutionary algorithms for scheduling in a non-wait hybrid flow shop to optimize the allocation of tasks in production lines of bread, using particle swarm optimization and ant colony optimization.Hecker et al. [21] used a modified genetic algorithm, ant colony optimization, and random search procedure to study the makespan and total time of idle machines in a hybrid permutation flow model.
Liu and Huang [14] studied a scheduling problem with batch processing machines in a hybrid flow shop with energy-related criteria and total weighted tardiness.The authors applied the Nondominated Sorting Genetic Algorithm II (NSGA-II).
Additionally, in time completion problems, Yaurima et al. [3] proposed a heuristic and meta-heuristic method to solve hybrid flow shop (HFS) problems with unrelated machines, sequence-dependent setup time (SDST), availability constraints, and limited buffers.

Multiobjective Optimization
Different optimization criteria are used to improve models of a hybrid flow shop.An overview of these criteria can be found in [4].Genetic algorithms have received considerable attention as an approach to multiobjective optimization [6,22,23].
Deb et al. [24] proposed a computationally efficient multiobjective algorithm called NSGA-II (Nondominated Sorting Genetic Algorithm II), which can find Pareto-optimal solutions.The computational complexity of the algorithm is O MN 2 , where M is the number of targets and N is the size of the data set.
In this article, we consider it to solve our problem with two criteria.An experimental analysis of two crossover operators, three mutation operators, three crossover and mutation probabilities, and three populations is presented.For each case of machines per stage, the goal is to find the most desirable solutions that provide the best results considering both objectives.

Desirability Functions
Getting solutions on the Pareto front does not resolve the multiobjective problem.There are several methodologies to incorporate preferences in the search process [25].
These methodologies are responsible for giving guidance or recommendations concerning selecting a final justifiable solution from the Pareto front [26].One of these methodologies is Desirability Functions (DFs) that are responsible for mapping the value of each objective to desirability, that is, to values in a unitless scale in the domain [0, 1].
Let us consider that the objective f i is Z i ⊆ R; then a DF is defined as any function d i : Z i → [0, 1] that specifies the desirability of different regions of the domain Z i for objective f i [25].
The Desirability Index (DI) is the combination of the individual values of the DFs in one preferential value in the range [0, 1].The higher the values of the DFs, the greater the DI.The individual in the final front with the highest DI will be the best candidate to be selected by the decision maker [27].The algorithm calibration of desirability functions is applied to the final solutions of the Pareto front to get the best combination of parameters.

Model
Many published works address heuristics for flow shops that are considered production environments [3,23,[28][29][30].We consider the following problem: A set J = {1, 2, 3, 4, 5} of jobs available at time 0 must be processed on a set of 6 consecutive stages S = {2, 3, 4, 5 , 6, 7}. of the production line, subject to minimization of total processing time C max and energy consumption of the machines E op .
Each stage i ∈ S consists of a set M i = {1, . . . ,m i } of parallel machines, where |M i | ≥ 1.The sets M 2 and M 5 have unrelated machines and the sets M 3 , M 4 , M 6 , and M 7 have identical machines.We consider three different cases of m i = {2, 4, 6} machines in each stage.
Each job j ∈ J belongs to a particular group G k .Jobs 1, 3, 4, 5 ∈ J belong to G 1 , job 2 ∈ J belongs to G 2 .Each job is divided into a set T j of t j tasks, T j = 1, 2, . . ., t j .Jobs and tasks in the same group can be processed simultaneously only in stage 2 ∈ S according to the capacity of the machine.In stages 3, 4, 5, 6, 7 ∈ S, the tasks must be processed by exactly one machine in each stage (Figure 2).
Each task t ∈ T j consists of a set of batches of 10 kg each.The tasks can have different numbers of batches.The set of all the tasks together of all the jobs is given by T = {1, 2, . . . ,Q}.
Each machine must process an amount of "a" batches at the time of assigning a task.Let p i l ,q be the processing time of the task q ∈ T j on the machine l ∈ M i at stage i.Let S i l ,qp be the setup (adjustment) time of the machine l from stage i to process the task p ∈ T j after processing task q.Each machine has a process buffer for temporarily storing the waiting tasks called "work in progress".The setup times in stage 2 ∈ S are considered to be sequence dependent; in stages 3, 4, 5, 6, 7 ∈ S, they are sequence independent.

Encoding
Each solution (chromosome) is encoded as a permutation of integers.Each integer represents a batch (10 kg) of a given job.The enumeration of each batch is given in a range specified by the order of the jobs and the number of batches.
Batches are listed according to the job to which they belong.Table 7 shows an example the five jobs.If Job 1 has nine batches, they are listed from 1 to 9. If Job 2 has 12 batches, they are listed from 10 to 21. Reaching Job 5, we have accumulated 62 batches.Figure 3 shows an example of the production representation.Groups are composed according to the number of machines in the stage (m i ).The batches of each job are randomly distributed between the groups.To obtain the number of batches per group (a), we divide the total amount of batches by the number of machines per stage.3 shows an example of the production representation.Groups are composed according to the number of machines in the stage (  ).The batches of each job are randomly distributed between the groups.To obtain the number of batches per group (a), we divide the total amount of batches by the number of machines per stage.
Let us consider six stages with  1 = 4 machines per stage.The chromosome is composed of 4 groups.Let us assume that we have 62 batches for 5 jobs.This is equal to 62 (batches)/4 (machines per stage) = 15.5 (batches per group).The result is rounded to the nearest higher integer, ending with a = 16 batches per group.An example of processing tasks on Machine 1 is shown in Figure 4  In the ordered sequence, "Jobs" indicate jobs to which the batches belong according to their enumeration.These groups indicate the number of total tasks for each job from left to right.For example, in the ordered sequence, Job 4 has two tasks; the first one "4-1" has three batches (38,42,51) and the second one "4-2" has one batch (48).
"Proc.Order" indicates the order of processing of the task groups.Assuming that a machine in Stage 1 has to process up to 70 kg (7 batches), Group 1 is composed of sets of tasks 1-1 (1, 5), 3-1 (23,35), 4-1 (38,42,51) with seven batches.When the capacity of the machine is filled, Group 2 is formed for the tasks 4-2 (48) and 5-1 (56, 61, 59).Assuming that the batches of Job 2 are not compatible with any of the remaining four jobs, the batches are processed separately from the other two groups, forming the task 2-1 with five batches (14,11,15,19,16).When the first group's tasks have completed their processing in the machine, the second group is processed, and so on.
The tasks processed in the group are assigned to the machines of the following stages individually according to their order of completion in the first stage.When the tasks of a group-for example, Group 1 (1-1, 3-1, 4-1)-are processed together, the completion time is the same for all.Therefore, the three tasks will be ready at the same time to be assigned individually to machines available in the next stage, until the last (6th) stage.
In the post-second stages, the tasks that are ready to be assigned are in the process buffer.The task that has the longest processing time is assigned first to an available machine.The calculation of the total completion time   is as follows.The time  , for completing task p in stage i is calculated according to the formula The maximum completion time of all tasks and jobs is calculated as Let us consider six stages with m 1 = 4 machines per stage.The chromosome is composed of 4 groups.Let us assume that we have 62 batches for 5 jobs.This is equal to 62 (batches)/4 (machines per stage) = 15.5 (batches per group).The result is rounded to the nearest higher integer, ending with a = 16 batches per group.An example of processing tasks on Machine 1 is shown in Figure 4.   3 shows an example of the production representation.Groups are composed according to the number of machines in the stage (  ).The batches of each job are randomly distributed between the groups.To obtain the number of batches per group (a), we divide the total amount of batches by the number of machines per stage.
Let us consider six stages with  1 = 4 machines per stage.The chromosome is composed of 4 groups.Let us assume that we have 62 batches for 5 jobs.This is equal to 62 (batches)/4 (machines per stage) = 15.5 (batches per group).The result is rounded to the nearest higher integer, ending with a = 16 batches per group.An example of processing tasks on Machine 1 is shown in Figure 4. Machine   In the ordered sequence, "Jobs" indicate jobs to which the batches belong according to their enumeration.These groups indicate the number of total tasks for each job from left to right.For example, in the ordered sequence, Job 4 has two tasks; the first one "4-1" has three batches (38,42,51) and the second one "4-2" has one batch (48).
"Proc.Order" indicates the order of processing of the task groups.Assuming that a machine in Stage 1 has to process up to 70 kg (7 batches), Group 1 is composed of sets of tasks 1-1 (1, 5), 3-1 (23,35), 4-1 (38,42,51) with seven batches.When the capacity of the machine is filled, Group 2 is formed for the tasks 4-2 (48) and 5-1 (56, 61, 59).Assuming that the batches of Job 2 are not compatible with any of the remaining four jobs, the batches are processed separately from the other two groups, forming the task 2-1 with five batches (14,11,15,19,16).When the first group's tasks have completed their processing in the machine, the second group is processed, and so on.
The tasks processed in the group are assigned to the machines of the following stages individually according to their order of completion in the first stage.When the tasks of a group-for example, Group 1 (1-1, 3-1, 4-1)-are processed together, the completion time is the same for all.Therefore, the three tasks will be ready at the same time to be assigned individually to machines available in the next stage, until the last (6th) stage.
In the post-second stages, the tasks that are ready to be assigned are in the process buffer.The task that has the longest processing time is assigned first to an available machine.The calculation of the total completion time   is as follows.The time  , for completing task p in stage i is calculated according to the formula In the ordered sequence, "Jobs" indicate jobs to which the batches belong according to their enumeration.These groups indicate the number of total tasks for each job from left to right.For example, in the ordered sequence, Job 4 has two tasks; the first one "4-1" has three batches (38,42,51) and the second one "4-2" has one batch (48).
"Proc.Order" indicates the order of processing of the task groups.Assuming that a machine in Stage 1 has to process up to 70 kg (7 batches), Group 1 is composed of sets of tasks 1-1 (1, 5), 3-1 (23,35), 4-1 (38,42,51) with seven batches.When the capacity of the machine is filled, Group 2 is formed for the tasks 4-2 (48) and 5-1 (56, 61, 59).Assuming that the batches of Job 2 are not compatible with any of the remaining four jobs, the batches are processed separately from the other two groups, forming the task 2-1 with five batches (14,11,15,19,16).When the first group's tasks have completed their processing in the machine, the second group is processed, and so on.
The tasks processed in the group are assigned to the machines of the following stages individually according to their order of completion in the first stage.When the tasks of a group-for example, Group 1 (1-1, 3-1, 4-1)-are processed together, the completion time is the same for all.Therefore, the three tasks will be ready at the same time to be assigned individually to machines available in the next stage, until the last (6th) stage.
In the post-second stages, the tasks that are ready to be assigned are in the process buffer.The task that has the longest processing time is assigned first to an available machine.The calculation of the total completion time C max is as follows.The time C i,p for completing task p in stage i is calculated according to the formula The maximum completion time of all tasks and jobs is calculated as Q indicates the total number of tasks for all jobs.C m,p is the completion time of task p ∈ T j in the last stage 7 ∈ S. The total energy consumption E op of the execution of all tasks is calculated as where E i l indicates the electrical power consumption of machine l in stage i, and p i l ,q refers to the processing time of the task q ∈ T.

Nondominated Sorting Genetic Algorithm II (NSGA-II)
The NSGA-II algorithm is used to assign tasks to machines so that C max and E op are minimized.NSGA-II is a multiobjective genetic algorithm characterized by elitism and stacking distance to maintain the diversity of the population to find as many Pareto-optimal solutions as possible [24].It generates a population of N individuals, where each represents a possible solution.Individuals evolve through genetic operators to find optimal or near-optimal solutions.Three operators are usually applied: tournament selection (using a stacking tournament operator), crossover, and mutation to generate another N individuals or children.
From the mixture of these two populations, a new population of size 2N is created.Then, the best individuals are taken according to their fitness value by ordering the population on nondominated fronts.Individuals from the best nondominated fronts are first taken, one by one, until N individuals are selected.
The crowding distance is then compared to preserve diversity in the population.This operator compares two solutions and chooses a tournament winner by selecting the setting that is located on the best Pareto front.If the participating tournament configurations are on the same front, the best crowding distance (highest) to determine the winning setting is used.Later, the algorithm applies the basic genetic operators and promotes the next generation cycle with the configurations that occupy the best fronts, preserving the diversity through the crowding distance.A job is assigned to a machine at a given stage by taking into consideration processing speeds, setup times, machine availability, and energy consumption.Each solution is encoded in a permutation of integers.

Crossover Operators
Crossover operators allow the obtaining of new solutions (children) by combining individuals (parents) of the population.The crossover operator is applied under a certain probability.In this paper, we consider two operators: the partially mapped crossover and the order crossover.
Partially mapped crossover (PMX) [32,33]: This operator uses two cut points (Figure 5).The part of the first parent between the two cut points is copied to the children.Then, the following parts of the children are filled by a mapping between the two parents, so that their absolute positions are inherited where possible from the second parent.Figures 7 and 8 show representations of tasks for two and four machines per stage.The original sequence is ordered considering the enumeration of the jobs and their batches.Each machine is assigned batches to process, which form the tasks of each job.Each machine must process a certain set of tasks; each set is assigned a processing order (Proc.Order).These sets are formed according to the processing compatibility.The tasks of Jobs 1, 3, and 4 can be processed in the same set at the same time.The tasks of Job 2 are put into a separate set because they are not compatible with each other.For example, in Figure 7, the first two tasks (1-1, 4-1) are processed by Machine 1, then the second set with one task (2-1).In Machine 2, two tasks of the first set (1-2 and 3-1) are processed, assuming that the machine has only   Figures 7 and 8 show representations of tasks for two and four machines per stage.The original sequence is ordered considering the enumeration of the jobs and their batches.Each machine is assigned batches to process, which form the tasks of each job.Each machine must process a certain set of tasks; each set is assigned a processing order (Proc.Order).These sets are formed according to the processing compatibility.The cross parents are composed of the batches of the jobs listed in Table 8.Figures 7 and 8 show representations of tasks for two and four machines per stage.The original sequence is ordered considering the enumeration of the jobs and their batches.Each machine is assigned batches to process, which form the tasks of each job.Each machine must process a certain set of tasks; each set is assigned a processing order (Proc.Order).These sets are formed according to the processing compatibility.Figures 7 and 8 show representations of tasks for two and four machines per stage.The original sequence is ordered considering the enumeration of the jobs and their batches.Each machine is assigned batches to process, which form the tasks of each job.Each machine must process a certain set of tasks; each set is assigned a processing order (Proc.Order).These sets are formed according to the processing compatibility.The tasks of Jobs 1, 3, and 4 can be processed in the same set at the same time.The tasks of Job 2 are put into a separate set because they are not compatible with each other.For example, in Figure 7, the first two tasks (1-1, 4-1) are processed by Machine 1, then the second set with one task (2-1).In Machine 2, two tasks of the first set (1-2 and 3-1) are processed, assuming that the machine has only the capacity to process up to 6 batches (60 kg).The second set will then be followed by one task (4-1).Figures 7 and 8 show representations of tasks for two and four machines per stage.The original sequence is ordered considering the enumeration of the jobs and their batches.Each machine is assigned batches to process, which form the tasks of each job.Each machine must process a certain set of tasks; each set is assigned a processing order (Proc.Order).These sets are formed according to the processing compatibility.The tasks of Jobs 1, 3, and 4 can be processed in the same set at the same time.The tasks of Job 2 are put into a separate set because they are not compatible with each other.For example, in Figure 7, the first two tasks (1-1, 4-1) are processed by Machine 1, then the second set with one task (2-1).In Machine 2, two tasks of the first set (1-2 and 3-1) are processed, assuming that the machine has only the capacity to process up to 6 batches (60 kg).The second set will then be followed by one task (4-1).The tasks of Jobs 1, 3, and 4 can be processed in the same set at the same time.The tasks of Job 2 are put into a separate set because they are not compatible with each other.For example, in Figure 7, the first two tasks (1-1, 4-1) are processed by Machine 1, then the second set with one task (2-1).In Machine 2, two tasks of the first set (1-2 and 3-1) are processed, assuming that the machine has only the capacity to process up to 6 batches (60 kg).The second set will then be followed by one task (4-1).

𝑚
Figure 8 shows four machines per stage.Four batches are assigned to each machine.Machines 1 and 2 process two sets of tasks.Machines 3 and 4 process only one set.Upon completion of the tasks processed in sets in Step 1, in the later stages, they will be processed individually.
Order crossover (OX) [32,[34][35][36][37]: Two cut points are selected randomly; the part of the first parent located between these two cut points is copied to the child.The child's other spaces are left blank.The values copied to the child are deleted in the second parent.The following positions in the child are filled starting in the blank spaces and considering the order found in the second parent (Figure 9). Figure 8 shows four machines per stage.Four batches are assigned to each machine.Machines 1 and 2 process two sets of tasks.Machines 3 and 4 process only one set.Upon completion of the tasks processed in sets in Step 1, in the later stages, they will be processed individually.
Order crossover (OX) [32,[34][35][36][37]: Two cut points are selected randomly; the part of the first parent located between these two cut points is copied to the child.The child's other spaces are left blank.The values copied to the child are deleted in the second parent.The following positions in the child are filled starting in the blank spaces and considering the order found in the second parent (Figure 9).

Mutation Operators
Mutation operators produce small changes in individuals according to a probability.This operator helps to prevent falling into local optima and to extend the search space of the algorithm.Three mutation operators are considered: Displacement, Exchange, and Insertion.
Displacement is a generalization of the insertion mutation, in which instead of moving a single value, several values are changed (Figure 10).The exchange operator selects two random points and these position values are exchanged (Figure 11).In insertion, a value is selected randomly and will be inserted at an arbitrary position (Figure 12).

Mutation Operators
Mutation operators produce small changes in individuals according to a probability.This operator helps to prevent falling into local optima and to extend the search space of the algorithm.Three mutation operators are considered: Displacement, Exchange, and Insertion.
Displacement is a generalization of the insertion mutation, in which instead of moving a single value, several values are changed (Figure 10).The exchange operator selects two random points and these position values are exchanged (Figure 11).In insertion, a value is selected randomly and will be inserted at an arbitrary position (Figure 12).
operator helps to prevent falling into local optima and to extend the search space of the algorithm.Three mutation operators are considered: Displacement, Exchange, and Insertion.
Displacement is a generalization of the insertion mutation, in which instead of moving a single value, several values are changed (Figure 10).The exchange operator selects two random points and these position values are exchanged (Figure 11).In insertion, a value is selected randomly and will be inserted at an arbitrary position (Figure 12).Three mutation operators are considered: Displacement, Exchange, and Insertion.
Displacement is a generalization of the insertion mutation, in which instead of moving a single value, several values are changed (Figure 10).The exchange operator selects two random points and these position values are exchanged (Figure 11).In insertion, a value is selected randomly and will be inserted at an arbitrary position (Figure 12).

Parameter Calibration
Calibration is a procedure for choosing the algorithm parameters that provide the best result in the response variables.A computational experiment for calibration includes the following steps: (a) each instance or workload is run with all possible combinations of parameters; (b) the best solution is obtained; (c) the relative difference of each algorithm on the best solution is calculated; (d) multifactorial Analysis of Variance (ANOVA) with a 95% confidence level is applied to find the parameter that most influences the solution; (e) the set of the best values is selected for each parameter.
Table 9 shows the parameters used for calibration.A total of 3 × 2 × 3 × 3 × 3 = 162 different algorithms or combinations are considered.Thirty runs for each combination are performed, with 162 × 30 = 4860 experiments in total.For each of the 30 runs, a different workload will be taken.For each of the five jobs, 30 batches are generated randomly from a uniform distribution according to their limits (Table 4).

Parameter Calibration
Calibration is a procedure for choosing the algorithm parameters that provide the best result in the response variables.A computational experiment for calibration includes the following steps: (a) each instance or workload is run with all possible combinations of parameters; (b) the best solution is obtained; (c) the relative difference of each algorithm on the best solution is calculated; (d) multifactorial Analysis of Variance (ANOVA) with a 95% confidence level is applied to find the parameter that most influences the solution; (e) the set of the best values is selected for each parameter.
Table 9 shows the parameters used for calibration.A total of 3 × 2 × 3 × 3 × 3 = 162 different algorithms or combinations are considered.Thirty runs for each combination are performed, with 162 × 30 = 4860 experiments in total.For each of the 30 runs, a different workload will be taken.For each of the five jobs, 30 batches are generated randomly from a uniform distribution according to their limits (Table 4).
Each batch is assigned to a workload, obtaining 30 loads from 5 jobs.The variation of processing capabilities (in minutes) of the machines in Stages 2 and 5 is generated according to Tables 1 and 2. Stage 2 has 6 machines with different processing capabilities in kilograms.Stage 5 has 6 machines.
According to the selection of a number of machines per stage, the set of machines is chosen.The machines are numbered from 1 to 6.By considering different numbers of machines per stage (2, 4, or 6), we conduct 4860 × 3 = 14,580 experiments.In each of the 30 runs of each combination, the best individual applying desirability function is obtained.Subsequent values of each objective (C max and E op ) are used to calculate an average of 30, obtaining a single value for each objective.The performance of each algorithm is calculated as the relative increment of the best solution (RIBS).The RIBS is calculated with the following formula: where Heu sol is the value of the objective function obtained by the algorithm, and Best sol is the best value obtained during the execution of all possible combinations of parameters.All experiments were performed on a PC with Intel Core i3 CPU and 4 GB RAM.The programming language used to encode the algorithm is R.It provides advantages, including the scalability and libraries [38].The calibration of the algorithm lasted approximately 15 days and 9 h.

Residual Analysis
To verify that the data are normalized, the supposition of the suitability of the model using the normality, homoscedasticity, and independence of residues is verified.The residuals are calculated according to the following formula [39]: where y i is the RIBS for the run i and y i is the average of the experiment.Figure 3 shows the graph of the normal probability of residuals for 6 machines per stage.As can be seen (Figure 13), the graph complies with the assumption of normality.The same results are obtained for 2 and 4 machines per stage.All experiments were performed on a PC with Intel Core i3 CPU and 4 GB RAM.The programming language used to encode the algorithm is R.It provides advantages, including the scalability and libraries [38].The calibration of the algorithm lasted approximately 15 days and 9 h.

Residual Analysis
To verify that the data are normalized, the supposition of the suitability of the model using the normality, homoscedasticity, and independence of residues is verified.The residuals are calculated according to the following formula [39]: where   is the RIBS for the run  and   ̅ is the average of the experiment.Figure 3 shows the graph of the normal probability of residuals for 6 machines per stage.As can be seen (Figure 13), the graph complies with the assumption of normality.The same results are obtained for 2 and 4 machines per stage.

Variance Analysis
Variance analysis is applied to evaluate the statistical difference between the experimental results and to observe the effect of the parameters on the quality of the results.It is used to determine factors that have a significant effect and to discover the most important factors.The parameters of the problem are considered as factors and their values as levels.We assume that there is no interaction between factors.

Variance Analysis
Variance analysis is applied to evaluate the statistical difference between the experimental results and to observe the effect of the parameters on the quality of the results.It is used to determine factors that have a significant effect and to discover the most important factors.The parameters of the problem are considered as factors and their values as levels.We assume that there is no interaction between factors.
The F-Ratio is the ratio between Factor Mean Square and Mean Square residue.A high F-Ratio means that this factor significantly affects the response.The p-value shows the statistical significance of the factors: p-values that are less than 0.05 have a statistically significant effect on the response variable (RIBS) with a 95% confidence level.According to the F-Ratio and p-value, the most important factors in the case of two machines per stage are the mutation operator and the crossover operator variable.DF shows the number of degrees of freedom (Table 10).In the case of four machines per stage, the most important factors are mutation operator and crossover probability (Table 11).For six machines per stage, the most important factors are the mutation operator and mutation probability (Table 12).Figures [14][15][16][17] show means and 95% Least Significant Difference (LSD) confidence intervals.Figure 14 shows the results obtained for mutation operators for two, four, and six machines per stage.We can see that operator Displacement is the best mutation among the three tested operators.Figure 15 shows the results for the crossover operators, where we observe that the PMX operator is the best.The results for four and six stages are similar.
Figure 16 shows plots for the crossover probability.It can be seen that the best crossover probability occurring is 0.9. Figure 17 presents plots for the mutation probability, where the probability of 0.2 is statistically more significant.17 show means and 95% Least Significant Difference (LSD) confidence intervals.Figure 14 shows the results obtained for mutation operators for two, four, and six machines per stage.We can see that operator Displacement is the best mutation among the three tested operators.Figure 15 shows the results for the crossover operators, where we observe that the PMX operator is the best.The results for four and six stages are similar.

E:
Figure 16 shows plots for the crossover probability.It can be seen that the best crossover probability occurring is 0.9. Figure 17 presents plots for the mutation probability, where the probability of 0.2 is statistically more significant.Tables [13][14][15] show the ordering of the front points regarding a combination of   ,   , and Desirability Index (DI) for each Pareto front.We see, for example, that the parameter combination 156 corresponds to a higher DI, followed by 129, etc.

Pareto Front Calibration Analysis
Table 16 shows the parameters of these combinations.The crossover PMX has higher values of DI.The mutation operator Displacement is used in each of the top three combinations.The crossover probability is maintained at 0.9 in the two most significant combinations.The mutation probability remains at the highest with 0.2 in all three, as does the population with 50 individuals.
The most influential parameters correspond to the combination 156, which has greater DI in three cases (see Tables 13-15).Tables [13][14][15] show the ordering of the front points regarding a combination of E op , C max , and Desirability Index (DI) for each Pareto front.We see, for example, that the parameter combination 156 corresponds to a higher DI, followed by 129, etc. Table 16 shows the parameters of these combinations.The crossover PMX has higher values of DI.The mutation operator Displacement is used in each of the top three combinations.The crossover probability is maintained at 0.9 in the two most significant combinations.The mutation probability remains at the highest with 0.2 in all three, as does the population with 50 individuals.
The most influential parameters correspond to the combination 156, which has greater DI in three cases (see Tables 13-15).
Our analysis shows that crossover PMX and mutation operator Displacement are the best among those tested.The best probability of crossover is 0.9 and that of mutation is 0.2.The population of 50 individuals is statistically more significant.

Experimental Setup
In multiobjective optimization, there is usually no single best solution, but a set of solutions that are equally good.Our objective is to obtain a good approximation of the Pareto front regarding two objective functions [40].
Our bi-objective genetic algorithm is tuned up using the following parameters obtained during the calibration step: crossover, PMX; mutation, Displacement; crossover probability, 0.9; mutation probability, 0.2; population size, 50.
We consider five jobs in each of the 30 workloads, obtaining the number of batches for each job based on a uniform random distribution (Table 4).Figure 21 shows the solution space (two objectives) for two machines per stage.This twodimensional solution space represents a feasible set of solutions that satisfy the problem constraints.The Pareto front covers a wide range of   from 6856 to 7178 kW, while   is in the range of 96.64 to 97.48 h.

Bi-Objective Analysis
Figure 22 shows the solution space for four machines per stage.The Pareto front covers   from 12,242 to 12,559 kW, while   is in the range of 43.98 to 44.8 h.Finally, Figure 23   Figure 21 shows the solution space (two objectives) for two machines per stage.This two-dimensional solution space represents a feasible set of solutions that satisfy the problem constraints.The Pareto front covers a wide range of E op from 6856 to 7178 kW, while C max is in the range of 96.64 to 97.48 h.
Figure 22 shows the solution space for four machines per stage.The Pareto front covers E op from 12,242 to 12,559 kW, while C max is in the range of 43.98 to 44.8 h.Finally, Figure 23 shows the solution space for 6 machines per stage.E op ranges from 16,899 to 17,221 kW, while C max is in the range of 25.66 to 26.66 h.
Although the Pareto fronts are of good quality, it is observed that many of the generated solutions are quite far from it.Therefore, a single execution of the algorithm can produce significantly different results.The selection of the solutions included in the Pareto front depends on the preference of the decision maker.

Branch and Bound Analysis
A Branch and Bound algorithm (B&B) is an exact method for finding an optimal solution to an NP-hard problem.It is an enumerative technique that can be applied to a wide class of combinatorial optimization problems.The solution searching process is represented by a branching tree.Each node in the tree corresponds to a subproblem, which is defined by a subsequence of tasks that are placed at the beginning of the complete sequence.
This subsequence is called partial sequence (PS).The set of tasks not included in PS is called NPS.When a node is branched, one or more nodes are generated by adding a task ahead of the partial sequence associated with the node being branched.To avoid full enumeration of all task permutations, the lower bound of the value of the objective functions is calculated in each step for each partial schedule.In the case of ties, the algorithm selects a node with the lowest lower bound.
The B&B algorithm was run on a PC with an Intel Core i3 CPU and 4 GB of RAM.The programming language used to encode the algorithm was R. The algorithm lasted approximately 12 days, 3 h.
A comparison of the B&B algorithm concerning the Pareto front is shown in Figure 21 as a lowest left point.The B&B result is considered as the best result that could be obtained, known as the global optimum.
Table 17 shows the relative degradation of our bi-objective algorithm over the B&B best solutions.Results are obtained by averaging 30 experiments.We see that the results for each objective are not more than 3.44% worst for C max and 2.94% for E op .Table 18 shows the computation times for obtaining the Pareto front solutions.
According to the results comparison, we conclude that the proposed bi-objective algorithm can produce satisfactory results close to a global optimum in short computational time.

Comparison with Industry
Figures 24 and 25 show E op and C max of the industry and the B&B algorithm.The term "Algorithm" in these graphs represents the results of the B&B algorithm.The term "Industry" represents the actual data of the production company.The industry results shown in Table 19 are obtained considering the average monthly production load in kilograms of tortillas per year.The results of the B&B algorithm are obtained for a different number of machines per stage.Table 20 shows the same results as in Table 19 in percentages.19 are obtained considering the average monthly production load in kilograms of tortillas per year.The results of the B&B algorithm are obtained for a different number of machines per stage.Table 20 shows the same results as in Table 19 in percentages.Figure 24 shows C max considering working hours.We see that the results of the algorithm are significantly improved compared with industry.For two machines per stage, the difference is almost a halving of C max from 186 to 96.52 h.In the case of four machines, it remains below half (93 and 43.82 h).In the case of six machines, the results are better by almost three times with regard to C max (62 h versus 25.57h).
Figure 25 shows E op according to the processing time of machines (Table 3).We see that the B&B algorithm significantly improves upon the results obtained in industry.For two machines per stage, the E op objective is reduced to almost half, and for four and six machines, the E op objective of our algorithm is reduced to less than half.
Table 20 shows the percentage of degradation of the results of industry and those selected from the Pareto front obtained from the bi-objective GA compared with results of the B&B algorithm.
We observe that the degradation of E op and C max observed in industry are closer to or higher than 50%.Comparing B&B with our bi-objective GA, we observe that our results are less than 1.53% worse for E op and 0.54% for C max .Due to the fact that the B&B algorithm finds the global optimum, we demonstrate the quality of our algorithm.

Conclusions
Our main contributions are multifold: (1) We formulated the complex problem of the real-life industry environment of tortilla production considering two optimization criteria: total completion time and energy consumption;

Figure 4 .
Figure 4. Example of tasks on Machine 1.

Figure 3 .
Figure 3. Example of the representation of the chromosome for m i machines per stage.

Figure 4 .
Figure 4. Example of tasks on Machine 1.

Figure 4 .
Figure 4. Example of tasks on Machine 1.

Figure 6 .
Figure 6.PMX crossover operator example with batch index.

Figure 8 .
Figure 8. Chromosome representation example with four machines per stage.

Figure 6 22 Figure 5 .
Figure 6 shows an example of the PMX crossover.The cross points in both parents serve to form the child chromosome.The 16 listed lots of the 4 jobs are randomly distributed on each parent chromosome.When crossed, they form the child chromosome with 16 lots.

Figure 6 .
Figure 6.PMX crossover operator example with batch index.

Figure 7 .
Figure 7. Chromosome representation example with two machines per stage.

Figure 8 .
Figure 8. Chromosome representation example with four machines per stage.

Figure 6 .
Figure 6.PMX crossover operator example with batch index.

Figure 6 .
Figure 6.PMX crossover operator example with batch index.

Figure 8 .
Figure 8. Chromosome representation example with four machines per stage.

Figure 7 .
Figure 7. Chromosome representation example with two machines per stage.

Figure 6 .
Figure 6.PMX crossover operator example with batch index.

Figure 8 .
Figure 8. Chromosome representation example with four machines per stage.

Figure 8 .
Figure 8. Chromosome representation example with four machines per stage.

Figure 13 .
Figure 13.Graph of the normal probability of waste for six machines per stage.

Figure 13 .
Figure 13.Graph of the normal probability of waste for six machines per stage.

Figure 14 .
Figure 14.Means and 95% LSD confidence intervals of mutation operators.Figure 14.Means and 95% LSD confidence intervals of mutation operators.

Figure 16 .
Figure 16.Means and 95% LSD confidence intervals of crossover probability-four machines per stage.

Figure 17 .
Figure 17.Means and 95% LSD confidence intervals of mutation probability-six machines per stage.

Figures 18 -Figure 17 . 22 Figure 18 .
show the solution space obtained by calibration experiments.The horizontal axis represents the energy   consumed by the machines.The vertical axis represents the time completion   of jobs.To determine the best individuals of all experiments, the Pareto front is calculated for each case.Each point represents a combination of parameters from 162 experiments explained in Section 6.1.Each front consists of numbered points from lowest to highest according to their DI.The closer to 1, the better the place in the enumeration is (see Section 3.3).

Figure 19 .
Figure 19.Pareto front for four machines per stage.

Figure 19 .
Figure 19.Pareto front for four machines per stage.

Figure 19 .
Figure 19.Pareto front for four machines per stage.

Figure 20 .
Figure 20.Pareto front for six machines per stage.

Figure 20 .
Figure 20.Pareto front for six machines per stage.

Figure 22 .
Figure 22.Pareto front for four machines per stage.

Figure 22 .
Figure 22.Pareto front for four machines per stage.

Figure 22 .
Figure 22.Pareto front for four machines per stage.

Figure 23 .
Figure 23.Pareto front for six machines per stage.

Figure 23 .
Figure 23.Pareto front for six machines per stage.

Algorithms 2018 , 22 7. 4 .
Figures 24 and 25  show   and   of the industry and the B&B algorithm.The term "Algorithm" in these graphs represents the results of the B&B algorithm.The term "Industry" represents the actual data of the production company.The industry results shown in Table19are obtained considering the average monthly production load in kilograms of tortillas per year.The results of the B&B algorithm are obtained for a different number of machines per stage.Table20shows the same results as in Table19in percentages.

Figure 24 .
Figure 24.C max of B&B algorithm and industry.

Table 1 .
The processing capacity of machines in Stage 2 (kg per minute).

Table 2 .
The processing capacity of machines in Stages 3-7 (kg per minute).

Table 1
demonstrates that Stage 2 has six machines with processing capabilities of 80, 80, 100, 130, 160, and 200 kg per minute that can process jobs from 1 to 5. Machines 1 and 2 have the same capacities.

Table 3 .
Energy consumption of machines (kW per minute).

Table 1 .
The processing capacity of machines in Stage 2 (kg per minute).

Table 2 .
The processing capacity of machines in Stages 3-7 (kg per minute).

Table 1
demonstrates that Stage 2 has six machines with processing capabilities of 80, 80, 100, 130, 160, and 200 kg per minute that can process jobs from 1 to 5. Machines 1 and 2 have the same capacities.

Table 3 .
Energy consumption of machines (kW per minute).

Table 5 .
Processing time P i k ,j of job j at machine k in stage i for each 10 kg.

Table 6 .
Sequence setup time for all machines in stage i.

Table 7 .
Example of enumeration of batches for jobs.

Table 7 .
Example of enumeration of batches for jobs. .
Figure 3. Example of the representation of the chromosome for  i machines per stage.

Table 7 .
Example of enumeration of batches for jobs.

Table 8 .
Enumeration of the batches for jobs.

Table 9 .
Parameters used for calibration.

Table 9 .
Parameters used for calibration.

Table 10 .
Relative increment of the best solution (RIBS) analysis of variance for two machines per stage.

Table 11 .
RIBS analysis of variance for four machines per stage.

Table 12 .
RIBS analysis of variance for six machines per stage.
Pareto front for two machines per stage.Pareto front for four machines per stage.

Table 13 .
Optimum parameter combination selection for bi-objective genetic algorithm (GA) for two machines per stage.

Table 14 .
Optimum parameter combination selection for bi-objective GA for four machines per stage.

Table 15 .
Optimum parameter combination selection for bi-objective GA for six machines per stage.

Table 16 .
Optimum parameter combination for bi-objective GA.
Figure 21.Pareto front for two machines per stage.
Figure 21.Pareto front for two machines per stage.Figure 22. Pareto front for four machines per stage.
shows the solution space for 6 machines per stage.  ranges from 16,899 to 17,221 kW, while   is in the range of 25.66 to 26.66 h.

Table 17 .
RIBS of the solution of the Branch and Bound (B&B) algorithm concerning the Pareto front solutions.

Table 18 .
Computational time for obtaining Pareto front solutions.

Table 19 .
Results of the industry and B&B algorithm.

Table 20 .
The industry and bi-objective GA degradation over B&B algorithm (%).E op of B&B algorithm and industry.

Table 19 .
Results of the industry and B&B algorithm.

Table 20 .
The industry and bi-objective GA degradation over B&B algorithm (%).