A Novel Parallel Simulated Annealing Methodology to Solve the No-Wait Flow Shop Scheduling Problem with Earliness and Tardiness Objectives

Abstract: In this paper, the no-wait flow shop problem with earliness and tardiness objectives is considered. The problem is proven to be NP-hard. Recent no-wait flow shop studies have focused on familiar objectives such as makespan, total flow time, and total completion time; however, few solution approaches cover the concomitant use of earliness and tardiness objectives. A novel parallel simulated annealing methodology is proposed to solve this problem in order to overcome the runtime drawback of classical simulated annealing and enhance its robustness. Well-known flow shop datasets from the literature are utilized to benchmark the proposed algorithm against classical simulated annealing, variants of tabu search, and particle swarm optimization. Statistical analyses were performed to compare the runtime and robustness of the algorithms. The results reveal that parallelization enhances the classical simulated annealing algorithm in terms of time consumption and solution robustness. It is also concluded that the proposed algorithm can outperform the benchmark metaheuristics even when they are run in parallel. The proposed algorithm has a generic structure that can be easily adapted to many combinatorial optimization problems.


Introduction
Flow shop scheduling problems (FSSPs) consist of n jobs that should be processed on m different machines. In the classical FSSP, it is assumed that there exists an unlimited buffer between two successive machines. The no-wait flow shop scheduling problem (NWFSSP) is a special case of the generic FSSP in which each operation of a job has to be processed immediately after the preceding operation, without any delay. Consequently, the problem has a permutation schedule that should be processed on no-buffer machines [1]. The permutation constraint ensures that all jobs are processed in the same order on the machines [2]. The NWFSSP is fundamental to applications in real-life production environments, especially for jobs with a quick obsolescence risk for manufactured products or their components. Examples are canned food processes, critical metal-forging operations, and production with chemical reactions [3].
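The no-wait coupling can be made concrete with a small sketch. The function below (a hypothetical helper with toy processing times, not data from the benchmark sets) computes the minimum gap between the machine-1 start times of two consecutive jobs so that the second job never has to wait between machines:

```python
# Toy illustration: in a no-wait flow shop, once a job starts on machine 1
# it flows through all machines without waiting, so a schedule is fully
# described by each job's start time on the first machine.
p = {  # p[job] = processing times on machines 1..3 (hypothetical data)
    "J1": [3, 2, 4],
    "J2": [2, 5, 1],
}

def min_start_gap(prev, nxt):
    """Minimum gap between the machine-1 start times of two consecutive
    jobs so that `nxt` never waits on any machine and no machine is
    occupied by both jobs at once."""
    gap = 0
    for i in range(len(prev)):
        # completion of `prev` on machine i (relative to its own start)
        prev_done = sum(prev[: i + 1])
        # arrival of `nxt` at machine i (relative to its own start)
        nxt_arrive = sum(nxt[:i])
        gap = max(gap, prev_done - nxt_arrive)
    return gap

gap = min_start_gap(p["J1"], p["J2"])
print(gap)  # 3: J2 may start on machine 1 three time units after J1
```

With this gap, J1 occupies the machines over [0, 3], [3, 5], [5, 9] and J2 over [3, 5], [5, 10], [10, 11]; J2 never idles between machines and no machine processes two jobs at once.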
Graham et al. [4] introduced the α | β | γ notation to define a scheduling problem, where α stands for the machine environment, β denotes any special constraint, and γ represents the objective function or functions of the problem. The machine environment shows the shopfloor setup, such as parallel machines, flow shop, and job shop. The constraints present special cases of processes such as permutation (prmu), no-wait (nwt), blocking (block), and recirculation (rcrc). The last field hosts the objectives, e.g., total completion time, makespan, flow time, earliness, and tardiness. In this study, the NWFSSP is considered to minimize total earliness and tardiness, denoted by Fm | nwt | ∑E_j + ∑T_j. The NWFSSP with m machines has been proven to be NP-hard in the strong sense for m greater than two [5]. A considerable number of published studies have dealt with the NWFSSP with makespan, total completion time, or only tardiness objectives. Correspondingly, total earliness and tardiness objectives may be encountered in many studies dealing with scheduling problems. In some studies, the due date was also included in the model as a constraint rather than an objective. The survey by Allahverdi [6] covered over 300 papers in the literature containing no-wait constraints. The problem with the notation Fm | nwt | ∑T_j, which considers minimizing only total tardiness, was recently studied by Aldowaisan and Allahverdi [7], Liu et al. [8], and Ding et al. [9]. However, a relatively small body of literature is concerned with the NWFSSP minimizing total earliness and tardiness. Some studies [10][11][12][13][14][15][16] built the problem to minimize multiple objectives by integrating due date constraints and objectives such as makespan, total flow time, and resource consumption. Huang et al.
[17] studied the flexible FSSP to minimize total weighted earliness and tardiness under the due-window constraint. Some of these studies had the goal of minimizing total tardiness and earliness with minor nuances. Arabameri and Salmasi [18] handled the weighted objective under a sequence-dependent setup time constraint, providing a mixed-integer linear programming (MILP) model and a timing algorithm and comparing the performance of customized particle swarm optimization (PSO) and tabu search (TS) metaheuristics. Schaller and Valente [19] studied the NWFSSP minimizing total earliness and tardiness and compared dispatching heuristics. They compared the dispatching heuristics under the additional time-allowed constraint in another study [20]. Guevara-Guevara et al. [21] proposed a genetic algorithm (GA) for the problem under a sequence-dependent setup time constraint and compared it with dispatching heuristics. Zhu et al. [22] implemented a discrete learning fruit fly algorithm for the distributed NWFSSP with weighted objectives under the common due date constraint and compared it only to iterated greedy with idle time insertion evaluation. Qian et al. [23] applied the matrix cube-based estimation of distribution algorithm (MCEDA) under sequence-dependent setup time and release time constraints, benchmarked against seven metaheuristic algorithms.
Ingber [24] addressed the primary shortcoming of the SA algorithm as its time-consuming computational steps. Due to its oscillation during the search for alternative solutions, the algorithm requires extensive time to present reasonable incumbent solutions. Many parallelized versions have been developed to overcome this disadvantage. Figure 1 includes the taxonomy by Greening [25], produced during his doctoral dissertation.
Greening [25] divided parallel simulated annealing (PSA) algorithms into two main categories. Synchronous algorithms share information during runtime, while asynchronous algorithms have limited or no communication. Synchronous algorithms calculate the same cost function. Asynchronous algorithms work faster by ignoring synchronization and allowing errors but result in lower outcome quality. Synchronous algorithms are divided into two subcategories: serial-like and altered generation. Serial-like algorithms generate new states as applied in a sequentially running algorithm. Altered generation applies different strategies for state generation.
Over the years, many studies have been proposed to overcome the disadvantages of the SA algorithm. Fast simulated annealing (FSA) by Szu and Hartley [26] and very fast simulated reannealing (VFSA) by Ingber [27] are among the prominent studies that altered the cooling process for more rapid convergence to the global minimum. Malek et al. [28] proposed a parallel SA strategy to solve the traveling salesman problem (TSP) in which parallel algorithms compare their solutions at a certain timepoint and restart from the initial temperature with the incumbent solution. On the basis of these studies, Yao [29] proposed a new simulated annealing algorithm (NSA) that is exponentially faster than VFSA. With a different approach, Roussel-Ragot and Dreyfus [30] suggested a general parallel form with two different temperature regimes. In this approach, parallel processors communicate to globally update the approved current solution when a move is accepted. Mahfoud and Goldberg [31] integrated SA into the genetic algorithm to enrich the population at each iteration. Lee and Lee [32] evaluated various types of moving schemes for parallel synchronous and asynchronous SA strategies with multiple Markov chains. Wodecki and Bożejko [33] implemented a parallel SA version for the flow shop problem with a makespan objective in which parallel threads run simultaneous searches. In this implementation, the incumbent solutions of all threads are changed to new ones if a thread finds a better solution than the current best solution. With a similar approach, Bożejko and Wodecki [34] developed a procedure that runs a master thread and multiple slave threads. The slave threads iteratively search for better solutions. Upon finding a better incumbent solution, a thread tries intensification iterations and reports the result. All slave threads then run again on the new incumbent solution with different temperatures. The results of the study demonstrated average values away from the optimum solution even for
20 jobs. The possible root cause of this failure could be premature convergence due to the greedy intensification of threads, inconsistent with the cooling process of the SA algorithm. Czapiński [35] worked on the permutation flow shop problem to reduce total flow time. The study included master and worker nodes. Worker nodes report their results to the master node after a certain number of iterations. The incumbent solution is updated with the best solution shared with the master node. Worker nodes then run again with the updated incumbent solution as the initial solution. Ferreiro et al. [36] implemented parallel-running asynchronous graphics processing unit (GPU) threads beginning with the same instructions but different initial solutions. Sonuc et al. [37] coded another GPU algorithm on the Compute Unified Device Architecture (CUDA) platform that runs independent threads to solve the binary knapsack problem. Richie and Ababei [38] provided a synchronous methodology with a managing thread that distributes the calculations to worker threads. Turan et al. [39] suggested the multithread simulated annealing (MTSA) algorithm with master and slave threads. The created slave thread continues running until a defined number of non-improving iterations, i.e., iterations in which the global best solution cannot be improved. Upon completion of the thread runs, solutions are gathered into a pool. Vousden [40] compared the performance of asynchronous and synchronous SA implementations with regard to race conditions. Zhou et al. [41] benchmarked multiple SA threads without communication, initialized with different solutions. Coll et al.
[42] built synchronous GPU threads that communicate at predefined time intervals and resume their runs from the best incumbent solution so far. Yildirim [43] deployed a multithread methodology with a hybrid structure where SA is fed by optimizer sub-threads. Even though many further studies are present in the literature, this review covers most of the parallelization approaches.
In the remainder of this paper, an MILP model is introduced to define the research problem. A simulated annealing (SA) metaheuristic variant, namely, the simulated annealing multithread (SAMT) algorithm, is proposed to solve the problem. The contributions of this algorithm and study are as follows:
1. Improving the runtime drawback of the SA algorithm;
2. Enhancing its robustness to converge to the global optimum solution;
3. Providing a new solution approach to the NWFSSP, minimizing total earliness and tardiness;
4. Enabling parallel processing without excessive resource allocation.
The motivation behind this study is the aim to contribute to improvements in the field of optimization, more specifically metaheuristics. Even though metaheuristics were introduced after intense study and analysis, there are still open issues to be optimized. A good example is the study by Deng et al. [44], where the authors improved the mutation strategy of the differential evolution algorithm and even compared the novel methodology to previously improved differential evolution methodologies. Similarly, Cai et al. [45] worked on the quantum-inspired evolutionary algorithm to improve multiple shortcomings, such as poor runtime, limited search capability, and difficulty in assigning rotation angles. With the same focus, this study introduces a novel methodology that improves the original SA without requiring excessive computational resources. Section 2 of the paper states the research problem and details the proposed and benchmark metaheuristic algorithms. Section 3 introduces the benchmark results and comparative analysis. Section 4 includes the discussion, conclusion, and future prospects.

Methodology
The NWFSSP imposes processing of the jobs without interruption from the beginning on the first machine until completion on the last machine. Since the objective function also includes minimization of earliness in addition to tardiness, adding a delay between jobs may improve the objective function value (OFV) by avoiding the early completion of a job. However, unforced idleness would not be practical in most cases of the NWFSSP [46]. The disadvantages may be operational costs, such as machine running costs, setup costs, and buffer costs, that exceed any cost incurred due to earliness. As a result, the problem is proposed to have a non-delay schedule. An MILP model is suggested to formulate the problem. To provide integrality, two dummy jobs with zero processing times on all machines are added to the model as the initial and last jobs. Thus, the MILP model has n + 2 jobs and m machines. Each job may be processed only on one machine at a time. Similarly, one machine can process only one job at a time. The proposed MILP model, its parameters, and its decision variables are outlined below.

Parameters:
p_ij = Processing time of Job j on Machine i.
d_j = Due date of Job j.
g_ijk = Slack time between the completion time of Job j and the starting time of Job k on Machine i, if Job k is processed immediately after Job j.
M = A very large number.

Decision variables:
E_j = Earliness of Job j.
T_j = Tardiness of Job j.
C_ij = Completion time of Job j on Machine i.
x_jk = 1 if Job k is processed immediately after Job j, and 0 otherwise.
The MILP model of the NWFSSP minimizing total earliness and tardiness:

Minimize ∑_{j=1}^{n+2} (E_j + T_j) (1)

subject to

C_ik ≥ C_ij + g_ijk + p_ik − M(1 − x_jk); i = 1, . . ., m; j, k = 1, . . ., n + 2; j ≠ k, (2)
C_i1 = ∑_{l=1}^{i} p_l1; i = 1, . . ., m, (3)
T_j − E_j = C_mj − d_j; j = 1, . . ., n + 2, (4)
∑_{j=1, j≠k}^{n+2} x_jk = 1; k = 1, . . ., n + 2, (5)
∑_{k=1, k≠j}^{n+2} x_jk = 1; j = 1, . . ., n + 2, (6)
x_{(n+2)1} = 1, (7)
x_jk ∈ {0, 1}; j, k = 1, . . ., n + 2; j ≠ k, (8)
C_ij, T_j, E_j, g_ijk, p_ik ≥ 0; i = 1, . . ., m; j = 1, . . ., n + 2, (9)

where the objective function in Equation (1) minimizes the sum of total earliness and tardiness; the constraint in Equation (2) ensures that the completion time of a job on a machine is the summation of the completion time of the preceding job, the slack time between these jobs, and the processing time of the job; the constraint in Equation (3) guarantees that the completion time of the first job is its total processing time on all machines; the constraint in Equation (4) calculates the earliness or tardiness by comparing the completion time and due date of each job; and the constraints in Equations (5) and (6) ensure that each job has only one predecessor and successor, respectively. The model consists of n + 2 jobs to satisfy the constraints in Equations (5) and (6). Equation (7) is a virtual constraint that assigns the first (dummy) job as the successor of the last (dummy) job. It is assumed that each job is processed without any delay once no operation of the predecessor job causes a violation of no-wait processing, in line with real-world applications. Thus, the values of the slack time g_ijk are fixed for any consecutive jobs, independent of their position in the schedule. The calculation of g_ijk is broadly explained by Röck [47].
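Because the slack between consecutive jobs is fixed, the objective of a non-delay sequence can be evaluated directly, without solving the MILP. The sketch below (toy data; `total_earliness_tardiness` is a hypothetical helper, not the paper's implementation) derives each job's start time from the no-wait slack and sums earliness and tardiness on the last machine:

```python
def total_earliness_tardiness(seq, p, due):
    """Evaluate the sum of earliness and tardiness for a no-wait,
    non-delay job sequence. `p[j]` holds the machine processing times of
    job j and `due[j]` its due date (illustrative helper)."""
    start = 0   # start time of the current job on machine 1
    total = 0
    prev = None
    for j in seq:
        if prev is not None:
            # minimum machine-1 start gap preserving the no-wait property
            gap = max(sum(p[prev][:i + 1]) - sum(p[j][:i])
                      for i in range(len(p[j])))
            start += gap
        finish = start + sum(p[j])  # completion on the last machine
        total += max(0, due[j] - finish) + max(0, finish - due[j])
        prev = j
    return total

p = {"J1": [3, 2, 4], "J2": [2, 5, 1]}   # hypothetical processing times
due = {"J1": 9, "J2": 12}                # hypothetical due dates
print(total_earliness_tardiness(["J1", "J2"], p, due))  # 1
```

Here J1 finishes exactly at its due date and J2 finishes at 11, one unit early, so the OFV is 1.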

Neighborhood Operations
Neighborhood operations are applied to reveal neighbor solutions for metaheuristic algorithms. The types of neighborhood operations influence the performance of a metaheuristic algorithm. Thus, selecting appropriate operations is crucial to exploring the search space wisely. Three different types, implemented in the proposed or benchmark algorithms, are considered in this paper.

Insert Operation
The insert operation embraces the insertion of a job in the sequence into a specified position. Two positions should be determined for this operation. Usually, these positions are generated randomly or tried successively in an order. Let i and j be integers from the set {1, . . ., n}. In an insert operation, Job i is removed from the sequence and inserted into the position prior to Job j. The positions of the jobs after Job j slide toward the end of the schedule. An example operation is shown in Figure 2, where Job 2 is inserted into Position 4.
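The insert move can be sketched as follows (0-based positions are used here, while the paper's figures use 1-based positions):

```python
def insert_move(seq, i, j):
    """Remove the job at position i and insert it at position j
    (0-based); jobs between the two positions slide accordingly."""
    s = list(seq)
    job = s.pop(i)
    s.insert(j, job)
    return s

# Job 2 moved to Position 4 (0-based index 3), as in Figure 2:
print(insert_move([1, 2, 3, 4, 5], 1, 3))  # [1, 3, 4, 2, 5]
```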

Swap Operation
Congruently to an insert operation, two positions should be determined for swap operations. Let i and j be integers from the set {1, . . ., n}. In the swap operation, the positions of Job i and Job j are exchanged mutually. The positions of the remaining jobs do not change. The example in Figure 3 shows the swap operation of Job 2 and Job 4.
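The swap move reduces to exchanging two entries of the sequence (0-based positions in this sketch):

```python
def swap_move(seq, i, j):
    """Mutually exchange the jobs at positions i and j (0-based);
    all other jobs keep their positions."""
    s = list(seq)
    s[i], s[j] = s[j], s[i]
    return s

# Swapping Job 2 and Job 4 as in Figure 3:
print(swap_move([1, 2, 3, 4, 5], 1, 3))  # [1, 4, 3, 2, 5]
```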


Sub-Interchange Operation
Defined by Arabameri and Salmasi [18], the sub-interchange is executed by exchanging two adjacent jobs, Job i and Job i + 1. It is believed that running sub-interchange operations after updating the incumbent solution may result in a better solution.

Proposed Algorithm-Simulated Annealing Multithread (SAMT)
The SAMT algorithm is based on the simulated annealing (SA) metaheuristic. Proposed by Kirkpatrick [48], SA simulates the annealing process of solids to solve combinatorial optimization problems. In physics, annealing specifies the process in which solids are heated rapidly to a maximum temperature and cooled down slowly in heat baths to allow them to settle into the ground state of the solid [49]. The algorithm is initialized with a high temperature that is cooled slowly. The moves during the cooling process keep the algorithm from becoming trapped in local optima [50]. The flow of the classical SA algorithm is stated in Algorithm 1 [51].
Algorithm 1 Simulated Annealing
1: Select an initial solution in the solution space s ∈ SS
2: Set the initial temperature T
3: Set the temperature iteration counter i = 0
4: Set the maximum number of repetitions n_max
5: While (Stopping criteria are not met)
6: Set n = 0
7: Generate a new solution s′ ∈ SS
8: If ( f(s′) ≤ f(s) ) Set s = s′
9: Else Set s = s′ with probability exp(−( f(s′) − f(s))/T)
10: Set n = n + 1; If (n < n_max) go to step 7
11: Decrease the temperature T; Set i = i + 1
12: Return s and f(s)

A review of the literature suggests that there are still open fields for parallel implementation of the SA algorithm. In this study, a novel approach utilizing multiple threads is proposed. The simulated annealing multithread (SAMT) algorithm is proposed to overcome the time disadvantage of the classical SA algorithm and boost the convergence process. The algorithm is named after its pragmatic structure in which classical SA runs on a main thread supported by a sub-thread that iteratively updates the incumbent solution of the main thread with much faster SA runs. Rather than distributing the calculations of the algorithm to available processors, this parallelism aims to run different SAs simultaneously to improve the consumed time and search the solution space intelligently to find the global optimum. The SAMT algorithm falls into the altered generation class of synchronous algorithms in Greening's taxonomy [25], with shared memory. The novel methodology flow is demonstrated in Algorithm 2.
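The classical SA loop of Algorithm 1 can be sketched in Python as follows. The geometric cooling rate, stopping threshold, swap neighborhood, and toy objective below are illustrative choices, not parameters taken from this study:

```python
import math
import random

def simulated_annealing(f, s0, T0, n_max, alpha=0.95, T_min=1e-3):
    """Minimal sketch of classical SA for a permutation problem.
    `alpha` (cooling rate) and `T_min` (stopping threshold) are
    illustrative assumptions."""
    s, best, T = list(s0), list(s0), T0
    while T > T_min:                      # stopping criterion
        for _ in range(n_max):            # n_max repetitions per temperature
            i, j = random.sample(range(len(s)), 2)
            s_new = list(s)
            s_new[i], s_new[j] = s_new[j], s_new[i]  # random swap neighbor
            delta = f(s_new) - f(s)
            # accept improvements; accept worse moves with prob. exp(-delta/T)
            if delta <= 0 or random.random() < math.exp(-delta / T):
                s = s_new
                if f(s) < f(best):
                    best = list(s)
        T *= alpha                        # cool down geometrically
    return best, f(best)

# Toy usage: sort a permutation by minimizing total displacement.
random.seed(1)
f = lambda seq: sum(abs(v - pos) for pos, v in enumerate(seq))
best, val = simulated_annealing(f, [5, 3, 1, 4, 2, 0], T0=10.0, n_max=50)
print(val)  # 0 when the sorted permutation is found
```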
Algorithm 2 Simulated Annealing Multithread (SAMT)
1: Select initial solutions in the solution space s_m = s_t ∈ SS
2: Set the initial temperature T_0m
3: Set the temperature iteration counter i_m = 0
4: Set the maximum number of repetitions n_max
5: Set the operator selection probability Prob_Operation ∈ (0, 1)
6: Set the fast sub-thread's initial temperature T_0ft
7: Set the slow sub-thread's initial temperature T_0st
8: Set the fast sub-thread iteration counter i_ft = 0
9: Set the slow sub-thread iteration counter i_st = 0
10: Set the thread selection probability coefficient c ∈ (0, 1)
11: MAIN THREAD:
12: While (Stopping criteria are not met)
13: If (SUB-THREAD is not running)
14: Calculate f(s_t)
15: If ( f(s_t) < f(s_m) )
16: Set s_m = s_t
17: Else
18: Set s_t = s_m
19: Run (SUB-THREAD)
20: Set n_m = 0
21: While (n_m < n_max)
22: If (Uniform(0, 1) < Prob_Operation) …
23: Set s_m = s′_m …
29: Else Set s_m = s_m …
32: Set …
35: Set i_m = i_m + 1
36: Do
37: Return s_m and f(s_m)
38: SUB-THREAD: …

The methodology should be initialized by setting the parameters. Prior to finalizing this methodology, several alternative policies were considered, compared, and tested in terms of simplicity, runtime, efficiency, and implementation complexity. These policies included different numbers of threads with distinct completion times and different schemes of solution updates. The threads in the proposed methodology have different roles. The fast thread decreases the runtime by jumping to better solutions in the vicinity of the current best solution with faster steps. This thread may also manipulate the search by orienting the direction toward different "hills" if the thread provides a better solution. The slow thread has different aims, such as enabling moves over longer distances in the vicinity, jumping to similar solutions near other optima, or diverging from local optima. Possible jumps after the completion of a sub-thread are presented in Figure 4. If the sub-thread results in a solution sequence with a better OFV than the global incumbent solution, the global incumbent solution is updated. An adaptive strategy with a single sub-thread provided the best solutions. After some experimental runs with different parameters, the results also showed that adjusting the initial temperature and runtime of the sub-thread should be adaptive to achieve elite results. Therefore, the methodology was revised to have a single master (main) thread and a slave (sub-)thread, with the assignment of the algorithm speed (slow or fast) conducted depending on a predefined parameter. This strategy attempts to search a larger space and decreases the probability of becoming stuck in local optima due to premature convergence. Both insert and swap operators work well on the NWFSSP. Both operators are utilized in the proposed methodology to benefit from each and to avoid becoming trapped in a cycle. At each iteration, the algorithm runs the insert operator with probability Prob_Operation or the swap operator with probability 1 − Prob_Operation. The temperature-dependent type selection function in the first line of the sub-thread determines the type of sub-thread depending on the temperature of the main thread. The value of the constant c
determines the initial probabilities of selecting a fast or slow thread. In cases where c > 0.50, the slow thread has a lower probability of being selected at the beginning of the run, and the probability increases as T_m decreases. If c < 0.50, the fast thread has the advantage with a higher probability. The value c = 0.50 grants equal probability throughout the run.
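The role of c can be illustrated with a small sketch. The linear interpolation below is an assumption chosen to reproduce the c = 0.25 example reported in the parameter-tuning discussion (fast thread: probability 0.75 at the start, 0.25 at the end); the paper's exact temperature-dependent selection function may differ:

```python
def p_fast(c, frac_elapsed):
    """Probability of launching the fast sub-thread, assumed to move
    linearly from (1 - c) at the start of the run (frac_elapsed = 0)
    to c at the end (frac_elapsed = 1). Illustrative form only."""
    return (1 - c) + (2 * c - 1) * frac_elapsed

# c = 0.25: the fast thread starts at probability 0.75,
# and the slow thread ends at probability 0.75.
print(p_fast(0.25, 0.0), 1 - p_fast(0.25, 1.0))  # 0.75 0.75
```

With c = 0.50 the expression is constant at 0.5, matching the statement that this value grants equal probability throughout the run.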

Benchmark Algorithms
The variants of tabu search (TS) and particle swarm optimization (PSO) in the study by Arabameri and Salmasi [18] were selected as benchmark algorithms since these metaheuristics were applied to the same research problem. The benchmark algorithms are introduced in Appendix A, where item A1 introduces the TS algorithm and its variants: TS with the EDD initial solution and insertion neighborhood (TSEI), TS with a random initial solution and insertion neighborhood (TSRI), TS with the EDD initial solution and swap neighborhood (TSES), and TS with a random initial solution and swap neighborhood (TSRS). Item A2 describes the PSO algorithm supported by two different integrated local search algorithms: insertion (PSOI) and variable neighborhood structure (PSOV).

Test Problems
The test problems are selected from datasets already introduced by different studies. These problems are widely used in the literature for benchmarking scheduling problems. The proposed and benchmark algorithms are verified on Carlier's dataset [52], which has eight small-sized problems with 7-14 jobs and 4-9 machines. The process times in this dataset are generated with a pattern of sorting the digits adjacently. Another dataset for benchmarking consists of the scheduling problems defined by Reeves [53]. Reeves generated this dataset to test his proposed genetic algorithm for the FSSP. The reason for generating this dataset was his failure to find a publicly shared dataset for this purpose, since there is some evidence that process times cannot be completely random [54]. Reeves generated the dataset using the suggestions and parameters of Rinnooy [55]. The dataset ranges from 20 jobs and five machines up to 75 jobs and 20 machines. The famous Taillard dataset [56] is also included for comparison of the algorithms. This dataset comprises problems with 20-500 jobs and 5-20 machines. Taillard included hard problems in his dataset. The process times are drawn randomly from a uniform distribution on (1, 100). Due to the high number of problems, only the first problems with the same number of jobs and machines in the datasets by Reeves and Taillard are considered.
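A Taillard-style instance can be generated with the same processing-time distribution. The sketch below draws times uniformly from {1, ..., 100} as Taillard did; it is not his exact generator or seed scheme, only an illustration:

```python
import random

def taillard_like_instance(n_jobs, n_machines, seed=0):
    """Build a processing-time matrix with entries drawn uniformly from
    {1, ..., 100} (the distribution used in Taillard's benchmarks; the
    generator and seeds here are illustrative, not Taillard's)."""
    rng = random.Random(seed)
    return [[rng.randint(1, 100) for _ in range(n_machines)]
            for _ in range(n_jobs)]

p = taillard_like_instance(20, 5)  # smallest Taillard size: 20 jobs, 5 machines
print(len(p), len(p[0]))  # 20 5
```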
Due dates for the problems are created according to the rule proposed by Arabameri and Salmasi [18]. The due date may be randomly produced from the interval in Equation (10):

d_j ~ Uniform[ LB × (1 − TF − R/2), LB × (1 − TF + R/2) ], (10)
where LB is the lower bound of the makespan, TF is the tightness factor, and R is the range parameter. The TF and R values are selected from the set {0.2, 0.5, 0.8}. To avoid the possibility of negative due dates, the {0.8, 0.5} and {0.8, 0.8} combinations of {TF, R} are ignored. Hence, a total of 27 different problems are solved with seven different due date schemes, resulting in 189 combinations. Among the methods to calculate the LB in the literature, the method by Taillard [56] is preferred due to its concrete and meaningful structure. The LB can be calculated according to Equation (11):

LB = max{ max_{1≤i≤m} [ min_j ∑_{l=1}^{i−1} p_lj + ∑_{j=1}^{n} p_ij + min_j ∑_{l=i+1}^{m} p_lj ], max_j ∑_{i=1}^{m} p_ij }. (11)
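The due-date scheme above can be sketched as follows. The interval is the common tightness-factor/range form assumed here for Equation (10), and the LB, TF, and R values in the usage example are hypothetical:

```python
import random

def draw_due_dates(n_jobs, LB, TF, R, seed=0):
    """Draw due dates from U[LB*(1 - TF - R/2), LB*(1 - TF + R/2)],
    the tightness-factor/range scheme assumed for Equation (10)."""
    rng = random.Random(seed)
    lo, hi = LB * (1 - TF - R / 2), LB * (1 - TF + R / 2)
    return [rng.uniform(lo, hi) for _ in range(n_jobs)]

# TF = 0.8 combined with R in {0.5, 0.8} would push the lower endpoint
# below zero, which is why those combinations are excluded.
dues = draw_due_dates(10, LB=500, TF=0.2, R=0.5)
print(min(dues) >= 275, max(dues) <= 525)  # True True
```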
The problems are divided into three groups according to the number of jobs. Small-sized problems in Group 1 have up to 20 jobs. Medium-sized problems in Group 2 have 20-100 jobs. Large-sized problems in Group 3 have over 100 jobs.

Results of the Benchmark Runs
All of the metaheuristics involved in this study have several parameters that should be tuned to achieve favorable solutions. Additionally, the common parameters for the benchmark runs should be determined. The maximum runtime of any run is limited by a function of the number of jobs, as stated in Table 1. All metaheuristics except the TSEI and TSES algorithms are initialized with a random sequence to avoid any bias. The TSEI and TSES algorithms are initialized with the sequence provided by the EDD rule, in which jobs with earlier due dates are processed earlier. The initial temperature of the SA algorithm should be set to a value that allows acceptance of worse solutions with a determined probability that decreases systematically. Since the test runs are terminated according to a runtime criterion for all algorithms, the temperature of the SA algorithm is decreased as a function of time according to Equation (15) in this study.
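Equation (15) itself is not reproduced in this excerpt; as an illustration only, a time-based cooling schedule with the stated behavior (the temperature reaches zero when the elapsed time reaches the runtime limit) could look like the following sketch, where the linear decay shape and the function name are assumptions:

```python
def sa_temperature(t_cur: float, t_max: float, t0: float) -> float:
    """Time-based cooling sketch: the temperature decays from t0 at the
    start of the run to 0 when the elapsed time t_cur reaches t_max.
    The linear shape is an illustrative assumption, not necessarily the
    paper's exact Equation (15)."""
    return t0 * max(0.0, 1.0 - t_cur / t_max)
```

With this form, a run with t_max = 100 s and t0 = 500 is at half temperature after 50 s and terminates (temperature zero) at 100 s.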
The parameter t_cur in Equation (15) denotes the elapsed time, whereas t_max is the total assigned runtime. The algorithm stops when the temperature drops to zero. Similarly, the temperatures of the main and support threads of the SAMT algorithm should be managed for smooth synchronization. However, the time limits of the threads should be arranged within the time limit of the main thread. Parameter tuning has a high influence on the performance of a metaheuristic. The guideline by LaTorre et al. [57] identifies several methods that have gained recent attention for tuning the parameters of metaheuristic algorithms. The focus iterative local search (FocusILS) methodology [58] was preferred to tune the parameters of the SAMT algorithm. T_0m was selected from a set increasing in steps of 50 up to 500 degrees. The set {0.25, 0.50, 0.75} was used to find the best values of the constant c and Prob_Operation. T_0ft and T_0st were tuned by dividing T_0m by factors up to 10. The initial temperature of the SAMT algorithm may be set to half that of the SA algorithm for better and faster convergence. Insert and swap operators are selected with equal probability at each iteration. Running the fast thread more frequently in the initial phase of the algorithm eases discovering the vicinity of the current solution, while rare searches with the slower thread in this phase produce faster jumps toward the optimum region if possible. The probability of using the slow thread should be increased toward the end of the runtime; this change allows the sub-thread to escape local optima toward different optimum regions by climbing hills. Setting the parameter c to 0.25 assigns an initial probability of 0.75 to the fast thread and 0.25 to the slow thread. During the runtime, the probability of the fast thread decreases and that of the slow thread increases linearly until the fast thread stabilizes at 0.25 and the slow thread at 0.75. The parameters of the SAMT algorithm are shown in Table 2.

The PSO algorithm depends on various parameters and variables that directly affect its performance. Tuning these parameters with ineligible strategies may lead to disadvantages such as premature convergence or failure to converge to the region near the optimum solution. The constants and variables are set to the values suggested by Arabameri and Salmasi [18] to achieve a meaningful benchmark environment. The settings for the PSO algorithm are shown in Table 3. While the w value for the PSOV algorithm is fixed, the value for the PSOI algorithm is updated at each iteration as a function of the maximum number of iterations t_max and the current iteration t_cur.
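Returning to the SAMT thread-selection schedule described above: with c = 0.25, the fast thread's probability falls linearly from 1 − c = 0.75 to c = 0.25 over the runtime while the slow thread's rises correspondingly. The exact interpolation used by the authors is not shown here, so the linear form below is an assumption:

```python
def thread_probabilities(t_cur: float, t_max: float, c: float = 0.25):
    """Selection probabilities of the SAMT fast and slow threads.

    The fast thread starts at probability 1 - c and decreases linearly
    to c over the runtime; the slow thread mirrors it. The linear
    interpolation is an illustrative assumption."""
    frac = min(1.0, t_cur / t_max)                # fraction of runtime elapsed
    p_fast = (1.0 - c) + frac * (2.0 * c - 1.0)   # (1-c) -> c linearly
    return p_fast, 1.0 - p_fast
```

Halfway through the runtime both threads are equally likely; at the end, the slow thread dominates, favoring long jumps out of local optima.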
The metaheuristics are compared using the percentage gap (PG), the relative percentage deviation of an algorithm's solution from the best solution found. PG is calculated according to Equation (16).
where Alg_OFV is the OFV of the selected algorithm, and Min_OFV is the minimum OFV obtained by any algorithm for the corresponding problem. Since the metaheuristics are stochastic, each metaheuristic was run three times for each benchmark problem, and the result tables show the average PG of the three runs per problem. SAMT runs a main thread and a sub-thread. For the sake of fairness, the benchmark algorithms were run twice at each iteration, and the better solution was selected, as if they ran two threads asynchronously. The algorithms were coded in the C# programming language, and the codes were compiled and run on a single computer with an Intel(R) Core(TM) i7-6500U CPU and 8 GB RAM.
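Assuming the standard relative-deviation form for Equation (16), PG can be computed as:

```python
def percentage_gap(alg_ofv: float, min_ofv: float) -> float:
    """Percentage gap (PG): relative percentage deviation of an
    algorithm's OFV from the best OFV found by any algorithm on the
    same problem (the assumed form of Equation (16))."""
    return (alg_ofv - min_ofv) / min_ofv * 100.0
```

An algorithm that matches the best-known OFV has a PG of 0; averaging PG over the three runs per problem yields the values reported in the result tables.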
The accuracies of the metaheuristic algorithms were verified by comparing their results with the MILP results on the Carlier dataset. Each problem in the dataset was assigned seven due dates with different due date creation schemes, resulting in 56 test problems. The problems were solved to optimality using the Gurobi solver. Every algorithm found the optimal result for every problem.
The results of the Reeves dataset are shared in Table A1 of Appendix B.

Comparative Analyses
The algorithms are evaluated under identical conditions and on the same test problems. Therefore, comparative analyses of the robustness of the results are performed through analysis of variance (ANOVA) on the benchmark solutions. The two-way ANOVA results in Table 5 reveal that the factors job and algorithm, as well as their interaction, have a significant effect on the PG values. Due to this significance, post hoc analyses with pairwise comparisons were conducted to reveal the differences within the group × algorithm interaction. A concern during this analysis is keeping the family-wise error rate under control [57]. Tukey's HSD (honestly significant difference) and Scheffe tests were applied for the pairwise comparisons. Tukey's HSD has high sensitivity for pairwise comparisons under balanced conditions, while the Scheffe method keeps the family-wise error rate under strict control [59]. A p-value lower than 0.05 indicates that two instances are significantly different. Small-sized problems, in terms of the number of jobs, are not considered in the evaluation since all algorithms return the same OFV for every problem in this group.
The Tukey's HSD comparison results for medium-sized problems are presented in Table A3 of Appendix C. They reveal that the TSRS algorithm performs significantly worse than the PSOI, PSOV, SA, and SAMT algorithms, whereas the TSES algorithm performs worse than the SA and SAMT algorithms in terms of OFV. The remaining values indicate no significant difference among the SA, SAMT, TSRI, TSEI, PSOI, and PSOV algorithms in this group. As a result, only the TS algorithms with the swap operator stand apart from the group, as they provide considerably lower-quality solutions. Tukey's HSD results for large-sized problems in Table A4 of Appendix C show that the algorithms deliver different solution qualities as the problem size increases. According to Table A4, only three comparisons have similar means: PSOV vs. PSOI, TSRI vs. PSOI, and TSRI vs. TSEI. The SAMT algorithm notably outperforms the remaining metaheuristics with consistent solution quality at all levels. Figure 5 shows the increasing divergence of solution robustness among the algorithms in terms of mean PG vs. the number of jobs. The post hoc results for PG according to the Scheffe method in Table A5 of Appendix C align with those of Tukey's HSD. With higher p-values to avoid type I errors, this analysis suggests that SAMT performs significantly better than TSRS on medium-sized problems, in addition to outperforming all algorithms on large-sized problems.
Apart from solution robustness, the runtime to find the best OFV is another characteristic that should be assessed to determine the performance of the algorithms. Another two-way ANOVA is performed to test whether the mean runtimes of the algorithms differ significantly.
The ANOVA results in Table 6 reveal that at least one mean runtime is not equal to the others. Only the Tukey's HSD comparisons that are significantly different are listed in Table A6 of Appendix C, and the significant results from the Scheffe method are shown in Table A7. The p-values of Tukey's HSD test in Table A6 show that the mean runtimes to find the best solution of the SA and SAMT algorithms are significantly different for problems with at least 50 jobs. The SAMT algorithm also provides a runtime advantage over all benchmark metaheuristics for problems with 200 and 500 jobs. Evaluating the results from the Scheffe method in Table A7 as a group suggests that SAMT can find the best OFV significantly more rapidly than SA, TSEI, PSOI, and PSOV for problems with 200 jobs, as well as more rapidly than all algorithms when the number of jobs increases to 500. According to Figure 6, the SAMT algorithm needs a shorter runtime than the SA algorithm to find the best OFV, while yielding better or equal objective function values.

Figure 7 demonstrates the improvement of SAMT over the classical SA. For fairness, SAMT is compared to the better of two asynchronous runs of SA. In the case of PG, both the Scheffe and Tukey's HSD post hoc analyses suggest that SAMT is significantly better for the large-sized problems; the PG difference increases nearly linearly with the number of jobs, as seen in Figure 7a. Considering the runtime to find the best solution, Tukey's HSD suggests that SAMT is better for problems with 75 or more jobs, whereas the more conservative Scheffe method requires at least 200 jobs to establish significance. Figure 7b shows that the runtime difference grows gradually with an exponential trend. The post hoc analyses and Figure 7 confirm that the proposed SAMT outperforms the classical SA in terms of both runtime and solution robustness, even when SA runs two asynchronous threads and the better OFV of the two is selected for comparison.

Discussion and Conclusions
In this paper, a novel parallel metaheuristic methodology named SAMT, based on SA, was proposed. The motivation of the study was to improve the poor runtime performance and the search capability of the classical algorithm. In the methodology, a sub-thread runs in parallel to adaptively update the search direction of the main thread, with the aim of enabling classical SA to find better solutions in shorter runtimes. The NWFSSP with earliness and tardiness objectives, F_n |nwt| ∑E_j + ∑T_j, is considered for benchmarking. The literature review revealed that earliness and tardiness objectives for the NWFSSP have not been widely studied; the most common practice is to rely on dispatching rules and heuristics. A major shortcoming of dispatching rules and heuristics is their inability to update themselves during runtime, in contrast to metaheuristics. The study by Arabameri and Salmasi [18] was selected as the reference for benchmarking since it includes two important metaheuristic algorithms with different parameters.
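For concreteness, the benchmark objective — total earliness plus tardiness of a permutation under the no-wait constraint — can be evaluated as sketched below. The start-time offset between consecutive jobs is the standard no-wait construction; function and variable names are illustrative, not the paper's:

```python
def no_wait_et_objective(sequence, proc, due):
    """Total earliness + tardiness of a permutation under the no-wait
    constraint (illustrative sketch).

    sequence : job indices in processing order
    proc     : proc[j][k] = processing time of job j on machine k
    due      : due[j] = due date of job j

    Under no-wait, the minimum start-time offset between consecutive
    jobs i -> j on the first machine is
        delta(i, j) = max_k ( sum(proc[i][:k+1]) - sum(proc[j][:k]) ).
    """
    m = len(proc[0])
    start = 0.0   # start time of the current job on machine 1
    total = 0.0
    prev = None
    for j in sequence:
        if prev is not None:
            delta = max(
                sum(proc[prev][:k + 1]) - sum(proc[j][:k]) for k in range(m)
            )
            start += delta
        completion = start + sum(proc[j])
        # at most one of earliness/tardiness is positive, so E + T = |d - C|
        total += abs(due[j] - completion)
        prev = j
    return total
```

For two jobs with proc = [[2, 3], [4, 1]] and due dates [5, 7], the sequence (0, 1) completes the jobs exactly on time and the objective is 0.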
The test runs and comparative analyses revealed that the SAMT algorithm provides more robust solutions than the classical SA algorithm, the PSO variants, and the TS algorithms: the solutions of the SAMT algorithm were slightly better in most medium-sized problems and in all large-sized problems, even when the benchmark algorithms ran two asynchronous threads. Another contribution of this study is the runtime improvement over the SA algorithm; the SAMT algorithm consumed less time to find the best solution in large problems than the benchmark algorithms. Rather than relying on conventional parallel computing, the proposed SAMT algorithm introduces independent parallel threads to enhance solution robustness and overcome the runtime disadvantage of the classical SA algorithm. As intended, the multiple threads of the SAMT algorithm implement a divergence-convergence strategy that enables the algorithm to explore the solution space more thoroughly and rapidly. The adaptive search strategy with a single slave thread is the novelty of this study: a temperature-dependent function stochastically determines the speed of the sub-thread at each run, so that the algorithm jumps through the solution space to decrease runtime and increase robustness.
The number of threads and the SA parameters of each thread directly affect the performance of the SAMT algorithm. Hence, it is important to fine-tune each parameter systematically; an easy implementation on a single computer is the FocusILS parameter tuning tool. A further contribution of the study is the adaptive parameter tuning strategy of the single slave thread after analysis with the design of experiments (DOE). The purpose of the study was to show that the SA algorithm can be improved in terms of both runtime and result quality without excessive resource requirements, and that the newly proposed algorithm is a robust methodology for solving the NWFSSP with total earliness and tardiness objectives. In future studies, the method may be enhanced by deploying it on multiple CPUs/GPUs with a distributed programming methodology, and it will be evaluated on different combinatorial optimization problems to confirm its efficiency. Another direction is to adapt other types of metaheuristics to a parallel methodology for the NWFSSP and compare them with the SAMT algorithm.

Appendix A
This appendix includes the descriptions of benchmark algorithms.
Appendix A.1. Tabu Search (TS)

The TS algorithm was proposed by Glover [60] to solve integer programming problems. Furthermore, Glover et al. [61] published a user guide introducing perspectives for implementing TS on combinatorial or nonlinear problems. TS has been widely applied to scheduling problems. The algorithm iteratively improves the incumbent solution until the termination criteria are met. Short-term memory is utilized to restrict recent moves in the neighborhood, exploring better solutions and avoiding entrapment in local optima, while long-term memory helps to update the neighborhood dynamically for intensification [62]. Recent moves are added to the tabu list (TL) and restricted for a defined number of iterations. Following the study by Arabameri and Salmasi [18], the size of the TL is determined according to Equation (A1).
An aspiration criterion should be defined so that a better move can be accepted even if it is in the tabu list. The aspiration criterion in this study is the objective value of the incumbent solution: any move with a better objective function value than the current best solution is accepted regardless of the TL.
The TS algorithm is an incremental metaheuristic that iteratively updates an initial solution. Arabameri and Salmasi [18] evaluated three initial solution-creation mechanisms. Earliest due date (EDD) orders the jobs by their due dates in ascending order. The longest tardiness/earliness rate (LTER) rule considers the ratio of the earliness and tardiness weights, which are assumed to be 1 in this study. A random initial solution is a sequence in which the jobs are placed in random positions. Only two of these mechanisms are benchmarked in this study, since LTER cannot produce a distinct initial solution when all weights are equal to 1. The four TS variants are TS with the EDD initial solution and insertion neighborhood (TSEI), TS with a random initial solution and insertion neighborhood (TSRI), TS with the EDD initial solution and swap neighborhood (TSES), and TS with a random initial solution and swap neighborhood (TSRS). Until the stopping criteria are met, a defined number of moves is evaluated at each iteration, and the best move becomes the new current solution if it improves on the current solution or meets the aspiration criterion. To improve the robustness of the TS solutions, n moves are compared at each iteration to balance the tradeoff between the number of moves per iteration and the total number of iterations. The pseudocode of the TS algorithm is shown in Algorithm A1.
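A minimal sketch of the two neighborhood operators and of one TS iteration follows, assuming a problem-specific `evaluate` function (hypothetical here). The tabu list stores recent move pairs, and the aspiration criterion admits a tabu move that beats the best OFV found so far:

```python
import random
from collections import deque

def swap_move(seq, i, j):
    """Exchange the jobs at positions i and j."""
    s = list(seq)
    s[i], s[j] = s[j], s[i]
    return s

def insert_move(seq, i, j):
    """Remove the job at position i and reinsert it at position j."""
    s = list(seq)
    s.insert(j, s.pop(i))
    return s

def tabu_step(seq, best_ofv, tabu, tabu_size, n_moves, evaluate, move):
    """One TS iteration sketch: evaluate n_moves random neighbors and pick
    the best non-tabu move, or a tabu move that beats the aspiration level
    best_ofv. Returns (ofv, sequence, move_pair) or None."""
    n = len(seq)
    best = None
    for _ in range(n_moves):
        i, j = random.sample(range(n), 2)
        cand = move(seq, i, j)
        v = evaluate(cand)
        allowed = (i, j) not in tabu or v < best_ofv  # aspiration criterion
        if allowed and (best is None or v < best[0]):
            best = (v, cand, (i, j))
    if best is not None:
        tabu.append(best[2])          # restrict the chosen move
        while len(tabu) > tabu_size:  # keep the TL at its fixed size
            tabu.popleft()
    return best
```

The caller repeats `tabu_step` until the runtime limit, keeping the incumbent and best sequences exactly as in Algorithm A1.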
Kennedy and Eberhart [63] proposed the PSO algorithm as a socially inspired method to solve continuous nonlinear functions. The PSO algorithm is a population-based metaheuristic that iteratively updates its individuals. These individuals (namely, particles) represent solution instances that move with varying velocities and directions toward better solutions. The velocity and direction of a particle are determined by the positions of both the global best solution and the particle's own best solution, together with its previous velocity and direction. The position of the i-th particle at the k-th iteration may be represented as P^k = (P^k_1, P^k_2, ..., P^k_n), while V^k denotes its velocity. The best position of a particle until the k-th iteration, called p-best, may be denoted as B^k = (B^k_1, B^k_2, ..., B^k_n). The global best particle at the k-th iteration (namely, g-best) is denoted G^k. The particles move toward p-best and g-best at each iteration to explore better solutions. Hence, the velocity of a particle is updated at each iteration according to Equation (A2) as a function of the previous velocity, p-best, and g-best.
where w is the current inertia weight that scales the impact of the previous velocity, c1 and c2 are acceleration coefficients, and rand1 and rand2 are uniform random numbers from the interval [0, 1]. Upon calculation of V, P is updated at each iteration according to Equation (A3).
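Equations (A2) and (A3) correspond to the standard PSO update; a per-dimension sketch (names are illustrative):

```python
import random

def update_particle(pos, vel, pbest, gbest, w, c1, c2):
    """One PSO iteration per Equations (A2) and (A3):
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v."""
    new_vel = [
        w * v
        + c1 * random.random() * (pb - x)
        + c2 * random.random() * (gb - x)
        for x, v, pb, gb in zip(pos, vel, pbest, gbest)
    ]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

With c1 = c2 = 0 the particle simply drifts with its inertia-scaled previous velocity, which makes the update easy to sanity-check.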
Being a powerful metaheuristic, the PSO algorithm also has drawbacks that limit convergence to the global best solution under some conditions. Commonly, the algorithm is trapped in local optima in high-dimensional spaces, and its rate of convergence during the iterative process is low [64]. Hence, many improved, hybrid, or integrated versions have been suggested by researchers. In line with the study by Arabameri and Salmasi [18], the PSO algorithm is supported by integrating two different local search algorithms: insertion (PSOI) and variable neighborhood structure (PSOV). The flow of the PSOI algorithm is shown in Algorithm A2. The PSOI algorithm has a simple local search strategy that iteratively applies random insert moves and updates the incumbent solution whenever the new sequence returns a better OFV. The PSOV algorithm constitutes a relatively complex local search methodology, summarized in Algorithm A3. PSO-VNS attempts to discover better solutions by applying insertion and swap operations sequentially. Upon finding a better OFV, a sub-interchange operation is executed to investigate the solution space further with the aim of improving the solution. The local search terminates after a defined maximum number of iterations.
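The accept-if-better insertion local search of Algorithm A2 can be sketched as follows, with `evaluate` standing in for the problem-specific OFV computation (an assumption here):

```python
import random

def psoi_local_search(seq, evaluate, k_max, rng=random):
    """Insertion local search sketch of Algorithm A2: repeatedly apply a
    random insert move and keep the new sequence only when its OFV
    improves. Returns the best sequence and its OFV."""
    best = list(seq)
    best_v = evaluate(best)
    for _ in range(k_max):
        i, j = rng.sample(range(len(best)), 2)
        cand = list(best)
        cand.insert(j, cand.pop(i))   # the Insert(i, j) move
        v = evaluate(cand)
        if v < best_v:                # accept only improving moves
            best, best_v = cand, v
    return best, best_v
```

Because only improving moves are accepted, the returned OFV is never worse than that of the input sequence.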

Appendix C
The appendix contains the results of the post hoc analysis.

Figure 3 .
Figure 3. Swap operation of Job 2 and Job 4.

Processes 2023, 11, x FOR PEER REVIEW
The candidate parallelization policies are evaluated in terms of simplicity, runtime, efficiency, and implementation complexity. These policies include different numbers of threads with distinct completion times and different solution-update schemes. The threads in the proposed methodology have different roles. The fast thread decreases the runtime by jumping to better solutions in the vicinity of the current best solution with faster steps; it may also steer the search toward different "hills" if it finds a better solution. The slow thread has different aims, such as enabling longer-distance moves in the vicinity, jumping to similar solutions near other optima, or diverging from local optima. Possible jumps after the completion of a sub-thread are presented in Figure 4.

Figure 4 .
Figure 4. Possible jumps by sub-threads on OFV curve.


Figure 5 .
Figure 5. Average percentage gap vs. no. of jobs.


Figure 6 .
Figure 6.Comparison of average runtimes to find the best solution.

Figure 7 .
Figure 7. Average PG (a) and runtime to find the best solution (b).Comparison of SA and SAMT for significantly different cases.


Algorithm A2 PSOI Local Search
1: Set the iteration number k = 1
2: Set k_max = the maximum number of iterations
3: Calculate OFV v for sequence S
4: While (k ≤ k_max)
5: Pick random integers i and j from the set [1, n]
6: Set a new sequence S' by executing Insert(i, j) on sequence S
7: Calculate OFV v' of sequence S'
8: If v' < v
9: Set S = S'
10: Set v = v'
11: Set k = k + 1
12: End While
13: Return S and v

Algorithm A3 PSOV Local Search
1: Set the iteration number k = 1
2: Set k_max = the maximum number of iterations
3: Calculate OFV v for sequence S
4: While (k ≤ k_max)
5: Pick random integers i and j from the set [1, n]
6: Set a new sequence S' by executing Insert(i, j)/Swap(i, j) on sequence S
7: Calculate OFV v' of sequence S'

Table 1 .
Runtime limits considering the size of the problem.

Table 2 .
The settings of the SAMT algorithm.

Table 3 .
The settings of PSO algorithms.

Table 4 .
The results of the Reeves dataset are shared in Table A1 of Appendix B, which has solutions for 49 problems; the last digit in each problem name indexes the due date scheme. The results of the Taillard dataset are shown in Table A2 of Appendix B, which contains 84 problems. The indices and corresponding TF and R values for due date creation are shown in Table 4. Due date creation scheme.

Table 5 .
ANOVA table for PG.

Table 6 .
ANOVA table for runtime.

Algorithm A1 Tabu Search
1: Find the initial sequence S_init by a construction heuristic or randomly
2: Set the best sequence S_best = S_init
3: Set the current sequence S_curr = S_init
4: Set tabu list TL = ∅
5: Set k_max = the maximum number of iterations
6: Set k = 1
7: While (k ≤ k_max)
8: Create the set N(S_curr) as neighbor solutions of S_curr
9: Find the best solution S' in the set N(S_curr)

Table A3 .
Tukey's HSD results for algorithm × group for Group 2 considering PG.

Table A4 .
Tukey's HSD results for algorithm × group for Group 3 considering PG.

Table A5 .
Scheffe method's significant results for algorithm × group for GP.

Table A6 .
Tukey's HSD results for algorithm × job considering time to find the best OFV.