Article

Parallel Algorithm with Blocks for a Single-Machine Total Weighted Tardiness Scheduling Problem

by
Mariusz Uchroński
Department of Control Systems and Mechatronics, Wroclaw University of Science and Technology, 50-370 Wrocław, Poland
Appl. Sci. 2021, 11(5), 2069; https://doi.org/10.3390/app11052069
Submission received: 30 December 2020 / Revised: 12 February 2021 / Accepted: 22 February 2021 / Published: 26 February 2021
(This article belongs to the Special Issue Applications of Parallel Computing)

Abstract:
In this paper, the single-machine total weighted tardiness scheduling problem is considered. It is solved with an approximate (tabu search) algorithm that improves the current solution by searching its neighborhood. Methods of eliminating bad solutions from the neighborhood (the so-called block elimination properties) are also presented and implemented in the algorithm. Blocks allow a significant shortening of the search of the neighborhood generated by insert-type moves. The designed parallel tabu search algorithm was implemented using the MPI (Message Passing Interface) library. The obtained speedups are very large (over 60,000×) and superlinear. This suggests that the parallel algorithm is superior to the sequential one, as the sequential algorithm is not able to search the solution space of the considered problem effectively; only the diversification introduced by parallelization provides adequate coverage of the search space. Existing methods of parallelizing metaheuristics give speedups that depend strongly on the problem instances and are rarely greater than the number of processors used. The method proposed here yields huge speedup values (over 60,000×), but only when the so-called blocks are used. Such speedups can be obtained on high-performance computing infrastructures, such as clusters, with the use of the MPI library.

1. Introduction

Problems of scheduling tasks on a single machine with cost goal functions, despite the simplicity of their formulation, mostly belong to the class of the most difficult (NP-hard) discrete optimization problems. In the literature, many types of such problems are considered, differing in task parameters, functional properties of machines and criteria: from the simplest ones, minimizing the number of tardy tasks (the problem denoted by 1||∑U_i), to complex ones with machine setups and time windows, where machine downtime may occur. Their optimization boils down to determining task starting times (or their order) that minimize the sum of penalties (costs of performing tasks). Optimal algorithms solve, within a reasonable time, instances with at most 50 tasks (80 in a multiprocessor environment, see [1]); therefore, in practice, almost exclusively approximate algorithms are used. In the author's opinion, the best of them are based on local search methods (as opposed to randomized methods, which can give a different result on each run, sometimes very good and sometimes poor).
In the problem considered in this work there is a set of tasks that must be performed on one machine. Each task has its requested execution time, due date and tardiness weight in the objective function. One should determine the order of performing the tasks that minimizes the sum of tardiness costs. It is undoubtedly one of the most studied problems of scheduling theory and belongs to the class of strongly NP-hard problems. The first work on this subject, by Rinnooy Kan et al. [2], was published in the 1970s. Despite the passage of over 40 years, this problem is still attractive to many researchers, and its various variants are considered in papers published in recent years: Cordone and Hosteins [3], Rostami et al. [4], Poongothai et al. [5], Gafarov and Werner [6] (polynomial algorithms for special cases) and Ertem et al. [7]. A comprehensive review of the literature on scheduling problems with due dates is presented in the papers of Cheng et al. [8] and Adamu and Adewumi [9]. Single-machine scheduling problems with random execution times or random due dates are also considered in the literature (Rajba and Wodecki [10], Bożejko et al. [11,12]). In turn, parallel algorithms for single-machine scheduling problems have been considered as well: a parallel dynasearch algorithm [13], a parallel population training algorithm [14] and parallel path relinking for the problem variant with setups [15]. A variant of single-machine scheduling is also studied in port logistics (Iris et al. [16,17,18]).
The paper presents new properties of the problem under consideration, which are used in local search algorithms. A neighborhood based on insert moves is proposed, together with procedures for splitting permutations (solutions) into blocks (subpermutations). Their use leads to a significant reduction in neighborhood size. The elimination of “bad” solutions performed in this way considerably accelerates the calculations.
A taxonomy of parallel tabu search algorithms was proposed by Voß [19] referring to the classic classification of Flynn [20] characterizing parallel architectures (SIMD, MIMD, MISD and SISD models). Voß’s classification is independent from the classic search procedures taxonomy (single-walk and multiple-walk, see Alba [21]), differentiating the algorithms into:
  • SPSS (Single (Initial) Point Single Strategy)—allowing for parallelism only at the lowest level, such as objective functions calculation or parallel neighborhood search,
  • SPDS (Single (Initial) Point Different Strategies)—all processes start with the same initial solution, but they use different search strategies (e.g., different lengths of tabu list, different items stored in the tabu list, etc.)
  • MPSS (Multiple (Initial) Point Single Strategy)—processors begin operation from different initial solutions, using the same search strategy,
  • MPDS (Multiple (Initial) Point Different Strategies)—the widest class, embracing all previous categories as its special cases.
Here we propose a multiple-walk parallel tabu search algorithm based on the MPSS model (Multiple starting Points, Single Strategy). The MPI (Message Passing Interface) library is used for communication between computing processes run on several processor cores, each of which executes its own tabu search.

Contributions

To sum up, the paper presents a new method of generating subneighborhoods based on the elimination properties of blocks in a solution. Searching subneighborhoods in the parallel tabu search algorithm gives a significant acceleration of the calculations without losing the quality of the generated solutions. Compared to paper [1], where blocks were introduced, here we consider the division of the entire permutation into blocks, and not only of its middle fragment (of unfixed tasks, as in [1]). Moreover, the properties of blocks are used to eliminate solutions from the neighborhood.

2. Formulation of the Problem

In the formulation part and throughout the paper, we use the following notations:
n – number of tasks,
J – set of tasks,
p_i – execution time of task i,
w_i – tardiness cost factor of task i,
d_i – requested completion time of task i,
π – permutation of tasks,
Φ – set of all permutations of elements from J,
N(π) – neighborhood of solution π,
S_i – starting time of task i ∈ J,
C_i – completion time of task i ∈ J,
T_i – tardiness of task i,
π^T – semi-block of early tasks,
π^D – semi-block of tardy tasks,
B – partition of a permutation into blocks,
F(·) – sum of tardiness costs (criterion).
The Total Weighted Tardiness problem (TWT for short) can be formulated as follows:
TWT problem: each task from the set J = {1, 2, …, n} is to be executed on one machine, subject to the following restrictions:
(a)
all jobs are available at time zero,
(b)
the machine can process at most one job at a time,
(c)
preemption of the jobs is not allowed,
(d)
associated with each job j J there is
(i)
processing time p j ,
(ii)
due date d j ,
(iii)
positive weight w j .
The order in which the tasks are to be performed must be determined so as to minimize the sum of tardiness costs. As in [22], we denote the problem as 1||∑w_iT_i.
Any solution to the considered problem (the order in which the tasks are performed on the machine) can be represented by the permutation of tasks (elements of the set J ). By Φ we denote the set of all such permutations.
Let π ∈ Φ be some permutation of tasks. For task π(i) (i = 1, 2, …, n), let:
C_π(i) – the completion time of task execution,
T_i = max{0, C_i − d_i} – tardiness,
f_i(C_i) = w_i · T_i – tardiness cost.
In the considered problem, one should determine the order in which the tasks are performed by the machine (a permutation π ∈ Φ) minimizing the sum of tardiness costs, i.e., the sum
F(π) = ∑_{i=1}^{n} f_π(i)(C_π(i)) = ∑_{i=1}^{n} w_π(i) T_π(i).
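The criterion above can be computed with a single left-to-right scan. The following Python sketch (illustrative only; the paper's implementation is in C++) evaluates F(π) in O(n):

```python
# Illustrative Python sketch (the paper's implementation is in C++):
# total weighted tardiness F(pi) of a permutation pi of task indices.
def total_weighted_tardiness(pi, p, w, d):
    """F(pi) = sum_i w[pi(i)] * max(0, C[pi(i)] - d[pi(i)])."""
    c = 0        # completion time of the most recent task
    total = 0
    for j in pi:                  # tasks run back to back from time 0
        c += p[j]
        total += w[j] * max(0, c - d[j])
    return total

# Small example with three tasks.
p = [3, 2, 4]    # processing times
w = [2, 1, 3]    # tardiness weights
d = [2, 6, 7]    # due dates
print(total_weighted_tardiness([0, 1, 2], p, w, d))   # -> 8
```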
The problem of minimizing the cost of tardiness is NP-hard (Lawler [23] and Lenstra et al. [22]). Many papers devoted to this problem have been published. Emmons [24] introduced a partial order relation on the set of tasks, thus limiting the search for optimal solutions to some subset of the set of solutions. These properties are used in the best metaheuristic algorithms. Optimal algorithms (based on the dynamic programming method or branch and bound) were published by Rinnooy Kan et al. [25], Potts and Van Wassenhove [26] and Wodecki [1]. Some of them were presented in the review of Abdul-Razaq et al. [27]. They are, however, not very useful in solving most instances found in practice, because the calculation time increases exponentially with the number of tasks. Hence, in a reasonable time one can only solve instances with at most 50 tasks (80 with the use of a parallel algorithm [1]). There is extensive literature devoted to algorithms determining approximate solutions within an acceptable time. The methods on which these algorithms are based can be divided into construction and correction (improvement) methods.
Construction algorithms usually have low computational complexity. However, the solutions they determine may differ significantly (even by several hundred percent) from optimal ones. The construction algorithms most commonly used for the TWT problem are presented in the works of Fisher [28], Morton and Pentico [29] and in Potts and Van Wassenhove's review [30].
In correction algorithms we start with a solution (or a set of solutions) and try to improve it by local search. The solution obtained in this way becomes the starting point of the next iteration of the algorithm. The best-known applications of the correction method to the TWT problem are metaheuristics: tabu search (Crauwels et al. [31]), simulated annealing (Potts and Van Wassenhove [30], Matsuo et al. [32]), genetic algorithms (Crauwels et al. [31]) and ant algorithms (Den Besten et al. [33,34]). A very interesting and effective implementation was also presented in the work of Congram et al. [35] and then developed by Grosso et al. [36]. Its main advantage is a procedure that browses a neighborhood with an exponential number of elements in polynomial time.

3. Definitions and Properties of the Problem

For a permutation π ∈ Φ, C_π(i) = ∑_{j=1}^{i} p_π(j) is the completion time of task π(i) (i = 1, 2, …, n) in permutation π. The task π(i) is early if its completion time is not greater than the requested completion time (i.e., C_π(i) ≤ d_π(i)) and tardy if this time is greater than the requested completion time, i.e., C_π(i) > d_π(i).
Hence, the Total Weighted Tardiness (TWT) problem consists of determining the optimal permutation π* ∈ Φ which minimizes the value of the criterion function F on the set Φ, i.e., such that
F(π*) = min{F(π) : π ∈ Φ}.
First, we introduce certain methods of aggregating tasks used for generating blocks. In any permutation π ∈ Φ there are subpermutations (subsequences of consecutive tasks) for which:
(1)
execution of each task from subpermutation ends before its desired completion time (all tasks are early), or
(2)
execution of each task from subpermutation ends after its desired deadline (all tasks are tardy).
In the remainder we present two types of blocks: blocks of early tasks and blocks of tardy tasks. They will be used to eliminate worse solutions.

Blocks of Tasks

This section briefly introduces the definitions and properties of blocks and algorithms for their determination. They were described in detail in the work by Wodecki [1].
Blocks of early tasks
A subpermutation of tasks π^T in permutation π ∈ Φ is a T-block if:
(a)
each task j ∈ π^T is early and d_j ≥ C_last, where C_last is the completion time of the last task in π^T,
(b)
π^T is the maximum subpermutation (in the sense of number of elements) satisfying restriction (a).
It is easy to see that if π^T is a T-block, then the inequality min{d_j : j ∈ π^T} ≥ C_last is satisfied. Therefore, after any permutation of the elements of π^T, every task of π^T in π remains early. Using this property, we present an algorithm determining the first T-block in permutation π.
The input of Algorithm 1 is a permutation π, and the output is the first T-block of this permutation. In line 1 the first early job is determined. Next, in lines 4–7, it is checked whether adding another early task to the block preserves the following property: in any permutation of these tasks, all of them are on time. The computational complexity of the algorithm is O(n).
Algorithm 1: A T -block
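Algorithm 1 is given in the original as a figure; the following Python sketch (a reconstruction from the T-block definition, not the paper's code) determines the first T-block after an O(n) prefix-sum pass:

```python
# Reconstruction (not the paper's code) of Algorithm 1: find the first
# T-block of pi, i.e., a maximal run of early tasks that remain early
# under any reordering inside the run (min due date >= block completion).
def first_t_block(pi, p, d):
    comp, c = [], 0
    for j in pi:                      # prefix completion times
        c += p[j]
        comp.append(c)
    # line 1 of Algorithm 1: the first early task
    start = next((i for i, j in enumerate(pi) if comp[i] <= d[j]), None)
    if start is None:
        return None                   # no early task at all
    end = start
    min_due = d[pi[start]]
    # extend while every block task satisfies d_j >= C_last
    while end + 1 < len(pi):
        j = pi[end + 1]
        if min(min_due, d[j]) >= comp[end + 1]:
            end += 1
            min_due = min(min_due, d[j])
        else:
            break
    return (start, end)               # positions (0-based) of the T-block
```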
Blocks of tardy tasks.
A subpermutation of tasks π^D in permutation π ∈ Φ is called a D-block if:
(a′)
each task j ∈ π^D is tardy and d_j < S_first + p_j, where S_first is the starting time of the first task in π^D,
(b′)
π^D is the maximum subpermutation (in the sense of number of elements) satisfying constraint (a′).
It is easy to see that after any permutation of the elements of π^D, each task belonging to π^D in permutation π remains tardy.
Similarly as for the T-block, according to the above definition, we present an algorithm for determining the first D-block in permutation π.
The input of Algorithm 2 is a permutation π, and the output is the first D-block of this permutation. In line 1 the first tardy job is determined. Next, in lines 5–8, it is checked whether adding another tardy job to the block preserves the following property: in any permutation of these tasks, all of them are tardy. The computational complexity of the algorithm is O(n).
Algorithm 2: A D-block
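Algorithm 2 also appears only as a figure in the original; a matching Python reconstruction (again not the paper's code) is:

```python
# Reconstruction (not the paper's code) of Algorithm 2: find the first
# D-block of pi, i.e., a maximal run of tardy tasks that remain tardy
# under any reordering inside the run (d_j < S_first + p_j for each j).
def first_d_block(pi, p, d):
    starts, comp, c = [], [], 0
    for j in pi:
        starts.append(c)
        c += p[j]
        comp.append(c)
    # line 1 of Algorithm 2: the first tardy task
    start = next((i for i, j in enumerate(pi) if comp[i] > d[j]), None)
    if start is None:
        return None                   # every task is early
    s_first = starts[start]
    end = start
    # a task stays tardy in any internal order iff it is tardy even when
    # scheduled first in the block, i.e., d_j < S_first + p_j
    while end + 1 < len(pi):
        j = pi[end + 1]
        if comp[end + 1] > d[j] and d[j] < s_first + p[j]:
            end += 1
        else:
            break
    return (start, end)
```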
Theorem 1
([1]). For any permutation π ∈ Φ there is a partition of π into subpermutations such that each of them is:
(i) 
T -block or
(ii) 
D -block.
The algorithm for splitting permutation π Φ into blocks has computational complexity O ( n ) .
Property 1.
From block definition and from Theorem 1:
1.
Each task belongs to a certain T or D block,
2.
Different blocks are disjoint (contain different elements),
3.
Two T or D blocks can appear directly next to each other,
4.
A block can contain only one task.
5.
The partition of a permutation into blocks is not unique.
If π^D is a D-block in permutation π, then, according to the block definitions and Theorem 1, for any task π(i) ∈ π^D
d_π(i) < S_first + p_π(i).
Therefore, the cost function of performing this task,
f_π(i)(x) = w_π(i)(x − d_π(i)), for x ≥ S_first + p_π(i),
is a linear function. It follows from Smith's theorem [37] that the tasks in π^D occur in optimal order if and only if
w_π(i−1)/p_π(i−1) ≥ w_π(i)/p_π(i), i = a + 1, a + 2, …, b,
where the subpermutation π^D = (π(a), π(a+1), …, π(b)), 1 ≤ a < b ≤ n.
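The WSPT condition above is easy to check or enforce; the sketch below (illustrative Python; cross-multiplication avoids division) sorts a D-block by Smith's rule and verifies the optimal-order condition:

```python
# Illustrative Python sketch: ordering a D-block by Smith's WSPT rule
# (non-increasing w_j / p_j) and checking the condition without division.
def order_d_block(tasks, p, w):
    return sorted(tasks, key=lambda j: w[j] / p[j], reverse=True)

def is_d_opt_order(tasks, p, w):
    # w_a / p_a >= w_b / p_b  <=>  w_a * p_b >= w_b * p_a  (since p > 0)
    return all(w[a] * p[b] >= w[b] * p[a] for a, b in zip(tasks, tasks[1:]))
```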
A permutation π ∈ Φ is ordered (D-OPT for short) with respect to the partition into blocks if in each D-block every pair of neighboring tasks satisfies relation (3), i.e., the tasks appear in the optimal order.
Theorem 2
([1]). A change in the order of tasks within any block of a D-OPT permutation does not generate a permutation with a smaller value of the criterion function.
From the above statement follows the so-called block elimination property. It will be used while generating neighborhoods.
Corollary 1.
For any D-OPT permutation π ∈ Φ, if β ∈ Φ and
F(β) < F(π),
then in permutation β at least one task of some block of the partition of π was moved before the first or after the last task of this block.
Therefore, when generating new solutions of the TWT problem from a D-OPT permutation π ∈ Φ, we will only move elements of a block before its first or after its last element.

4. Moves and Neighborhoods

The essential element of approximate algorithms solving NP-hard optimization problems based on the local search method is the neighborhood—a mapping:
N : Φ → 2^Φ,
attributing to every element π ∈ Φ a certain subset N(π) of the set of feasible solutions Φ, N(π) ⊂ Φ.
The number of elements of the neighborhood and the method of determining and browsing them have a decisive impact on the efficiency (calculation time and criterion value) of an algorithm based on the local search method. Classic neighborhoods are generated by transformations commonly known as moves, i.e., “minor” changes of certain permutation elements consisting of:
1.
Swapping the positions of two elements in a permutation—the swap move s_l^k exchanges elements π(k) and π(l) (at positions k and l in π), generating permutation s_l^k(π) = π_l^k. In short, it will be called an s-move. The computational complexity of executing an s-move is O(1).
2.
Moving an element of the permutation to a different position—the insert move i_l^k moves element π(k) (from position k in π) to position l, generating permutation i_l^k(π) = π_l^k. An insert-type move will be abbreviated to i-move. Its computational complexity is O(1).
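The two move types can be sketched as follows (illustrative Python; copying the permutation makes each call O(n) here, while the O(1) quoted above refers to describing the move itself):

```python
# Illustrative Python sketch of the two move types used in the paper.
def swap_move(pi, k, l):
    """s-move: exchange the elements at positions k and l."""
    beta = list(pi)
    beta[k], beta[l] = beta[l], beta[k]
    return beta

def insert_move(pi, k, l):
    """i-move: remove the element at position k and reinsert it at position l."""
    beta = list(pi)
    beta.insert(l, beta.pop(k))
    return beta
```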
One way to determine the neighborhood of a permutation is to define the set of moves that generate it. If M(π) is a certain set of moves specified for permutation π ∈ Φ, then
N(π) = {m(π) : m ∈ M(π)}
is the neighborhood of π generated by the moves from M(π).
In each iteration of an algorithm based on the local search method, a neighborhood (a subset of the set of solutions) is determined using the move generator. Let
B = [B_1, B_2, …, B_ν]
be a partition of an ordered (D-OPT) permutation π into blocks.
We consider a task π(j) belonging to a certain block of the division B. Moves that can bring an improvement of the criterion value consist of moving the task π(j) before the first or after the last task of this block. Let M_bf^j and M_af^j be the sets of these moves (i.e., respectively, all such i-moves and s-moves). These sets are shown symbolically in Figure 1. The task π(a_k) is the first and π(b_k) the last element of block B_k, which also contains the considered task π(j).
Let
M(π) = ⋃_{j=1}^{n} M_bf^j ∪ ⋃_{j=1}^{n} M_af^j
be the set of all moves that can bring improvement (see Corollary 1), i.e., moves before or after the blocks of some permutation π.
Below, some properties of i-moves and s-moves are proven which can be used to determine subneighborhoods. These are both elimination criteria and procedures for determining sets of moves and their representatives.

4.1. Properties of Insert Moves

The move m* ∈ M(π) is a representative of a certain set of moves W ⊂ M(π) if
∀ r ∈ W : F(r(π)) ≥ F(m*(π)).
Let us assume that the permutation π ∈ Φ is D-OPT and that the neighborhood is generated by insert-type moves (i-moves). If i_l^k is a representative of a set of moves M_k ⊂ M(π) (i_l^k ∈ M_k), then the moves belonging to M_k are removed from the sets M_bf^k and M_af^k, i.e., they are modified as follows:
M_bf^k ← M_bf^k ∖ M_k and M_af^k ← M_af^k ∖ M_k.
This procedure makes it possible, in the process of generating the neighborhood, to omit the elements that do not directly improve the value of the criterion.
Theorem 3.
If task π(k), after being moved to position l (by the move i_l^k), 1 ≤ k < l ≤ n, is early in permutation π_l^k (i.e., C_π(l) ≤ d_π(k)), then for the pair of moves i_{l−1}^k, i_l^k we have
F(π_{l−1}^k) ≥ F(π_l^k).
Proof. 
Let us assume that task π(k), after being moved to position l (i.e., after executing the move i_l^k), is early in permutation π_l^k. We consider two moves: i_l^k and i_{l−1}^k. The permutations π_l^k and π_{l−1}^k generated by these moves satisfy:
π_l^k(j) = π_{l−1}^k(j) for j = 1, 2, …, l−2, l+1, …, n,
π_{l−1}^k(l−1) = π_l^k(l) = π(k), π_{l−1}^k(l) = π_l^k(l−1) = π(l).
We present both permutations (positions …, k−1, k, …, l−2, l−1, l, l+1, …):
π_{l−1}^k = (…, π(k−1), π(k+1), …, π(l−1), π(k), π(l), π(l+1), …),
π_l^k = (…, π(k−1), π(k+1), …, π(l−1), π(l), π(k), π(l+1), …).
Since the tasks π_{l−1}^k(l−1) and π_l^k(l) are early by assumption, and the completion time of task π_{l−1}^k(l) satisfies C_{π_{l−1}^k}(l) = C_{π_l^k}(l−1) + p_π(k), we obtain F(π_{l−1}^k) ≥ F(π_l^k).    □
Theorem 4.
Let π(a) be the first and π(b) the last element of some T-block of permutation π. If for task π(k) (0 < k < a) the requested completion time satisfies d_π(k) < C_π(a), then
F(π_l^k) ≤ F(π_{l+1}^k), l = a−1, a, …, b−1.
Proof. 
For task π(k), 0 < k < a, we consider two moves i_l^k and i_{l+1}^k, l = a−1, a, …, b−1. The permutations π_l^k and π_{l+1}^k generated by these moves satisfy: π_l^k(j) = π_{l+1}^k(j) for j = 1, 2, …, l−1, l+2, …, n, and π_l^k(l+1) = π_{l+1}^k(l). Since π_l^k(l) = π_{l+1}^k(l+1) = π(k), the completion time of task π_{l+1}^k(l+1) satisfies C_{π_{l+1}^k}(l+1) = C_{π_l^k}(l) + p_π(l+1), hence F(π_l^k) ≤ F(π_{l+1}^k).    □
For task π(k) (1 ≤ k ≤ n), let
τ(k) = 0, if d_π(k) < p_π(k); otherwise τ(k) = max{j : 1 ≤ j ≤ n, C_{π_j^k}(j) ≤ d_π(k)}.
Thus, τ(k) is the last position in the permutation π onto which the task π(k) can be moved by an i-move and still be early. Therefore, the task π(k) is early in each permutation π_l^k, l = 1, 2, …, τ(k).
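For forward moves (k ≤ l), the completion time of task π(k) at position l equals the prefix sum of processing times over the first l positions of π, so τ(k) reduces to a prefix-sum scan. A Python sketch (1-based positions, as in the text):

```python
# Illustrative sketch of (6) for forward i-moves (k <= l): after moving
# pi(k) to position l, its completion time is the prefix sum of p over the
# first l positions of pi, so tau(k) is found by one prefix-sum scan.
def tau(pi, k, p, d):
    job = pi[k]
    if d[job] < p[job]:
        return 0                  # tardy wherever it starts
    c, last = 0, 0
    for l, j in enumerate(pi, start=1):   # 1-based positions as in the text
        c += p[j]
        if c <= d[job]:
            last = l              # pi(k) moved to position l is still early
        else:
            break                 # prefix sums only grow
    return last
```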
Corollary 2.
For the sequence of moves i_1^k, i_2^k, …, i_{τ(k)}^k of task π(k) we have:
F(π_1^k) ≥ F(π_2^k) ≥ … ≥ F(π_{τ(k)}^k),
where the parameter τ(k) is defined by (6).
Proof. 
The move i_l^k, l = 1, 2, …, τ(k), generates a permutation π_l^k in which the task π(k) is early. Using Theorem 3, it is easy to show that F(π_1^k) ≥ F(π_2^k) ≥ … ≥ F(π_{τ(k)}^k).    □
Therefore, the move i_{τ(k)}^k is a representative of the set {i_1^k, i_2^k, …, i_{τ(k)−1}^k}; hence these moves can be omitted by modifying the sets accordingly:
M_bf^k ← M_bf^k ∖ {i_1^k, i_2^k, …, i_{τ(k)−1}^k},
M_af^k ← M_af^k ∖ {i_1^k, i_2^k, …, i_{τ(k)−1}^k}.
Corollary 3.
Let π(a) and π(b) be the first and the last task of a T-block in permutation π. If 1 ≤ k < a and a ≤ τ(k) ≤ b, where the parameter τ(k) is defined in (6), then for the sequence of moves i_{τ(k)+1}^k, i_{τ(k)+2}^k, …, i_{b−1}^k, i_b^k we have:
F(π_{τ(k)+1}^k) ≤ F(π_{τ(k)+2}^k) ≤ … ≤ F(π_{b−1}^k) ≤ F(π_b^k).
Proof. 
The move i_l^k, l = τ(k)+1, τ(k)+2, …, b, generates a permutation π_l^k in which the task π(k) is tardy. Using Theorem 4, it is easy to show that
F(π_{τ(k)+1}^k) ≤ F(π_{τ(k)+2}^k) ≤ … ≤ F(π_{b−1}^k) ≤ F(π_b^k).
  □
The move i_{τ(k)+1}^k is a representative of the set {i_{τ(k)+2}^k, i_{τ(k)+3}^k, …, i_{b−1}^k, i_b^k}. It is then possible to assume
M_af^k ← M_af^k ∖ {i_{τ(k)+2}^k, i_{τ(k)+3}^k, …, i_b^k}.
The computational complexity of the algorithms checking each of the proven properties (Corollaries 2 and 3) is O(n).

4.2. Properties of Swap Moves

To eliminate some s-moves from neighborhoods, blocks and elimination criteria will be used.
A partial order relation ‘→’ is introduced on the set of tasks J. Let Γ−(i) and Γ+(i) be, respectively, the sets of predecessors and successors of task i ∈ J in the relation →. Properties enabling the determination of elements of the relation → are called elimination criteria in the literature.
Theorem 5.
If one of the conditions is met:
    (a)
p_r ≤ p_j, w_r ≥ w_j, d_r ≤ d_j, or
    (b)
p_r ≤ p_j, w_r ≥ w_j, d_r ≤ ∑_{l ∈ Γ−(j)} p_l + p_j, or
    (c)
w_r ≥ w_j, d_r ≤ d_j, d_r ≤ ∑_{l ∈ N∖Γ+(j)} p_l + p_j,
then there exists an optimal solution in which task r precedes task j, i.e., r → j.
Proof. 
Condition (a) was proved by Shwimer [38]. Conditions (b) and (c) are generalized versions of Theorems 1 and 2 from Emmons' work [24].    □
After establishing a new precedence between tasks i and j, in order to avoid cycles the transitive closure of the relation should be maintained, i.e., the sets should be modified accordingly:
∀ l ∈ {i} ∪ Γ−(i): Γ+(l) ← Γ+(l) ∪ {j} ∪ Γ+(j),  ∀ l ∈ {j} ∪ Γ+(j): Γ−(l) ← Γ−(l) ∪ {i} ∪ Γ−(i).
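A sketch of building the relation → from condition (a) of Theorem 5 together with the closure update above (illustrative Python; the index tie-break for identical tasks is an addition of this sketch, to avoid creating a cycle):

```python
# Sketch: precedence relation '->' from condition (a) of Theorem 5 plus the
# transitive-closure update. The index tie-break for identical tasks is an
# addition of this sketch (it prevents r -> j and j -> r simultaneously).
def dominates(r, j, p, w, d):
    cond = p[r] <= p[j] and w[r] >= w[j] and d[r] <= d[j]
    tie = p[r] == p[j] and w[r] == w[j] and d[r] == d[j]
    return cond and (not tie or r < j)

def build_relation(p, w, d):
    n = len(p)
    pred = {j: set() for j in range(n)}   # Gamma^-(j), predecessors
    succ = {j: set() for j in range(n)}   # Gamma^+(j), successors
    for r in range(n):
        for j in range(n):
            if r != j and dominates(r, j, p, w, d):
                for l in {r} | pred[r]:           # close transitively
                    succ[l] |= {j} | succ[j]
                for l in {j} | succ[j]:
                    pred[l] |= {r} | pred[r]
    return pred, succ
```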
Using the relation → (and Theorem 5), some elements will be removed from the set of s-moves generating the neighborhood.
Corollary 4.
For any s-move s_l^k ∈ M_af^k ∪ M_bf^k, if
π(l) ∈ Γ+(π(k)), then M_af^k ← M_af^k ∖ {s_l^k},
and if
π(l) ∈ Γ−(π(k)), then M_bf^k ← M_bf^k ∖ {s_l^k}.
Proof. 
It follows directly from the Theorem 5.    □

5. Construction of Algorithm

The local search algorithm starts from some startup solution. Then its neighborhood is generated and a certain element is selected from it, which becomes the starting solution of the next iteration. Thus, the solution space is traversed by “moving” from one element to another. This process continues until a certain stop criterion is met. In this way a sequence (trajectory) of solutions is created, of which the best element is the result of the algorithm's run.
One of the deterministic and most commonly used implementations of the local search method is tabu search (TS for short). Its main ideas were presented by Glover in [39,40] and in the monograph by Glover and Laguna [41]. To avoid “looping” (going back to the same solution), a short-term memory mechanism is introduced: the tabu list (of prohibited solutions or moves). After performing a move—determining the starting solution for the next iteration—its attributes are remembered on the list. When generating the neighborhood, new solutions whose attributes are on the list are omitted, except for those meeting the so-called aspiration criterion (i.e., “exceptionally favorable” ones). The basic elements of this method are:
  • move—function transforming startup solution into another (one of the elements of the neighborhood),
  • neighborhood—set of solutions determined by moves belonging to a certain set,
  • tabu list—list containing attributes of recently considered startup solutions,
  • stop criterion—in practice, it is a fixed number of iterations or algorithm calculation time.
Let x ∈ X be any startup solution, LT the tabu list, and Ψ the selection criterion (i.e., a function enabling comparison of elements of the neighborhood); usually it is the goal function F.
The computational complexity of a single iteration of the algorithm depends on the number of elements of the neighborhood, the procedure generating its elements, and the complexity of the function calculating the criterion value. A detailed description of the implementation of this algorithm for a single-machine task scheduling problem is presented in the work of Bożejko et al. [42].
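A minimal, generic tabu search loop for the TWT problem is sketched below (Python). It uses a plain swap neighborhood and a FIFO tabu list with a simple aspiration criterion; it illustrates the scheme described above, not the paper's block-based implementation:

```python
# Minimal generic tabu search for TWT (illustration of the scheme only; the
# paper's algorithm additionally uses blocks and move elimination). Swap
# neighborhood, FIFO tabu list of swapped task pairs, aspiration criterion.
from collections import deque

def twt(pi, p, w, d):
    c = tot = 0
    for j in pi:
        c += p[j]
        tot += w[j] * max(0, c - d[j])
    return tot

def tabu_search(pi0, p, w, d, iters=200, tabu_len=7):
    cur = list(pi0)
    best, best_f = list(cur), twt(cur, p, w, d)
    tabu = deque(maxlen=tabu_len)            # short-term memory (FIFO)
    n = len(cur)
    for _ in range(iters):
        cand = cand_f = cand_attr = None
        for k in range(n - 1):
            for l in range(k + 1, n):
                attr = (cur[k], cur[l])      # attribute of the swap move
                beta = list(cur)
                beta[k], beta[l] = beta[l], beta[k]
                f = twt(beta, p, w, d)
                # tabu moves pass only under aspiration (better than best)
                if attr in tabu and f >= best_f:
                    continue
                if cand_f is None or f < cand_f:
                    cand, cand_f, cand_attr = beta, f, attr
        if cand is None:
            break                            # all moves tabu
        cur = cand
        tabu.append(cand_attr)
        if cand_f < best_f:
            best, best_f = list(cand), cand_f
    return best, best_f
```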

5.1. Construction Algorithms

Most construction algorithms are quick and simple to implement; unfortunately, the solutions they determine are usually “far from optimal”. Hence, they are almost exclusively used to determine “good” startup solutions for other algorithms. In the case of the TWT problem, the following algorithms (or their simple modifications and various hybrids) have been used for years:
  • SWPT—Shortest Weighted Processing Time, (Smith [37]),
  • EDD—Earliest Due Date, (Baker [43]),
  • COVERT—Cost Over Time, (Potts and Van Wassenhove [30]),
  • AU—Apparent Urgency, (Potts and Van Wassenhove [30]),
  • META—Metaheuristic, (Potts and Van Wassenhove [30]).
The first two have a static priority function and computational complexity O(n ln n), whereas the next two (COVERT and AU) use a dynamic priority function and their computational complexity is O(n²).
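The two static-priority rules can be sketched in a few lines (illustrative Python; each is a single sort over task indices, hence the O(n ln n) bound):

```python
# Illustrative sketches of the two static-priority construction rules.
def swpt(p, w):
    """SWPT: non-decreasing p_j / w_j (i.e., non-increasing w_j / p_j)."""
    return sorted(range(len(p)), key=lambda j: p[j] / w[j])

def edd(d):
    """EDD: non-decreasing due dates."""
    return sorted(range(len(d)), key=lambda j: d[j])
```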

5.2. Tabu Search Algorithm

The problem considered in this work was solved using the tabu search algorithm. Below, we describe each element of the algorithm in more detail.
Neighborhood. The TS algorithm uses a neighborhood generated by swap and insert moves. Let B be a partition of a D-OPT permutation π into blocks. Using the block elimination properties for i-moves, we determine (Corollaries 2 and 3) the sets of moves before and after a block
M(π) = ⋃_{j∈I} (M_bf^j ∪ M_af^j).
Next:
1.
According to Corollaries 2 and 3, from M(π) we remove some subsets of i-moves, leaving only their representatives.
2.
We remove s-moves whose execution would violate one of the conditions of Theorem 5.
Therefore, the neighborhood of permutation π is
N(π) = {m_l^k(π) : m_l^k ∈ M(π)}.
The procedure for determining the neighborhood has complexity O(n²).
Startup solution. For each instance, the startup solutions of the TS algorithm were determined by the best construction algorithms: SWPT, EDD, COVERT, AU and META, described in Section 5.1.
Stop condition. The stop condition for both algorithms was a maximum number of iterations. In the parallel implementation of the TS algorithm, a strategy of many computing processors working in parallel was used; the maximum number of iterations for each processor was 1000/p, where p is the number of processors.
Tabu list in the TS algorithm. To prevent cycles from forming too quickly (i.e., a return to the same permutation after a small number of iterations of the algorithm), some attributes of each move are remembered on the so-called tabu list (LT for short), maintained as a FIFO queue. When making a move m_j^r ∈ M(π) (i.e., generating permutation π_j^r from π ∈ Φ), we write some attributes of this move on the list, namely the triple (π(r), j, F(π_j^r)). Suppose we are considering the move m_l^k ∈ M(β) generating permutation β_l^k. If the list LT contains a triple (r, j, Y) such that β(k) = r, l = j and F(β_l^k) ≥ Y, then this move is eliminated (removed) from the set M(β).

6. Parallelization of Algorithms

The proposed tabu search algorithm (TSA for short) was parallelized using the MPI library according to the scheme of independent search processes with different starting points (MPSS in Voß's [19] classification). On the cluster platform, non-cooperative parallel processes were implemented with a mechanism for diversifying startup solutions based on the Scatter Search idea. Each of the processors modified the solution generated by the META algorithm by performing a certain number of swap moves proportional to the size of the problem and the number of the processor. An MPI collective operation (MPI_Bcast) was used to collect the data. The pseudocode of the parallel MWPTSA (Multiple-Walk Parallel Tabu Search Algorithm) is given in Algorithm 3, and its scheme in Figure 2.
Algorithm 3: Multiple-Walk Parallel Tabu Search Algorithm (MWPTSA)
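The MPSS scheme can be simulated sequentially as below (illustrative Python; the paper runs the walks as separate MPI processes and collects the results with a collective operation, while here virtual ranks are looped over and the final "reduction" is a min). The rank-dependent diversification by swap moves follows the description above; the walk itself is a plain steepest descent standing in for the tabu search of Algorithm 3:

```python
# Sequential simulation of the MPSS multiple-walk scheme (the paper uses
# MPI processes; here virtual ranks are looped over and the final
# "reduction" is min()). The walk is a steepest-descent stand-in.
import random

def twt(pi, p, w, d):
    c = tot = 0
    for j in pi:
        c += p[j]
        tot += w[j] * max(0, c - d[j])
    return tot

def descent(pi, p, w, d, iters):
    cur, cur_f = list(pi), twt(pi, p, w, d)
    n = len(cur)
    for _ in range(iters):
        improved = False
        for k in range(n - 1):
            for l in range(k + 1, n):
                beta = list(cur)
                beta[k], beta[l] = beta[l], beta[k]
                f = twt(beta, p, w, d)
                if f < cur_f:
                    cur, cur_f, improved = beta, f, True
        if not improved:
            break
    return cur, cur_f

def mpss(pi0, p, w, d, procs=4, iters=1000):
    results = []
    for rank in range(procs):                 # one virtual MPI process each
        rng = random.Random(rank)
        start = list(pi0)
        for _ in range(rank):                 # rank-dependent diversification
            k, l = rng.sample(range(len(start)), 2)
            start[k], start[l] = start[l], start[k]
        results.append(descent(start, p, w, d, iters // procs))
    return min(results, key=lambda r: r[1])   # reduction step
```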

7. Computational Experiments

The parallel tabu search algorithm MWPTSA was implemented in C++ using the MPI library. The calculations were made on the BEM cluster installed in the Wrocław Centre for Networking and Supercomputing. Parallel computing tasks were run on Intel Xeon E5-2670 2.30 GHz processors under the control of the PBS queue system.
The test instances, of various sizes and degrees of difficulty, on which the calculations were made, were divided into two groups:
(a) the first group includes 375 examples of three different sizes ( n = 40 , 50 , 100 ); together with the best-known solutions, they are available on the OR-Library [44] website;
(b) the second group, comprising test instances for n = 200 , 500 , 1000 , was generated in the following way:
  • p_i – a random integer from the range [1, 100] with uniform distribution,
  • w_i – a random integer from the range [1, 10] with uniform distribution,
  • d_i – a random integer from the range [P(1 − TF − RDD/2), P(1 − TF + RDD/2)] with uniform distribution,
where P = ∑_{i=1}^{n} p_i, RDD ∈ {0.2, 0.4, 0.6, 0.8, 1.0} (relative range of due dates) and TF ∈ {0.2, 0.4, 0.6, 0.8, 1.0} (average tardiness factor). For each of the 25 pairs of values of RDD and TF, five instances were generated. Overall, 375 instances were generated, 125 for each value of n. The test instances were published on the web page [45].
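The generation procedure above can be sketched as follows. This is a hedged sketch under stated assumptions: the struct, function name, and the choice of generator are illustrative, not the generator actually used for the published instances.

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Illustrative instance generator: p_i, w_i, d_i drawn uniformly, with the
// due-date window controlled by RDD (relative range of due dates) and
// TF (average tardiness factor), as in the description above.
struct Instance {
    std::vector<int> p, w, d;
};

Instance generate(int n, double RDD, double TF, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dp(1, 100);  // p_i in [1, 100]
    std::uniform_int_distribution<int> dw(1, 10);   // w_i in [1, 10]
    Instance inst;
    for (int i = 0; i < n; ++i) {
        inst.p.push_back(dp(gen));
        inst.w.push_back(dw(gen));
    }
    const long P = std::accumulate(inst.p.begin(), inst.p.end(), 0L);
    // d_i in [P(1 - TF - RDD/2), P(1 - TF + RDD/2)]
    const long lo = static_cast<long>(P * (1.0 - TF - RDD / 2.0));
    const long hi = static_cast<long>(P * (1.0 - TF + RDD / 2.0));
    std::uniform_int_distribution<long> dd(lo, hi);
    for (int i = 0; i < n; ++i) inst.d.push_back(static_cast<int>(dd(gen)));
    return inst;
}
```

Note that for large TF and RDD the lower bound of the due-date window can be negative, which is a known property of this classic generation scheme.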
Computational experiments with the algorithms presented in this work were carried out in several stages. First, the construction algorithms SWPT, EDD, COVERT, AU and the META algorithm, presented in Section 5.1, were compared. Calculations were made on the examples from group (a). The obtained results (percentage relative error (11)) are presented in Table 1, compared with the reference solutions and benchmark data taken from the OR-Library [44].
The best solutions were determined by the META algorithm [30]; they serve as the starting point of the TSA algorithm. When running these algorithms on multiple processors, each processor started from a different initial solution. The initial solutions were differentiated by running the sequential algorithm with a small number of iterations, 10 · i , i = 1 , 2 , … , p , where p is the number of processors.
To determine the values of the parameters of the approximation algorithms, preliminary calculations were made on a small number of randomly chosen examples. Based on the analysis of the results obtained, the following settings were adopted:
  • length of tabu list: 7,
  • length of the list for a long-term memory: 5,
  • algorithm for determining the startup solution: META,
  • algorithm’s stop condition: computation time t = 120 s.
The solution of the parallel algorithm was the best of the solutions obtained by the individual processors. For each solution, the percentage relative error was determined:
PRD = (F_alg − F_ref) / F_ref · 100%,
where F_ref is the value of the reference solution obtained with the META algorithm, and F_alg is the value of the solution determined by the tested algorithm.
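The PRD measure can be computed directly; this one-line sketch assumes the sign convention under which an algorithm that improves on the reference yields a negative value, consistent with the improvement tables below.

```cpp
#include <cassert>
#include <cmath>

// Percentage relative deviation of a tested solution value f_alg from a
// reference value f_ref; negative means the tested algorithm improved on
// the reference (here, the META starting solution).
double prd(double f_alg, double f_ref) {
    return (f_alg - f_ref) / f_ref * 100.0;
}
```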
Table 2 contains the PRD improvement values relative to the solution obtained by the META algorithm, which is also the starting solution for TSA (which is why they are negative), for the bigger instances ( n = 200 , 500 and 1000).
Table 3 contains the PRD improvement values relative to the solution obtained by the META algorithm, which is also the starting solution for the parallel TSA (which is why they are negative), for the bigger instances ( n = 200 , 500 and 1000). Experiments were conducted for p = 8 , 16 , 32 and 64 processors. One can observe that increasing the number of processors improves the quality of the obtained solutions, especially for the algorithm with blocks.
Table 4 and Table 5 contain the speedup results. The values have been calculated from the relationship s = t_s / t_p , where:
  • t_s is the time in which the sequential algorithm obtains the solution F,
  • t_p is the time after which the parallel algorithm obtains the solution F.
From the results presented in the tables, it can be seen that high speedups are obtained for the smaller problems ( s_TSb values over 60,000 for n = 40 ). The block algorithm gives better acceleration (on average more than 2600 times better). Such results probably stem from the fact that the size of the problem's solution space (40!, 50!, 100!, 200!, 500!, 1000!) grows much faster than the number of processors (8, 16, 32, 64). The increase in speedup within a single problem size is also not large, as shown above. The large disproportion between the speedups of the algorithm with and without blocks results from the fact that the block algorithm strongly limits the searched solution space.
As one can see, increasing the size of the problem reduces the speedup obtained, which is probably due to the limited computation time relative to the larger search space; however, increasing the number of processors does not have a big impact on the speedup values, even when the size of the problem increases.

8. Conclusions

Based on the problem analysis and the computational experiments performed, we can draw the following conclusions. The tabu search method using block properties allows solving the problem not only faster than the classic tabu search metaheuristic without these properties, but also with huge speedups. The speedups achieved by the parallel tabu search algorithm with blocks are much greater than for the classic tabu search algorithm, which confirms the legitimacy of using block elimination properties. The fact of achieving such huge speedups should be the subject of further research. The proposed block elimination criteria can also be used for constructing efficient parallel algorithms solving other NP-hard scheduling problems.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Test instances for a single-machine total weighted tardiness scheduling problem generated during the study are available online: https://zasobynauki.pl/zasoby/51561 (accessed on 25 February 2021).

Acknowledgments

The paper was supported by the National Science Centre of Poland, grant OPUS no. 2017/25/B/ST7/02181. Calculations have been carried out using resources provided by Wroclaw Centre for Networking and Supercomputing (http://wcss.pl (accessed on 25 February 2021)), grant No. 96.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Wodecki, M. A Branch-and-Bound Parallel Algorithm for Single-Machine Total Weighted Tardiness Problem. Int. J. Adv. Manuf. Technol. 2008, 37, 996–1004. [Google Scholar] [CrossRef]
  2. Rinnooy Kan, A.H.G.; Lageweg, B.J.; Lenstra, J.K. Minimizing total costs in one-machine scheduling. Oper. Res. 1975, 25, 908–927. [Google Scholar] [CrossRef]
  3. Cordone, R.; Hosteins, P. A bi-objective model for the single-machine scheduling with rejection cost and total tardiness minimization. Comput. Oper. Res. 2019, 102, 130–140. [Google Scholar] [CrossRef] [Green Version]
  4. Rostami, S.; Creemers, S.; Leus, R. Precedence theorems and dynamic programming for the single machine weighted tardiness problem. Eur. J. Oper. Res. 2019, 272, 43–49. [Google Scholar] [CrossRef] [Green Version]
  5. Poongothai, V.; Godharaman, A.P.; Arockia, J. Single machine scheduling problem for minimizing total tardiness of a weighted job in a batch delivery system, stochastic rework and reprocessing times. AIP Conf. Proc. 2019, 2112, 020132. [Google Scholar] [CrossRef]
  6. Gafarov, E.R.; Werner, F. Minimizing Total Weighted Tardiness for Scheduling Equal-Length Jobs on a Single Machine. Autom. Remote Control 2020, 81, 853–868. [Google Scholar] [CrossRef]
  7. Ertem, M.; Ozcelik, F.; Saraç, T. Single machine scheduling problem with stochastic sequence-dependent setup times. Int. J. Prod. Res. 2019, 57, 3273–3289. [Google Scholar] [CrossRef]
  8. Cheng, T.C.E.; Ng, C.T.; Yuan, J.J.; Liu, Z.H. Single machine scheduling to minimize total weighted tardiness. Eur. J. Oper. Res. 2005, 165, 423–443. [Google Scholar] [CrossRef]
  9. Adamu, M.O.; Adewumi, A.O. A survey of single machine scheduling to minimize weighted number of tardy jobs. J. Ind. Manag. Optim. 2014, 10, 219–241. [Google Scholar] [CrossRef]
  10. Rajba, P.; Wodecki, M. Stability of scheduling with random processing times on one machine. Oper. Res. 2012, 25, 169–183. [Google Scholar] [CrossRef]
  11. Bożejko, W.; Rajba, P.; Wodecki, M. Stable scheduling of single machine with probabilistic parameters. Bull. Pol. Acad. Sci. Tech. Sci. 2017, 65, 219–231. [Google Scholar] [CrossRef] [Green Version]
  12. Bożejko, W.; Rajba, P.; Wodecki, M. Scheduling Problem with Uncertain Parameters in Just in Time System; Lecture Notes in Artificial Intelligence No. 8468; Springer: Cham, Switzerland, 2014; pp. 456–467. [Google Scholar]
  13. Bozejko, W.; Wodecki, M. A Fast Parallel Dynasearch Algorithm for Some Scheduling Problems. In Proceedings of the International Symposium on Parallel Computing in Electrical Engineering (PARELEC’06), Bialystok, Poland, 13–17 September 2006; pp. 275–280. [Google Scholar] [CrossRef]
  14. Bożejko, W.; Wodecki, M. Parallel population training algorithm for single machine total tardiness problem. In Artificial Intelligence and Soft Computing; Cader, A., Rutkowski, L., Tadeusiewicz, R., Zurada, J., Eds.; Academic Publishing House EXIT: Warsaw, Poland, 2006. [Google Scholar]
  15. Bożejko, W. Parallel path relinking method for the single machine total weighted tardiness problem with sequence-dependent setups. J. Intell. Manuf. 2010, 21, 777–785. [Google Scholar] [CrossRef]
  16. Iris, C.; Christensen, J.; Ropke, S. Flexible ship loading problem with transfer vehicle assignment and scheduling. Transp. Res. Part Methodol. 2018, 111, 113–134. [Google Scholar] [CrossRef] [Green Version]
  17. Iris, C.; Pacino, D.; Ropke, S.; Larsen, A. Integrated berth allocation and quay crane assignment problem: Set partitioning models and computational results. Transp. Res. Part Logist. Transp. Rev. 2015, 81, 75–97. [Google Scholar] [CrossRef] [Green Version]
  18. Iris, C.; Lalla-Ruiz, E.; Lam, J.S.L.; Voß, S. Mathematical programming formulations for the strategic berth template problem. Comput. Ind. Eng. 2018, 124, 167–179. [Google Scholar] [CrossRef]
  19. Voß, S. Tabu search: Applications and prospects. In Network Optimization Problems; Du, D.Z., Pardalos, P.M., Eds.; World Scientific Publishing Co.: Singapore, 1993; pp. 333–353. [Google Scholar]
  20. Flynn, M.J. Very high-speed computing systems. Proc. IEEE 1966, 54, 1901–1909. [Google Scholar] [CrossRef] [Green Version]
  21. Alba, E. Parallel Metaheuristics: A New Class of Algorithms; Wiley & Sons Inc: Hoboken, NJ, USA, 2005. [Google Scholar]
  22. Lenstra, J.K.; Rinnooy Kan, A.H.G.; Brucker, P. Complexity of Machine Scheduling Problems. Ann. Discret. Math. 1977, 1, 343–362. [Google Scholar]
  23. Lawler, E.L. A pseudopolynomial algorithm for sequencing jobs to minimize total tardiness. Ann. Discret. Math. 1977, 1, 331–342. [Google Scholar]
  24. Emmons, H. One machine sequencing to Minimize Certain Functions of Job Tardiness. Oper. Res. 1969, 17, 701–715. [Google Scholar] [CrossRef]
  25. Lawler, E.L. Efficient Implementation of Dynamic Programming Algorithms for Sequencing Problems; Report BW 106; Mathematisch Centrum: Amsterdam, The Netherlands, 1979. [Google Scholar]
  26. Potts, C.N.; Van Wassenhove, L.N. A Branch and Bound Algorithm for the Total Weighted Tardiness Problem. Oper. Res. 1985, 33, 177–181. [Google Scholar] [CrossRef]
  27. Abdul-Razaq, T.S.; Potts, C.N.; Van Wassenhove, L.N. A survey of algorithms for the single machine total weighted tardiness scheduling problem. Discret. Appl. Math. 1990, 26, 235–253. [Google Scholar] [CrossRef] [Green Version]
  28. Fisher, M.L. A Dual Algorithm for the One Machine Scheduling Problem. Math. Program. 1976, 11, 229–252. [Google Scholar] [CrossRef]
  29. Morton, T.E.; Pentico, D.W. Heuristic Scheduling Systems with Applications to Production Systems and Project Management; Wiley: New York, NY, USA, 1993. [Google Scholar]
  30. Potts, C.N.; Van Wassenhove, L.N. Single Machine Tardiness Sequencing Heuristics. IIE Trans. 1991, 23, 346–354. [Google Scholar] [CrossRef]
  31. Crauwels, H.A.J.; Potts, C.N.; Van Wassenhove, L.N. Local Search Heuristics for the Single machine Total Weighted Tardiness Scheduling Problem. INFORMS J. Comput. 1998, 10, 341–350. [Google Scholar] [CrossRef] [Green Version]
  32. Matsuo, H.; Suh, C.J.; Sullivan, R.S. Controlled Search Simulated Annealing Method for the General Job-Shop Scheduling Problem; Working Paper 03-04-88; Department of Management, Graduate School of Business, The University of Texas at Austin: Austin, TX, USA, 1988. [Google Scholar]
  33. Den Besten, M.; Stützle, T. Ant colony optimization for the total weighted tardiness problem. In Proceedings of PPSN-VI; Lecture Notes in Computer Science; Springer: Berlin, Germany, 2000; Volume 1917, pp. 611–620. [Google Scholar]
  34. Den Besten, M.; Stützle, T.; Dorigo, M. Design of Iterated Local Search Algorithms: An Example Application to the Single Machine Total Weighted Tardiness Problem; EvoWorkshops; Lecture Notes in Computer Science; Boers, E.J.W., Ed.; Springer: Berlin, Germany, 2001; Volume 2037, pp. 441–451. [Google Scholar]
  35. Congram, R.K.; Potts, C.N.; van de Velde, S.L. An Iterated Dynasearch Algorithm for the Single-Machine Total Weighted Tardiness Scheduling Problem. Informs J. Comput. 2002, 14, 52–67. [Google Scholar] [CrossRef] [Green Version]
  36. Grosso, A.; Della Croce, F.; Tadei, R. An enhanced dynasearch neighborhood for single-machine total weighted tardiness scheduling problem. Oper. Res. Lett. 2004, 32, 68–72. [Google Scholar] [CrossRef]
  37. Smith, W.E. Various Optimizers for Single-Stage Production. Nav. Res. Logist. Q. 1956, 3, 59–66. [Google Scholar] [CrossRef]
  38. Shwimer, J. On the N-Job, One-Machine, Sequence-Independent Scheduling Problem with Tardiness Penalties: A Branch-and-Bound Solution. Manag. Sci. 1972, 18, 301–313. [Google Scholar] [CrossRef] [Green Version]
  39. Glover, F. Tabu search. Part I. ORSA J. Comput. 1989, 1, 190–206. [Google Scholar] [CrossRef]
  40. Glover, F. Tabu search. Part II. ORSA J. Comput. 1990, 2, 4–32. [Google Scholar] [CrossRef]
  41. Glover, F.; Laguna, M. Tabu Search; Kluwer: Alphen aan den Rijn, The Netherlands, 1997. [Google Scholar]
  42. Bożejko, W.; Grabowski, J.; Wodecki, M. Block approach-tabu search algorithm for single machine total weighted tardiness problem. Comput. Ind. Eng. 2006, 50, 1–14. [Google Scholar]
  43. Baker, K.R. Introduction to Sequencing and Scheduling; Wiley: New York, NY, USA, 1974. [Google Scholar]
  44. OR Library. Available online: http://people.brunel.ac.uk/~mastjjb/jeb/info.html (accessed on 25 February 2021).
  45. Uchroński, M. Test Instances for a Single-Machine Total Weighted Tardiness Scheduling Problem. Available online: https://zasobynauki.pl/zasoby/51561 (accessed on 25 February 2021).
Figure 1. Moves of task π ( j ) that can improve the goal function value.
Applsci 11 02069 g001
Figure 2. Tabu Search MWPTSA parallelization scheme.
Applsci 11 02069 g002
Table 1. PRD [%] to the best-known solutions for SWPT, EDD, COVERT, AU and META algorithms.
Instance Group   SWPT      COVERT    AU      EDD      META
wt40             579.92    152.39    15.77   138.55   10.46
wt50             4315.94   951.99    21.93   155.21   10.21
wt100            4692.65   1417.07   23.64   143.28   12.90
average          3196.17   840.49    20.45   145.68   11.19
Table 2. Computational results for sequential tabu search algorithm ( t = 120   s , META as reference).
Instance    wt200             wt500             wt1000
Group       TS       TSb      TS       TSb      TS      TSb
001–025    −18.43   −18.49   −15.55   −16.85   −5.21   −7.54
026–050    −11.38   −11.29   −13.68   −13.88   −4.05   −3.54
051–075    −9.84    −9.78    −9.18    −9.14    −3.79   −3.65
076–100    −5.77    −5.73    −3.32    −3.46    −4.60   −3.82
101–125    −2.80    −2.79    −3.99    −3.93    −1.32   −0.78
average    −9.64    −9.62    −9.15    −9.45    −3.79   −3.87
Table 3. Computational results for parallel tabu search algorithm ( t = 120   s , META as reference).
Number of    wt200            wt500            wt1000
Processors   TS      TSb      TS      TSb      TS      TSb
1           −9.64   −9.62    −9.15   −9.45    −3.79   −3.87
8           −9.65   −9.63    −9.17   −9.53    −3.88   −3.91
16          −9.65   −9.64    −9.17   −9.55    −3.88   −3.92
32          −9.65   −9.66    −9.17   −9.55    −3.88   −3.94
64          −9.66   −9.71    −9.19   −9.59    −3.88   −3.95
Table 4. Speedup values for parallel tabu search algorithm.
Instance    p = 8                   p = 16
Group       s_TS     s_TSb          s_TS     s_TSb
wt40        182.60   61,491.57      185.57   61,466.13
wt50        39.17    28,943.19      39.54    23,866.83
wt100       5.63     2150.46        5.78     2234.89
wt200       3.99     58,752.36      4.24     55,967.32
wt500       3.42     21.80          3.58     22.21
wt1000      2.23     1280.58        2.27     1724.82
Table 5. Speedup values for parallel tabu search algorithm.
Instance    p = 32                  p = 64
Group       s_TS     s_TSb          s_TS     s_TSb
wt40        193.85   60,606.01      186.14   60,595.42
wt50        40.01    53,659.18      38.98    50,530.73
wt100       5.94     2768.76        5.88     2793.09
wt200       4.13     56,021.05      4.13     55,445.96
wt500       3.47     22.10          3.53     22.90
wt1000      2.29     985.34         2.30     1020.59