A Hybrid Genetic-Hierarchical Algorithm for the Quadratic Assignment Problem

In this paper, we present a hybrid genetic-hierarchical algorithm for the solution of the quadratic assignment problem. The main distinguishing aspect of the proposed algorithm is that it is an innovative hybrid genetic algorithm with an original, hierarchical architecture. In particular, the genetic algorithm is combined with the so-called hierarchical (self-similar) iterated tabu search algorithm, which serves as a powerful local optimizer (local improvement algorithm) of the offspring solutions produced by the crossover operator of the genetic algorithm. The results of the conducted computational experiments demonstrate the promising performance and competitiveness of the proposed algorithm.


Introduction
The quadratic assignment problem (QAP) [1][2][3][4][5][6] is mathematically formulated as follows: Given two non-negative integer matrices A = (a_ij)_{n×n}, B = (b_kl)_{n×n}, and the set Π_n of all possible permutations of the integers from 1 to n, find a permutation p = (p(1), p(2), . . . , p(n)) ∈ Π_n that minimizes the following objective function:

z(p) = ∑_{i=1}^{n} ∑_{j=1}^{n} a_{ij} b_{p(i)p(j)}. (1)

One of the examples of the applications of the quadratic assignment problem is the placement of electronic components on printed circuit boards [7,8]. In this context, the entries of the matrix A are associated with the numbers of the electrical connections between the pairs of components. The entries of the matrix B correspond to the distances between the fixed positions on the board. The permutation p = (p(1), p(2), . . . , p(n)) can be interpreted as a separate configuration for the arrangement of components in the positions. The element p(i) in this case indicates the number of the position to which the i-th component is assigned. In this way, z (or more precisely, z/2) can be understood as the total (weighted) length of the connections between the components, when all n components are placed into the corresponding n positions.
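As a concrete illustration, the objective function (1) can be evaluated in a few lines of Python. The 3-component instance below is a hypothetical example constructed for this sketch, not taken from the paper; permutations are 0-indexed:

```python
def qap_cost(a, b, p):
    """Objective (1): z(p) = sum over i, j of a[i][j] * b[p(i)][p(j)] (0-indexed)."""
    n = len(p)
    return sum(a[i][j] * b[p[i]][p[j]] for i in range(n) for j in range(n))

# Hypothetical 3-component instance: a[i][j] = number of connections between
# components i and j, b[k][l] = distance between board positions k and l.
a = [[0, 3, 1],
     [3, 0, 2],
     [1, 2, 0]]
b = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]

print(qap_cost(a, b, [0, 1, 2]))   # cost of the identity placement
```

Since A and B are symmetric here, every connection is counted twice, which is why z/2 is the physical total wire length mentioned above.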
The quadratic assignment problem is also a complicated theoretical-mathematical problem: it is proved that the QAP belongs to the class of NP-hard optimization problems. Given a permutation p ∈ Π_n and two positions u, v (u ≠ v), the permutation p_{u,v} is defined as follows:

p_{u,v}(i) = p(v) if i = u, p_{u,v}(i) = p(u) if i = v, p_{u,v}(i) = p(i) otherwise; i = 1, . . . , n. (2)

This means that the permutation p_{u,v} is obtained from the permutation p by interchanging exactly the elements p(u) and p(v) in the permutation p. The formal operator (move, or transition operator) φ(p, u, v): Π_n × N × N → Π_n swaps the uth and vth elements in the given permutation such that p_{u,v} = φ(p, u, v). Note that the following equations hold: p_{u,u} = p, p_{u,v} = p_{v,u}, (p_{u,v})_{u,v} = p.
The difference in the objective function (z) values when the permutation elements p(u) and p(v) are interchanged is calculated according to the following formula:

∆(p_{u,v}, p) = z(p_{u,v}) − z(p) = (a_{uu} − a_{vv})(b_{p(v)p(v)} − b_{p(u)p(u)}) + (a_{uv} − a_{vu})(b_{p(v)p(u)} − b_{p(u)p(v)}) + ∑_{k=1, k≠u,v}^{n} [(a_{uk} − a_{vk})(b_{p(v)p(k)} − b_{p(u)p(k)}) + (a_{ku} − a_{kv})(b_{p(k)p(v)} − b_{p(k)p(u)})]. (3)

The differences in the objective function values can be calculated considerably faster, provided that the previously calculated differences ∆(p_{i,j}, p) (i, j = 1, . . . , n) are memorized (stored in a matrix Ξ). In that case, a difference value is obtained using only O(1) operations [54,120]. After the interchange of the elements p(u) and p(v) has been performed, the new differences Ξ(i, j) (i, j = 1, . . . , n, i ≠ u, i ≠ v, j ≠ u, j ≠ v) are recalculated in O(1) time each (Formula (5)); if i = u or i = v or j = u or j = v, then Formula (3) should be applied. So, all the differences are recalculated using only O(n²) operations in total. Still, the initial calculation of the values of the matrix Ξ requires O(n³) operations (but only once, before starting the optimization algorithm).

If the matrix A and/or the matrix B is symmetric, then Formula (3) becomes simpler. Assume that the matrix B is symmetric. Then, the (asymmetric) matrix A can be transformed into a symmetric matrix A′ with the entries a′_{uk} = a_{uk} + a_{ku} (u, k = 1, . . . , n). Thus, we get the following formula:

∆(p_{u,v}, p) = ∑_{k=1, k≠u,v}^{n} (a′_{uk} − a′_{vk})(b_{p(v)p(k)} − b_{p(u)p(k)}); (6)

here, u = 1, . . . , n, v = 1, . . . , n, u ≠ v. Analogously, Formula (5) turns into a corresponding simplified O(1) update formula; if i = u (v) or j = u (v), Equation (6) must be applied. In addition to this, suppose that we dispose of the three-dimensional matrices A″ = (a″_{uvk})_{n×n×n} and B″ = (b″_{lrt})_{n×n×n}, where a″_{uvk} = a′_{uk} − a′_{vk} and b″_{lrt} = b_{lt} − b_{rt}, with l = p(j), r = p(i), t = p(k). Then, we can apply the following formula for the calculation of the differences in the objective function values:

∆(p_{i,j}, p) = ∑_{k=1, k≠i,j}^{n} a″_{ijk} b″_{lrt}.

Using the matrices A″ and B″ allows saving up to 20% of the computation (CPU) time [66].
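Formula (3) can be checked against a direct recomputation of the objective function; the sketch below does exactly that on a randomly generated instance (assumed for illustration only):

```python
import random

def qap_cost(a, b, p):
    n = len(p)
    return sum(a[i][j] * b[p[i]][p[j]] for i in range(n) for j in range(n))

def delta(a, b, p, u, v):
    """Formula (3): change of z when p[u] and p[v] are interchanged
    (general asymmetric case, 0-indexed)."""
    d = (a[u][u] - a[v][v]) * (b[p[v]][p[v]] - b[p[u]][p[u]]) \
      + (a[u][v] - a[v][u]) * (b[p[v]][p[u]] - b[p[u]][p[v]])
    for k in range(len(p)):
        if k == u or k == v:
            continue
        d += (a[u][k] - a[v][k]) * (b[p[v]][p[k]] - b[p[u]][p[k]]) \
           + (a[k][u] - a[k][v]) * (b[p[k]][p[v]] - b[p[k]][p[u]])
    return d

random.seed(1)
n = 6
a = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
b = [[random.randint(0, 9) for _ in range(n)] for _ in range(n)]
p = list(range(n))
random.shuffle(p)
u, v = 1, 4
q = p[:]
q[u], q[v] = q[v], q[u]            # q = p_{u,v}
assert delta(a, b, p, u, v) == qap_cost(a, b, q) - qap_cost(a, b, p)
print("Formula (3) matches the direct recomputation")
```

Evaluating `delta` costs O(n) operations, versus O(n²) for recomputing z from scratch; the memorized-differences scheme then drops most evaluations to O(1).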
The distance (Hamming distance) between two permutations p and p′ is defined as:

δ(p, p′) = |{i : p(i) ≠ p′(i), i = 1, . . . , n}|. (10)

The following equations hold: δ(p, p) = 0, δ(p, p′) ≠ 1, 0 ≤ δ(p, p′) ≤ n, δ(p, p′) = δ(p′, p), δ(p, p_{u,v}) = 2 for any u, v (u ≠ v). In the case of disposing of k different numbers u₁, u₂, . . . , u_k, the following equation holds: δ(p, (. . . ((p_{u₁,u₂})_{u₂,u₃}) . . . )_{u_{k−1},u_k}) = k.
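The distance definition and the listed identities can be illustrated directly; the helper below is a minimal sketch using 0-indexed permutations:

```python
def hamming(p, q):
    """delta(p, q): the number of positions in which p and q differ."""
    return sum(1 for x, y in zip(p, q) if x != y)

p = [0, 1, 2, 3, 4]

q = p[:]
q[1], q[3] = q[3], q[1]          # one transposition p_{u,v}

r = p[:]
for u, v in [(0, 1), (1, 2)]:    # chained swaps on distinct indices 0, 1, 2
    r[u], r[v] = r[v], r[u]

print(hamming(p, p), hamming(p, q), hamming(p, r))
```

The chained swaps on k distinct indices form a k-cycle among those positions, which is why the distance is exactly k (here k = 3).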
The neighbourhood function Θ: Π_n → 2^{Π_n} assigns to each p ∈ Π_n its neighbourhood (the set of neighbouring solutions) Θ(p) ⊆ Π_n. The 2-exchange neighbourhood function Θ₂ is defined in the following way:

Θ₂(p) = {p′ : p′ ∈ Π_n, δ(p, p′) = 2}; (11)

where δ(p, p′) is the Hamming distance between the solutions p and p′. The neighbouring solution p′ ∈ Θ₂(p) can be obtained from the current solution p by using the operator φ(p, ·, ·). The computational complexity of the exploration of the neighbourhood Θ₂ is proportional to O(n²). A solution p_loc_opt ∈ Π_n is said to be locally optimal with respect to the neighbourhood Θ if for every solution p′ from the neighbourhood Θ(p_loc_opt) the following inequality holds: z(p_loc_opt) ≤ z(p′).
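A naive exploration of Θ₂(p) can be sketched as follows; for clarity, each neighbour's cost is recomputed from scratch rather than through the memorized differences ∆ described above:

```python
from itertools import combinations

def qap_cost(a, b, p):
    n = len(p)
    return sum(a[i][j] * b[p[i]][p[j]] for i in range(n) for j in range(n))

def best_2exchange_neighbour(a, b, p):
    """Scan the whole neighbourhood Theta_2(p): all n(n-1)/2 transpositions.
    Returns the best neighbour, or p itself if p is locally optimal."""
    best, best_cost = p, qap_cost(a, b, p)
    for u, v in combinations(range(len(p)), 2):
        q = p[:]
        q[u], q[v] = q[v], q[u]
        c = qap_cost(a, b, q)
        if c < best_cost:
            best, best_cost = q, c
    return best, best_cost

# Hypothetical instance; p is locally optimal iff the scan returns p itself.
a = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
b = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(best_2exchange_neighbour(a, b, [0, 2, 1]))
```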

Hybrid Genetic-Hierarchical Algorithm for the Quadratic Assignment Problem
Our proposed genetic algorithm (for more thorough information on genetic algorithms, we refer the reader to [121]) is based on the hybrid genetic algorithm framework, where the explorative (global) search works in tandem with the exploitative (local) search. The most important feature of our genetic algorithm is that it is hybridized with the so-called hierarchical (self-similar) iterated tabu search (HITS) algorithm (see Section 3.4).
The permutation elements p(1), p(2), . . . , p(n) are directly linked to the individuals' chromosomes, such that the chromosomes' genes correspond to the single elements p(1), p(2), . . . , p(n) of the solution p. No encoding is needed. The fitness of an individual is associated with the corresponding objective function value of the given solution, z(p). The following are the main essential components (parts) of our genetic-hierarchical algorithm: (1) initial population construction; (2) selection of parents for the crossover procedure; (3) crossover procedure; (4) local improvement of the offspring; (5) population replacement; (6) population restart. The top-level pseudo-code of the genetic-hierarchical algorithm is presented in Algorithm 1. (Notes: (1) The subroutine GetBestMember returns the best solution of the given population. (2) The mutation process is integrated within the k-level hierarchical iterated tabu search algorithm and depends on the mutation variant parameter MutVar.)

Algorithm 1. High-level pseudo-code of the genetic-hierarchical algorithm.
Genetic_Hierarchical_Algorithm procedure;
/ input: n - problem size, A, B - data matrices,
/ PS - population size, N_gen - total number of generations,
/ InitPopVar - initial population variant, SelectVar - parents selection variant, CrossVar - crossover variant,
/ MutVar - mutation variant, RepVar - population replacement variant,
/ σ - selection factor, DT - distance threshold, L_idle_gen - idle generations limit
/ output: p* - the best found solution (final solution)

begin
  / create a starting population P of size PS, depending on the initial population variant InitPopVar
  switch (InitPopVar)
    1: create the initial population P by applying the algorithm k-Level_Hierarchical_Iterated_Tabu_Search;
    2: create the initial population P by applying a copy of Genetic_Hierarchical_Algorithm using InitPopVar = 1
  endswitch;
  p* := GetBestMember(P); / initialization of the best so far solution
  for i := 1 to N_gen do begin / main loop
    sort the members of the population P in the ascending order of the values of the objective function;
    select parents p′, p″ ∈ P for reproduction (crossover), depending on the selection variant SelectVar and the selection factor σ;
    perform the crossover operator on the solution-parents p′, p″ and get the offspring p‴, taking into account the crossover variant CrossVar;
    apply the improvement procedure k-Level_Hierarchical_Iterated_Tabu_Search to the offspring p‴, get the (improved) offspring p●;
    get the new population P from the union of the existing parents' population and the offspring, P ∪ {p●} (such that |P| = PS) (the minimum distance criterion and the population replacement variant RepVar are taken into account);
    if z(p●) < z(p*) then p* := p●; / the best found solution is memorized
    if the number of idle generations exceeds the predefined limit L_idle_gen then begin
      perform the population restart process;
      if z(GetBestMember(P)) < z(p*) then p* := GetBestMember(P)
    endif
  endfor;
  return p*
end.

Creation of Initial Population
There are two main population construction phases. In the first one, the pre-initial population is constructed and improved; in the second one, the culling of the improved population is performed. So, firstly, PS′ = PS × C₁ members of the pre-initial population P′ are created using the version of the GRASP algorithm [70] implemented by the authors. PS denotes the population size, and C₁ (C₁ ≥ 1) is a user-defined parameter that regulates the size of the pre-initial population.
There are several options for the population construction in the first phase, controlled by the parameter InitPopVar. If InitPopVar = 1, then every generated solution is improved by the hierarchical iterated tabu search algorithm. There are a few conditions. If the improved solution (p′) is better than the best so far found solution in the pre-initial population, then the improved solution replaces the best found solution. Otherwise, it is tested whether the minimum mutual distance between the improved solution (p′) and the existing population members (min_{p} {δ(p′, p)}) is greater than or equal to the predefined distance threshold, DT. If this is the case, the improved solution is added to the pre-initial population. Otherwise, the improved solution is disregarded and simply a random solution is added instead. (Recall that the distance between solutions is calculated using Equation (10).) The distance threshold is obtained from the following equation: DT = max{2, ⌈εn⌉}, where ε denotes the distance threshold factor (0 < ε ≤ 1). This scheme is intended to ensure a high level of diversity of the population members and, at the same time, to enhance the searching ability of the genetic algorithm. To obtain a better initial population, the HITS algorithm with an increased number of iterations is used during the initial population formation. This is similar to a compounded approach proposed in [122].
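The diversity rule of the first phase can be sketched as follows; `try_add` is a hypothetical helper name, and the improvement of the candidate by HITS as well as the best-solution replacement are omitted:

```python
import math
import random

def hamming(p, q):
    return sum(1 for x, y in zip(p, q) if x != y)

def try_add(population, candidate, dt):
    """Diversity rule sketch: keep the candidate only if its minimum distance
    to all current members is >= DT; otherwise insert a random solution."""
    n = len(candidate)
    if all(hamming(candidate, member) >= dt for member in population):
        population.append(candidate)
    else:
        population.append(random.sample(range(n), n))

n, eps = 10, 0.3
dt = max(2, math.ceil(eps * n))            # DT = max{2, ceil(eps * n)}
population = [list(range(n))]
try_add(population, list(range(n)), dt)    # too close: a random solution goes in
print(len(population))
```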
The second option (InitPopVar = 2) is almost identical to the first one, except that the genetic algorithm itself (a de facto copy of the hybrid genetic-hierarchical algorithm), instead of the HITS algorithm, is employed for the creation of the initial population. As an alternative option (InitPopVar = 3), a two-level genetic-hierarchical algorithm (a master-slave genetic algorithm) can be employed for the initial population improvement.
In the second phase, which is very simple, the (C₁ − 1) × PS worst members of the pre-initial population are truncated, and only the PS best members survive for the subsequent generations.

Selection of Parents
The selection of parents is performed by using the parametrized rank-based selection rule [123]. In this case, the positions (κ₁, κ₂) of the parents within the sorted population are determined according to the following formulas: κ₁ = ⌈(ς₁)^σ⌉, κ₂ = ⌈(ς₂)^σ⌉, κ₁ ≠ κ₂, where ς₁, ς₂ are uniform (pseudo-)random numbers in the interval [1, PS^{1/σ}]; here, PS denotes the population size, and σ is a real number from the interval [1, 2] (it is referred to as a selection factor). It is clear that the better the individual, the larger the selection probability.
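The selection rule can be sketched in a few lines; the ceiling operation is assumed here, since the rounding brackets were lost in the extracted formulas:

```python
import math
import random

def select_two(ps, sigma):
    """Positions (kappa1, kappa2) of two distinct parents in the sorted
    population: kappa = ceil(varsigma ** sigma), with varsigma drawn
    uniformly from [1, ps ** (1/sigma)]."""
    hi = ps ** (1.0 / sigma)
    while True:
        k1 = min(ps, math.ceil(random.uniform(1, hi) ** sigma))
        k2 = min(ps, math.ceil(random.uniform(1, hi) ** sigma))
        if k1 != k2:
            return k1, k2

random.seed(0)
print(select_two(10, 1.5))   # smaller ranks (better individuals) are more likely
```

With sigma = 1 the rule degenerates to uniform selection; larger sigma biases the draw toward rank 1, i.e., the best member of the sorted population.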

Crossover Operators
Two-parent crossover is formally defined by using the operator Ψ: Π_n × Π_n → Π_n such that:

Ψ(p′, p″) = p•,

where p′, p″ denote the parental solutions and p• is the offspring solution. (The child can coincide with one of the parents if, for example, the parents are very similar.) The crossover operator must ensure that the chromosome of the offspring necessarily inherits those genes that are common to both parent chromosomes, i.e. (also see Figure 1):

if p′(i) = p″(i), then p•(i) = p′(i) = p″(i), i = 1, . . . , n;

here, p′, p″, p• refer to the parents and the offspring, respectively. In our work, the crossover procedure takes place at every generation of the genetic-hierarchical algorithm, i.e., the crossover probability is equal to 1. Several crossover operators were implemented and examined. Short descriptions of the crossover procedures are provided below (see also [124,125]).
One-Point Crossover-1PX

1PX is among the most popular genetic crossover operators. Very briefly, the basic idea is as follows. A single crossover point (position, or locus) is chosen in one of the two chromosomes. The position x can be determined by generating a uniformly distributed (pseudo-)random number within the interval [1, n − 1] (n is the chromosome length). The offspring is obtained by copying x genes from one parent; the rest of the genes are copied from the opposite parent. If there are empty loci left, they are filled in randomly; in addition, the feasibility of the offspring must be preserved.



Two-Point Crossover-2PX
Two-point crossover works similarly to the one-point crossover, except that two crossover points x₁ and x₂ (1 ≤ x₁ < x₂ < n) are used.

Uniform Crossover-UX
In this case, the common genes are copied to the offspring's chromosome. Then, the unoccupied positions in the offspring's chromosome are scanned from left to right, and the empty loci are assigned the genes, one at a time, from one of the parents with probability 1/2, i.e.,

p•(i) = p′(i) if ς < 1/2, and p•(i) = p″(i) otherwise;

here, ς is a (pseudo-)random number from the interval [0, 1]. The assigned gene must be unique.
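A minimal sketch of the uniform crossover, assuming 0-indexed genes; the fallback to an arbitrary unused gene enforces the uniqueness (feasibility) requirement:

```python
import random

def ux(p1, p2):
    """Uniform crossover sketch: common genes are inherited; the remaining
    loci are filled left to right from a randomly preferred parent, keeping
    every gene unique (genes are assumed to be 0..n-1)."""
    n = len(p1)
    child = [p1[i] if p1[i] == p2[i] else None for i in range(n)]
    used = {g for g in child if g is not None}
    for i in range(n):
        if child[i] is not None:
            continue
        # try the randomly preferred parent first, then the other one
        order = [p1[i], p2[i]] if random.random() < 0.5 else [p2[i], p1[i]]
        pick = next((g for g in order if g not in used), None)
        if pick is None:                      # both genes already used
            pick = next(g for g in range(n) if g not in used)
        child[i] = pick
        used.add(pick)
    return child

random.seed(3)
print(ux([0, 1, 2, 3, 4], [0, 4, 3, 2, 1]))
```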

Shuffle Crossover-SX
The shuffle crossover is obtained by randomly reordering the parents' genes before applying the uniform crossover. The same rearrangement rule must be used for both parents. After the uniform crossover is finished, the same (initial) rearrangement rule is again applied.

Partially-Mapped Crossover-PMX
Partially-mapped crossover can be seen as a modified variant of the k-point (multipoint) crossover. The basic principle relies on the so-called mapping sections (the chromosome segments between mapping points). So, at first, the segments of the chromosome of one parent are moved to the offspring's chromosome. The same is done for the other parent. At last, the empty loci (if any) are filled in in a random way.
Swap-Path Crossover (Basic Version)-SPX1

The main distinguishing feature of SPX is that, instead of transferring genes from the parents to a child, the genes are, so to speak, rearranged in the chromosomes of the parents. Let (p′, p″) be a pair of parents. The process starts from an arbitrary position, and the positions are scanned from left to right. The process continues until a predefined number of swaps, s (s < n), have been performed. If, in the current position, the genes are the same for both parents, then one moves to the next position; otherwise, a pairwise interchange of the genes of the parents' chromosomes is accomplished. The interchange is performed in both parents. For example, if the current position is i and a = p′(i), b = p″(i), then there exists a position j such that b = p′(j) and a position k such that a = p″(k); after a swap, p′(i) = b, p″(i) = a and p′(j) = a, p″(k) = b. Consequently, new chromosomes, say p‴ and p⁗, are produced. In the next iteration, the pair (p‴, p⁗) is considered, and so on.
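A sketch of SPX1 follows; note that it uses two (generally different) repair positions j and k, one per parent, to remove the duplicate genes after each interchange:

```python
def spx1(p1, p2, start, swaps):
    """Swap-path crossover (basic version) sketch. Positions are scanned
    cyclically from `start`; at each position where the parents differ, the
    genes are exchanged between the parents and the resulting duplicates
    (at position j in p1 and position k in p2) are repaired."""
    p1, p2 = p1[:], p2[:]
    n = len(p1)
    i, done, idle = start, 0, 0
    while done < swaps and idle < n:       # idle guard: parents became equal
        if p1[i] != p2[i]:
            a, b = p1[i], p2[i]
            j = p1.index(b)                # where b currently sits in p1
            k = p2.index(a)                # where a currently sits in p2
            p1[i], p1[j] = b, a            # exchange and repair in p1
            p2[i], p2[k] = a, b            # exchange and repair in p2
            done, idle = done + 1, 0
        else:
            idle += 1
        i = (i + 1) % n
    return p1, p2

print(spx1([0, 1, 2], [1, 2, 0], 0, 1))
```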

Swap-Path Crossover (Modified Version I)-SPX2
In this modification, the best offspring (with respect to the fitness) encountered in the course of the gene interchanges is retained.

Swap-Path Crossover (Modified Version II)-SPX3
The essential feature of this crossover procedure is that the offspring's fitness is dynamically evaluated: only the gene interchanges that improve the value of the objective function are accepted.

Cycle Crossover-CX
The cycle crossover is based on pairwise gene interchanges. The key property of CX is the ability to produce the offspring without distortion of the genetic code; in other words, CX makes it possible to create a chromosome with no random (foreign) genes. The negative aspect of CX is that the offspring may be genetically very close to their predecessors.

Cohesive Crossover-COHX
The cohesive crossover was proposed by Z. Drezner [75] to efficiently recombine individuals' genes by taking into account the problem data, in particular, the distances between the objects' locations. From several recombinations of genes, the one that minimizes the objective function is selected.

Multi-Parent Crossover-MPX
In the multi-parent crossover, several (or all) members of a population participate in the creation of the offspring. More precisely, the ith position (locus) of the offspring's chromosome p• is assigned the value j with the probability P(p•(i) = j) (under the condition that the value j has not been utilized before).
The probability P(p•(i) = j) is calculated on the basis of the matrix Q = (q_ij)_{n×n}, where q_ij denotes the number of times that the ith locus takes the value j in the parental chromosomes; the probability is proportional to q_ij. If there exist several values (j₁, j₂, . . . ) with the same probability, then one of them is chosen randomly.
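A sketch of MPX; since the exact normalization of P(p(i) = j) was garbled in extraction, the sketch draws each gene with probability proportional to q_ij among the still-unused values, which is one natural reading:

```python
import random

def mpx(parents):
    """Multi-parent crossover sketch. q[i][j] counts how often locus i holds
    gene j among the parents; each locus of the offspring draws a still-unused
    gene with probability proportional to q[i][j] (assumed normalization)."""
    n = len(parents[0])
    q = [[0] * n for _ in range(n)]
    for p in parents:
        for i, g in enumerate(p):
            q[i][g] += 1
    child, used = [], set()
    for i in range(n):
        options = [j for j in range(n) if j not in used]
        total = sum(q[i][j] for j in options)
        if total == 0:
            j = random.choice(options)          # no information: pick at random
        else:
            r, acc = random.random() * total, 0
            for j in options:                   # roulette-wheel draw
                acc += q[i][j]
                if acc > r:
                    break
        child.append(j)
        used.add(j)
    return child

print(mpx([[0, 1, 2], [0, 1, 2], [2, 1, 0]]))
```

If all parents agree on a locus, the common gene is inherited with certainty, so the common-gene condition of the crossover definition is respected.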

Universal Crossover-UNIVX
The universal crossover (UNIVX) [124] is distinguished by its versatility and the possibility of flexible usage depending on the specific needs of the user. It is somewhat similar to what is known as the simulated binary crossover [126].
Our operator is based on the use of a random mask. There are four controlling parameters: χ₁, χ₂, χ₃, χ₄. The mask length is equal to χ₁, where χ₁ is a (pseudo-)random number within the interval [ε₁, n], n is the length of the chromosome, ε₁ = ⌊r × n⌋, and r is a user parameter close to 1 (for example, 0.9). The mask contains binary values 0 and 1, where 1 signals that the corresponding gene of the first parent's chromosome must be chosen, and 0 indicates that the second parent's gene needs to be replicated. The degree of randomness of the mask is controlled by the parameters χ₂ and χ₃. The parameter χ₂ (χ₂ ∈ [ε₂, ε₃], 0 < ε₂ ≤ ε₃ < 1) dictates how many 0's and 1's there are in the mask: the higher the value of χ₂, the bigger the total number of 1's, and vice versa. The juxtaposition of the bits is regulated by the parameter χ₃. The bit generation itself is performed by using a kind of "anytime" min-max sorting algorithm. That is, if the sorting algorithm is interrupted at some random moment, this results in a randomized ("quasi-sorted") sequence of bits. The moment of interruption is defined by the number η, where η = χ₃w; here, χ₃ is a (pseudo-)random real number from the interval [0, 1], and w denotes the maximum number of iterations required to fully sort all the bits. (As an example, if the bits "0000001111" are to be sorted in the descending order and the algorithm is stopped at χ₃ = 0.9, then a random mask similar, for example, to "1011000100" would be generated.) Having the mask generated, the decision is made as to which genes have to be transmitted to the offspring's chromosome. The index of the starting locus of the transferred genes, χ₄, is generated randomly, in such a way that χ₄ ∈ [1, n]. Eventually, the empty loci (if any) are filled in randomly.

Hierarchical Iterated Tabu Search Algorithm
Every created offspring is improved by using the hierarchical iterated tabu search algorithm, which draws its inspiration from the philosophy of iterated local search [127] and also from the spirit of self-similarity, one of the fundamental properties of nature (see Figure 2). Basically, this means that the algorithm is (almost) exactly similar to a part of itself. In other words, the main idea is the repeated, cyclical adoption (reuse) of the iterated tabu search algorithm; that is, the iterated tabu search can be reused multiple times. This idea is not very new, and some variants of hierarchical-like algorithms have already been investigated [128][129][130][131][132][133][134]. The k-level hierarchical iterated tabu search algorithm consists of three basic components: (1) invocation of the (k−1)-level hierarchical iterated tabu search algorithm to improve a given solution; (2) acceptance of the candidate (improved) solution for perturbation, i.e., mutation; (3) mutation of the accepted solution.
Perturbation (mutation) is applied to a chosen optimized solution that is selected by the defined candidate acceptance rule (see Section 3.4.3). The mutated solution serves as an input for the self-contained TS procedure. The TS procedure returns an improved solution, and so on. The overall process continues until a pre-defined number of iterations have been performed (see Algorithm 2; note that the iterated tabu search procedure (see Algorithm 3) is in the role of the 0-level hierarchical iterated tabu search algorithm). The best found solution is regarded as the resulting solution of HITS. The paradigm of the hierarchicity-based (self-similar) algorithm is as follows: (1) set the number of levels, k (1 ≤ k ≤ k_max); (2) generate an initial solution p; (3) cyclically apply the three basic components described above until the iteration limit is reached.

Algorithm 2. Pseudocode of the k-level hierarchical iterated tabu search algorithm.
k-Level_Hierarchical_Iterated_Tabu_Search procedure;
/ input: p - current solution, k - hierarchy level
/ output: p* - the best found solution
/ parameter: Q_k - number of iterations of the k-level algorithm
begin
  p* := p;
  for q_k := 1 to Q_k do begin
    apply (k−1)-Level_Hierarchical_Iterated_Tabu_Search to p and get p∇;
    if z(p∇) < z(p*) then p* := p∇; / the best found solution is memorized
    if q_k < Q_k then begin
      p := Candidate_Acceptance(p∇, p*); apply mutation procedure to p
    endif
  endfor
end.

Algorithm 3. Pseudocode of the iterated tabu search algorithm.
Iterated_Tabu_Search procedure; / 0-level hierarchical iterated tabu search algorithm
/ input: p - current solution
/ output: p₀ - the best found solution
/ parameter: Q₀ - number of iterations of the ITS algorithm
begin
  p₀ := p;
  for q₀ := 1 to Q₀ do begin
    apply Tabu_Search to p and get p•;
    if z(p•) < z(p₀) then p₀ := p•; / the best found solution is memorized
    if q₀ < Q₀ then begin
      p := Candidate_Acceptance(p•, p₀); apply mutation procedure to p
    endif
  endfor
end.
The 0-level HITS algorithm is in fact simply the iterated tabu search algorithm (for more details on the ITS algorithm, see [135]; see also Algorithm 3, where the tabu search procedure is in the role of the self-contained algorithm), which, in turn, uses a self-contained tabu search algorithm: the "kernel" tabu search procedure. It is this procedure that directly improves a given solution. This procedure is thus in the role of the search intensification mechanism, while the mutation procedure is responsible for the diversification of the search process. It can be seen that the structure of the individual hierarchical levels of the HITS algorithm is quite simple, but the overall efficacy of the resulting multi-level algorithm increases significantly, as demonstrated by the computational experiments. Of course, the run time increases as well, but this is compensated by the higher quality of the final results.
An interesting analogy between intensification and diversification (on the one side) and the phenomenon of entropy (on the other side) can be perceived. Indeed, the intensification process can be thought of as a process of decreasing entropy; on the other hand, diversification can be viewed as increasing entropy. Similar processes are actually seen in open physical systems. An example is the process of the evolution of stars, where the formation (birth) of stars (along with planets, organic matter, etc.) can be linked to an apparent decrease of entropy, while the death of stars (supernovae) may be associated with a significant increase of entropy.
The self-contained tabu search procedure (for a more detailed description of the principles of TS algorithms, the reader is referred to [136]) iteratively analyses the neighbourhood of the current solution p (in our case, Θ₂(p)) and performs the non-prohibited move that most improves the value of the objective function. If there are no improving moves, then the one that least degrades the value of the objective function is accepted. In order to eliminate search cycles, the return to recently visited solutions is disabled for a specified period. The list of prohibitions, the tabu list T, is implemented as a two-dimensional matrix of size n × n. The entry t_ij stores the sum of the current iteration number and the tabu tenure, h; this value thus indicates from which iteration the ith and jth elements of a given solution may again be interchanged. The value of the parameter h depends on the problem size, n, and is chosen to be equal to 0.3n. The tabu status is ignored at random moments with a very low probability α (α ≤ 0.05). This slightly increases the number of non-tabu solutions and avoids limiting the available search directions too much. The tabu condition is also ignored when the aspiration criterion is met, i.e., when the obtained solution is better than the best so far found solution. The resulting formal tabu and aspiration criteria are thus as follows:

tabu_criterion(i, j) = TRUE if (t_ij ≥ q) and (ς ≥ α) and (HT[(z(p) + ∆(p_{i,j}, p)) mod HashSize] = TRUE), and FALSE otherwise;

aspiration_criterion(i, j) = TRUE if z(p) + ∆(p_{i,j}, p) < z•, and FALSE otherwise;

where i, j are the current elements' indices, q denotes the current iteration number, ς is a (pseudo-)random real number within the interval [0, 1], and z• denotes the best so far found value of the objective function. HT denotes the hash table, which is simply a one-dimensional array, and HashSize is the capacity of the hash table.
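The combined tabu/aspiration test can be sketched as a small predicate; `move_allowed` is a hypothetical helper name, and the hash-table condition is omitted for brevity:

```python
import random

def move_allowed(tabu, i, j, q, alpha, move_delta, z_cur, z_best, rng=random):
    """Decide whether the swap (i, j) may be performed at iteration q.
    tabu[i][j] holds the iteration number from which the pair (i, j) may be
    interchanged again (set to current iteration + tenure h ~ 0.3n)."""
    if tabu[i][j] < q:                   # tenure expired: move is free
        return True
    if rng.random() < alpha:             # rare random override of the tabu status
        return True
    return z_cur + move_delta < z_best   # aspiration: the move beats the best

tabu = [[0, 5],
        [5, 0]]
print(move_allowed(tabu, 0, 1, 6, 0.0, 0, 10, 5))      # tenure expired
print(move_allowed(tabu, 0, 1, 3, 0.0, -100, 10, 5))   # aspiration fires
```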
In addition, our tabu search procedure uses a so-called secondary memory Γ [137] to avoid stagnation during the search process. The purpose of this memory is to gather high-quality solutions which, although rated as very good, were not chosen. In particular, the secondary memory includes the solutions left "second" after the exploration of the neighbourhood Θ₂. So, if the best found solution does not change for more than βτ iterations, then the tabu list is cleared and the search is restarted from one of the "second" solutions in the secondary memory Γ (here, τ denotes the number of iterations of the TS procedure, and β is a so-called idle iterations limit factor such that 1 ≤ βτ ≤ τ). The TS procedure is completed as soon as the total number of iterations, τ, has been performed.
The time complexity of the TS algorithm is proportional to O(n²), for the reason that the exploration of the neighbourhood Θ₂ requires n(n − 1)/2 operations, and, in addition, one needs to recalculate the differences of the objective function values after each transition from a given solution to a new one.
The pseudo-code of the tabu search algorithm is shown in Algorithm 4.

Mutation
Each solution found by the tabu search algorithm is subject to perturbation in the mutation procedure. Recall that, formally, the mutation process can be defined by the use of the operator ϕ: Π_n → Π_n. Thus, if p~ = ϕ(p), then p~ ∈ Π_n, p~ ≠ p. A more formalized operator can be described as follows: ϕ(p, ξ): Π_n × N → Π_n, which transforms the current solution p into the new solution p~ such that δ(p, p~) = δ(p, ϕ(p)) = ξ. In this way, 100ξ/n per cent of the elements of the solution are affected. The parameter ξ (2 ≤ ξ ≤ n) regulates the strength of the mutation and is referred to as the mutation rate. (In our algorithm, ξ = ⌈0.2n⌉.) It is clear that, for such p, p~ (p ≠ p~, δ(p, p~) = ξ), there exists a sequence of distinct integers u₁, u₂, . . . , u_ξ such that p~ = (. . . ((p_{u₁,u₂})_{u₂,u₃}) . . . )_{u_{ξ−1},u_ξ}.

Choosing the right value of the mutation rate, ξ, plays a very important role in the mutation procedure, the HITS algorithm and, at the same time, the whole genetic algorithm. A proper compromise between two extreme cases must be achieved: (1) the value of ξ is (very) low (close to 0); (2) the value of ξ is (very) high (close to n). In the first case, the mutation would not guarantee that the obtained mutated solution is "far" enough away from a given solution, which would lead to cycling search trajectories. In the second case, useful accumulated information would be lost and the algorithm would become very similar to a crude random multi-start method.
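A mutation operator guaranteeing δ(p, p~) = ξ can be built from the chained transpositions of the distance identity quoted earlier; a minimal sketch:

```python
import random

def mutate(p, xi, rng=random):
    """Mutation sketch with guaranteed distance: pick xi distinct random
    positions and swap along the chain u1-u2, u2-u3, ..., producing one
    xi-cycle, so exactly xi elements of the permutation change."""
    p = p[:]
    idx = rng.sample(range(len(p)), xi)   # xi distinct positions
    for u, v in zip(idx, idx[1:]):
        p[u], p[v] = p[v], p[u]
    return p

rng = random.Random(0)
p = list(range(8))
q = mutate(p, 4, rng)
print(sum(1 for x, y in zip(p, q) if x != y))   # Hamming distance to p
```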
It should be stressed that the mutation processes are quite different from the crossover procedures. Mutation processes are, by their nature, purely random processes. Whilst crossover procedures only recombine the genetic code contained in the parents, the mutation processes generate, in essence, new information that had not existed in the predecessors. It is thus the mutation process that is implicitly a truly creative process and potentially produces real novelty. In our work, twelve different mutation procedures and their modifications were proposed and tested.

On the basis of the random pairwise interchanges, other modified mutation procedures can be developed [138].

Random Pairwise Interchanges Using Random Key (M2)
In this case, the mutation process consists of two basic steps: (1) random pairwise interchanges; (2) shuffling the interchanged elements using a random key. A random key, rk, is a list of random indices of size ξ: rk(1), rk(2), . . . , rk(ξ). The random key values simply determine which elements are interchanged again. The intention is to get a more "deeply" mutated solution and avoid returning to previously visited solutions.
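The two steps of M2 can be sketched as below; this is our own Python illustration (the paper gives no code), with 0-based indices and assumed function names.

```python
import random

def mutate_m2(p, xi, rng=random):
    """M2 sketch: (1) random pairwise interchanges on xi positions;
    (2) re-shuffle the interchanged positions using a random key."""
    q = list(p)
    pos = rng.sample(range(len(q)), xi)   # positions touched in step (1)
    for a, b in zip(pos, pos[1:]):        # step (1): pairwise interchanges
        q[a], q[b] = q[b], q[a]
    rk = rng.sample(range(xi), xi)        # step (2): random key rk(1..xi)
    vals = [q[pos[r]] for r in rk]        # permute those elements once more
    for i, v in zip(pos, vals):
        q[i] = v
    return q
```

Both steps only rearrange values among the selected positions, so the result remains a valid permutation.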

Mutation Using Opposite Values (M3)
In this mutation procedure, a position index, say k, is randomly determined. Then, the element e = p(k) is replaced by the following opposite value: o = ((p(k) + ⌊n/2⌋ − 1) mod n) + 1, where mod denotes the modulo operation. After this replacement, the solution element that was previously equal to o must also be changed. After both changes, p(k) becomes equal to o and p(l) becomes equal to e, where l indicates the position whose element was equal to o. The process is repeated ⌊ξ/2⌋ times, where ξ is the mutation rate.
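A minimal sketch of M3, assuming solution values 1..n as in the paper (the Python rendering and function name are ours):

```python
import random

def mutate_m3(p, xi, rng=random):
    """M3 sketch: xi // 2 times, pick a random position k, replace
    e = p[k] by the opposite value o = ((e + n // 2 - 1) % n) + 1,
    and give e to the position that previously held o."""
    q = list(p)
    n = len(q)
    for _ in range(xi // 2):
        k = rng.randrange(n)
        e = q[k]
        o = ((e + n // 2 - 1) % n) + 1   # opposite value on the 1..n circle
        l = q.index(o)                   # position currently holding o
        q[k], q[l] = o, e                # both changes; q stays a permutation
    return q
```

For example, with n = 8 the opposite of value 1 is ((1 + 4 − 1) mod 8) + 1 = 5, i.e., the value "half a circle" away.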

Distance-Based Mutation (M4)
In this procedure, the indices of the pairs of elements to be interchanged are generated in such a way that the "distance" (d) between those indices is as large as possible. The following formula for generating the indices k 1 , k 2 , . . . , k ξ is used: k l = ((d·q l + ς − 1) mod n) + 1, where d = ⌊n/ξ⌋, ς is a (pseudo-)random real number from the interval [0, 1], q l = (q l−1 mod n) + 1, l = 1, 2, . . . , ξ; the initial value q 0 is a random integer from the interval [1, n].
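The index-generation formula of M4 can be sketched as follows. This is our own illustration: in particular, the paper's real-valued offset ς ∈ [0, 1] is approximated here by a random integer offset c, which is an assumption on our part.

```python
import random

def m4_indices(n, xi, rng=random):
    """M4 sketch: 1-based indices k_1..k_xi spread roughly d = n // xi
    apart, via k_l = ((d*q_l + c - 1) mod n) + 1 with
    q_l = (q_{l-1} mod n) + 1."""
    d = n // xi
    c = rng.randint(0, n - 1)   # assumption: integer surrogate for the offset
    q = rng.randint(1, n)       # q_0: random integer from [1, n]
    ks = []
    for _ in range(xi):
        q = (q % n) + 1         # q_l = (q_{l-1} mod n) + 1
        ks.append(((d * q + c - 1) % n) + 1)
    return ks
```

Because q advances by 1 each step, consecutive indices differ by d modulo n, which spreads them as widely as possible.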

Modified Random Pairwise Interchanges-Variant I (M5)
This is similar to the random pairwise interchanges. A sequence of random real-coded values from the interval [0, 1] is generated. The generated numbers, along with their corresponding indices, are sorted in ascending order; the indices of the smallest values determine the elements to be interchanged.
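The sort-based index selection of M5 can be sketched as below (our own Python illustration; the function name is assumed):

```python
import random

def m5_indices(n, xi, rng=random):
    """M5 sketch: draw n random reals in [0, 1); sorting them in
    ascending order, the indices of the xi smallest values determine
    the elements to be interchanged."""
    r = [rng.random() for _ in range(n)]
    order = sorted(range(n), key=lambda i: r[i])   # ascending by random value
    return order[:xi]
```

Since the n random reals are distinct with probability one, this always yields ξ distinct indices, unlike direct integer generation (M6).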

Modified Random Pairwise Interchanges-Variant II (M6)
The list of random indices is obtained by directly generating random integers from the interval [1, n]. The integers may duplicate each other; to handle duplications, the integers are sorted in ascending order. The indices corresponding to the sorted numbers indicate the elements that are to be interchanged.

Combined Mutation (M7)
This mutation procedure consists of two combined mutation procedures. Initially, the list of indices of the pairs of elements to be interchanged is constructed (see Section 3.4.2). The selected elements are then changed using opposite values (see Section 3.4.2).

Greedy Adaptive Search Based Mutation (M8)
The basic principle of this mutation procedure is that the solution is disintegrated in some way and then reconstructed. The mutation process consists of two steps: (1) disintegration of the solution, which is random; (2) reconstruction of the solution, which is greedy-deterministic. In the first step, ξ elements are disregarded. In the second step, a greedy constructive algorithm is applied, which tries to find the best possible solution out of the ξ! available options. The value of ξ in this case should be quite small to prevent a large increase in the run time of the mutation procedure. This mutation procedure (like the other procedures described below) is no longer problem-independent, as problem domain-specific knowledge is taken into account.
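The disintegrate-and-reconstruct idea of M8 can be sketched as below. This is our own Python illustration (0-based indices, assumed function names); the "best of ξ! options" step is implemented by exhaustive enumeration, which is feasible only for small ξ, as the text notes.

```python
import itertools
import random

def qap_cost(A, B, p):
    """QAP objective: sum_i sum_j A[i][j] * B[p[i]][p[j]] (0-based)."""
    n = len(p)
    return sum(A[i][j] * B[p[i]][p[j]] for i in range(n) for j in range(n))

def mutate_m8(A, B, p, xi, rng=random):
    """M8 sketch: (1) randomly free xi assignments; (2) deterministically
    take the best of the xi! reassignments of the freed locations."""
    q = list(p)
    idx = rng.sample(range(len(q)), xi)        # disintegrated positions
    freed = [q[i] for i in idx]                # their freed locations
    best, best_cost = None, float("inf")
    for perm in itertools.permutations(freed): # best of xi! options
        for i, loc in zip(idx, perm):
            q[i] = loc
        c = qap_cost(A, B, q)
        if c < best_cost:
            best, best_cost = list(q), c
    return best
```

Note that the original assignment of the freed locations is among the enumerated options, so the reconstructed solution is never worse than the input.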

Greedy Randomized Adaptive Search Based Mutation (M9)
This mutation procedure resembles the one described above. The difference is that a greedy randomized adaptive search procedure (GRASP) [70] is used in the partial solution reconstruction phase to obtain an improved solution.

Randomized Local Search Based Mutation-Variant I (M10)
In this case, a quick procedure based on random pairwise interchanges is initially performed (see Section 3.4.2). Then, a set of randomly selected elements is formed. A local search is then performed using the constructed set, i.e., only transitions between solutions that improve the value of the objective function are accepted.

Randomized Local Search Based Mutation-Variant II (M11)
This mutation variant is similar to the previous randomized local search variant, except that the randomly constructed neighbourhood is fully explored in a systematic way. Again, only improving transitions between solutions are accepted.

Randomized Tabu Search Based Mutation (M12)
Let p (1) = argmin {z(p i,j )} over i = 1, . . . , n − 1, j = i + 1, . . . , n such that move_acceptance_criterion(i, j) = TRUE, and let p (2) be the analogously defined second best admissible solution, where move_acceptance_criterion(i, j) = TRUE if (tabu_criterion(i, j) = FALSE) or (aspiration_criterion(i, j) = TRUE), and FALSE otherwise; q denotes the current iteration number, ς is a (pseudo-)random number within the interval [0, 1], α denotes the randomization coefficient, and z• denotes the best so far value of the objective function. Then, in the randomized tabu search procedure, the best achieved solution ("winner solution") p (1) is accepted with the probability γ, while the second solution p (2) is chosen with the probability 1 − γ (note that, in the case of γ = 1, we get the standard (deterministic) tabu search). In our algorithm, we use γ = 0.2. So, the central idea of the randomized tabu search is exactly this quasi-random mixing between the "winner solution" and the "next to the winner" solution in the course of the tabu search process. Based on extensive experimentation, we found that this type of mutation is the most promising among all the procedures examined. A likely explanation is that this mutation is gentler, more moderate and more controlled than the other mutation procedures.
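The winner/runner-up mixing of M12 can be sketched as below (our own Python illustration; the list-of-candidates representation and the function name are assumptions):

```python
import random

def m12_select(admissible, gamma=0.2, rng=random):
    """M12 sketch: 'admissible' holds (cost, move) pairs that pass the
    move acceptance criterion. The winner p(1) is taken with probability
    gamma, the runner-up p(2) with probability 1 - gamma; gamma = 1
    yields the standard deterministic tabu search step."""
    ranked = sorted(admissible, key=lambda cm: cm[0])
    if len(ranked) == 1 or rng.random() < gamma:
        return ranked[0]   # "winner solution" p(1)
    return ranked[1]       # "next to the winner" solution p(2)
```

With γ = 1 the selection is always the minimum-cost admissible move, which recovers plain tabu search, as the text states.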
In the end, note that the computational complexity of all our mutation procedures is O(ξn 2 ). This is due to the fact that our mutation procedures recalculate the differences of the objective function (i.e., the values of the matrix Ξ) approximately ξ times (see Algorithm 5; the values of the matrix Ξ are recalculated using Equation (9)). So, the smaller the value of ξ, the faster the mutation procedure. Also, note that the difference matrix Ξ is (permanently) stored in RAM (operating memory), so there is no need to calculate the differences of the objective function from scratch.
Algorithm 5. Pseudo-code of the procedure for recalculation of the differences of the objective function.

Candidate Acceptance
Regarding the candidate solution acceptance rule, we always choose the most recently (newly) found improved solution (the latest result of the HITS (or TS) algorithm) instead of the overall best found solution. Such an approach is thought to allow the exploration of potentially larger regions of the solution space.

Population Replacement
For the population replacement, a modified rule is used to respect not only the quality of the solutions, but also the difference (distance) between solutions.
We have, in particular, implemented an extended variant of the well-known "µ + λ" update rule [139]. The new advanced replacement rule is denoted as "µ + λ, ε". (This rule is also used for the initial population construction (see Section 3.1).) Recall that if the minimum mutual distance between the population members and the newly obtained offspring is less than the distance threshold, DT, then the offspring is omitted. The only exception is the case where the offspring appears better than the best population member. Otherwise, the offspring enters the current population, but only under the condition that it is better than the worst population member; the worst individual is removed in this case. This replacement strategy ensures that only individuals that are diverse enough survive into further generations.
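The "µ + λ, ε" rule (the RepVar = 1 case) can be sketched as below; this is our own Python illustration with assumed function names, where cost and dist are caller-supplied objective and distance functions:

```python
def try_replace(population, offspring, cost, dist, DT):
    """'mu + lambda, epsilon' sketch: an offspring too close to the
    population (min distance < DT) is omitted unless it beats the best
    member; otherwise it replaces the worst member if it is better."""
    best = min(population, key=cost)
    worst = max(population, key=cost)
    too_close = min(dist(offspring, m) for m in population) < DT
    if too_close and cost(offspring) >= cost(best):
        return population                  # offspring is omitted
    if cost(offspring) < cost(worst):
        population = list(population)
        population.remove(worst)           # worst individual is removed
        population.append(offspring)
    return population
```

For illustration, with integer "solutions", cost(x) = x and dist(a, b) = |a − b|: an offspring near an existing member and not better than the best is rejected, while a sufficiently distant improving offspring displaces the worst member.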
There are a few replacement variations (options), depending on the parameter RepVar. If RepVar = 1, then exactly the above replacement strategy is adopted. If RepVar = 2, then the new offspring replaces the best member of the current population if the offspring is better than the best population individual. If the offspring is worse than the best individual, then RepVar = 2 is identical to RepVar = 3. If RepVar = 3, then the offspring replaces the worst member of the population, ignoring the fitness of the worst individual. The minimum distance criterion must be taken into account.

Population Restart
An important feature of our genetic algorithm is the use of a population restart mechanism to try to avoid premature convergence and search stagnation. The restart process is triggered in situations where the solutions of the population do not change at all for some number of consecutive generations. This is operationalized by the use of an a priori parameter called the idle generations limit, L idle_gen , where L idle_gen = max{L min , ωN gen }; here L min is a constant (we use L min = 3), ω denotes a stagnation control parameter and N gen is the total number of generations of the genetic algorithm. (The standard value of ω is 0.05.) The restart itself is performed by applying a so-called multi-mutation, where the mutation process is applied to all the members of the stagnated population. Such an approach is preferred to completely destroying the population, which seems too aggressive.
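The restart trigger can be sketched in a few lines (our own Python illustration; the function name is assumed):

```python
def should_restart(idle_generations, N_gen, L_min=3, omega=0.05):
    """Restart-trigger sketch: L_idle = max(L_min, omega * N_gen); the
    stagnated population is multi-mutated once it has stayed unchanged
    for L_idle consecutive generations."""
    L_idle = max(L_min, int(omega * N_gen))
    return idle_generations >= L_idle
```

With the default ω = 0.05 and N gen = 100, the limit is max(3, 5) = 5 idle generations; for short runs the constant L min = 3 dominates.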

Computational Experiments
Our genetic-hierarchical algorithm is implemented in the C# programming language. The computational experiments were carried out on a 3.1 GHz personal computer running Windows 7 Enterprise. The CPU model is an Intel Core i5-3450.
As a performance criterion, we adopt the average relative percentage deviation (θ) of the yielded solutions from the best known solution (BKS). It is calculated by the following formula: θ = 100(z̄ − z bkv )/z bkv [%], where z̄ is the average objective function value over 10 runs of the algorithm, while z bkv denotes the best known value (BKV) of the objective function that corresponds to the BKS (BKVs are from [14,29,86]).
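The deviation measure amounts to a one-line computation; the Python rendering and the function name below are ours:

```python
def avg_percentage_deviation(z_bar, z_bkv):
    """theta = 100 * (z_bar - z_bkv) / z_bkv, in per cent, where z_bar is
    the average objective value over the runs and z_bkv the best known value."""
    return 100.0 * (z_bar - z_bkv) / z_bkv
```

For example, an average objective value of 101 against a best known value of 100 gives a deviation of 1.0%.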
At every run, the algorithm is applied to the particular instance. Each time, the algorithm starts from a new random initial population. The algorithm is terminated if either the maximum number of generations, N gen , has been reached or the best known solution has been achieved.
In the experiments, the goal was to empirically test the performance of the basic setup of our algorithm, as well as its various variants, in terms of the quality of solutions and the run time of the algorithm. To do so, we have identified some essential components (ingredients) of the algorithm (namely, "initial population", "selection", "crossover", "local improvement (hierarchical tabu search)", "mutation", "population replacement") to reveal their influence on the efficiency of GHA and to "synthesize" the preferable fine-tuned architecture of the algorithm. The following combination of the particular options (parameters) related to these components is declared as the basic version of GHA: {InitPopVar = 1, PS = 10, N gen = 100, σ = 1.5, CrossVar = "1PX", τ = 20, Q hier = 2 8 = 256, MutVar = "M1", RepVar = 1}; here, Q hier denotes the total cumulative number of hierarchical iterations (Q hier = Q (0) × Q (1) × . . . × Q (7) ), where Q (0) , . . . , Q (7) denote, respectively, the corresponding numbers of iterations of the 0th-level, . . . , 7th-level hierarchical iterated tabu search algorithms. The prescribed default values of the control parameters corresponding to the basic version of GHA are shown in Tables 1 and 2. (These values were later overridden in particular separate experiments.) Table 1. Values of the control parameters of the basic version of GHA used in the experiments.

(Table 1 fragment) Parameter: number of hierarchical iterated tabu search iterations, Q hier ; Value: 256.
In the initial experiments, twelve crossover procedures (1PX, 2PX, UX, SX, PMX, SPX1, SPX2, SPX3, CX, COHX, MPX, UNIVX) were compared against each other. The obtained results (presented in Table 3) demonstrate that our proposed universal crossover (UNIVX) with the tuned control parameters yields the most promising results.
Further, we were interested in how various options (configurations) of the initial population construction affect the performance of the genetic-hierarchical algorithm. The particular configurations differ with respect to the option of the population construction (InitPopVar), the size of the pre-initial population (PS), as well as the number of TS iterations during the population initialization (τ). We have observed that maintaining higher quality initial populations, in general, allows a significant increase in the overall efficiency of GHA compared to lower quality initial populations (see Table 5).
Additionally, we experimented with a few population replacement options. It was observed that the aggressive strategy of replacing the best population member (RepVar = 2) seems to be superior to the other options (see Table 6). Further, more extensive experiments are required to strengthen this conjecture.
The results of the experiments (see Table 7) demonstrate that the scenario of diversified extensive search is obviously preferable to other examined scenarios.
The results confirm that, as expected, there exists a clear correlation between the number of improving iterations (the number of TS/HITS iterations) and the quality of the obtained solutions (see Table 8).
To view the obtained results from a different perspective, in particular, to demonstrate the stability and robustness properties of our algorithm, we have constructed histograms of the frequency of the objective function values for one of the most difficult instances of the "learning set", bl64 (see Figure A1 in the "Appendix A" Section). In fact, we have created histograms of the frequency of the average percentage deviation, θ, over 10 algorithm runs within the interval [0.0, 1.0), where 0.0 stands for zero deviation and 1.0 means the maximum possible deviation. (Note that the average deviation never exceeded 1.0 for the instance bl64 (see Table 8).) (Regarding the selection factor, σ, the obtained results are quite "flat" and not statistically significant, so they are omitted.)
On the whole, we have found the best known solutions in 9191 cases (runs) out of 14,400 (64% of cases). The BKS was found at least once for all examined instances. The cumulative average percentage deviation is equal to 1.452% and the cumulative average CPU time per run is approximately 65.9 s. The average deviation is less than 0.5% in 73% of cases, while it is less than 1.0% in 89% of cases. Fourteen instances (ci49, ci64, dre42, lipa70a, lipa70b, sko56, sko64, tai35a, tai35b, tai40b, tai45e1, tai50b, tai60b, wil50) were solved to pseudo-optimality in more than 300 runs.
After experimenting with the "learning set" of instances, the other instances (the "testing set" of instances) were examined using the fine-tuned parameters in order to find out how quickly the genetic-hierarchical algorithm converges to the best known/optimal solutions. The obtained results are presented in Table 9. It can be seen that all tested instances (88 instances) are solved to pseudo-optimality within extremely small computation time.
Note. In all cases, the 1PX crossover is used.
Note. In all cases, the UNIVX crossover and the mutation variant M12 are used. Also, the initial population construction option InitPopVar = 1 is used.
Note. Time denotes the average CPU time per one run. In parentheses, we present the number of times that the BKS has been found.

Concluding Remarks
In this paper, we have presented the hybrid genetic-hierarchical algorithm for the solution of the quadratic assignment problem. The key feature of the proposed algorithm is that the genetic algorithm is hybridized with the hierarchicity-based (self-similar) iterated tabu search algorithm, which serves as a powerful local optimizer of the offspring solutions produced by the crossover operator.
The algorithm was examined on the QAP benchmark instances of various sizes and complexity. The results obtained from the experiments demonstrate the excellent performance of the genetic-hierarchical algorithm. Our algorithm seems to outperform other state-of-the-art heuristic algorithms for many examined QAP instances or is at least very much competitive with them. A more pronounced improvement in the quality of the results might be achieved by a thorough calibration of the algorithm's parameters.
The following are some possible future research directions: balancing of the number of tabu search iterations and the number of hierarchical iterated tabu search iterations, as well as the number of hierarchical levels; extensive experimental analysis of the particular components and configurations of the genetic-hierarchical algorithm; designing and implementing a multi-level hierarchical (master-slave) genetic algorithm.