Communication

Empirical Convergence Theory of Harmony Search Algorithm for Box-Constrained Discrete Optimization of Convex Function

1 Department of Mathematics and Statistics, Sejong University, Seoul 05006, Korea
2 College of IT Convergence, Gachon University, Seongnam 13120, Korea
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(5), 545; https://doi.org/10.3390/math9050545
Submission received: 19 January 2021 / Revised: 25 February 2021 / Accepted: 26 February 2021 / Published: 4 March 2021

Abstract

The harmony search (HS) algorithm is an evolutionary computation technique that was inspired by music improvisation. So far, it has been applied to various scientific and engineering optimization problems, including project scheduling, structural design, energy system operation, car lane detection, ecological conservation, model parameter calibration, portfolio management, banking fraud detection, law enforcement, disease spread modeling, cancer detection, astronomical observation, music composition, fine art appreciation, and sudoku puzzle solving. While there are many application-oriented papers, only a few examine how HS performs in finding optimal solutions. Thus, this preliminary study proposes a new approach to show how HS converges to an optimal solution under specific conditions. Here, we introduce a distance concept and prove convergence based on empirical probability. Moreover, a numerical example is provided to explain the theorem easily.

1. Introduction

In recent years, many researchers have utilized various nature-inspired metaheuristic algorithms for solving scientific and engineering optimization problems. One of the popular algorithms is harmony search (HS), which was inspired by jazz improvisation [1]. Just as a musician plays a musical note from memory or at random, HS generates a value from its memory or at random. Just as a new harmony, composed of musical notes, is evaluated at each practice and memorized if it is good, a new solution vector, composed of values, is evaluated at each iteration and memorized if it performs well. This HS optimization process, which utilizes three basic operations (memory consideration, pitch adjustment, and random selection), continues until it finds an optimal solution vector [2].
HS shares similarities with other algorithms [1]. It is similar to Tabu Search in that it keeps past vectors in a memory called harmony memory (HM). In addition, HS can use adaptive values of the parameters HMCR (Harmony Memory Consideration Rate) and PAR (Pitch Adjustment Rate), similarly to the way Simulated Annealing varies its temperature. Furthermore, both HS and the genetic algorithm (GA) manage multiple vectors simultaneously. However, there are also differences between HS and GA. While HS generates a new vector by considering all the existing vectors, GA generates a new vector by considering only two of the existing vectors (the parents). In addition, HS considers each variable in a vector independently, whereas GA cannot, because its major operation, crossover, keeps the gene sequence (multiple variables) together.
The HS algorithm has been applied to various optimization problems including project scheduling [3], structural design [4], energy system operation [5], car lane detection [6], ecological conservation [7], model parameter calibration [8], portfolio management [9], banking fraud detection [10], law enforcement [11], disease spread modeling [12], cancer detection [13], astronomical observation [14], music composition [15,16], fine art appreciation [17], and sudoku puzzle solving [18]. Furthermore, there are some application-oriented reviews of HS [19,20,21,22,23,24,25].
While many applications have been proposed so far, only a few studies have been dedicated to the theoretical background of the HS algorithm. Beyer [26] dealt with the expected population variance of several evolutionary algorithms, and Das et al. [27] proposed an approximated variance of the expectation of solution candidates and discussed the exploratory power of the HS algorithm. However, while the convergence of other optimization algorithms has been discussed in several studies [28,29,30,31], there has been no study discussing the convergence of the HS algorithm.
As an evolutionary computation algorithm, HS considers an optimization problem:
$$\min_{x \in X} f(x),$$
where
$$X = \{x = (x_i) \in \mathbb{R}^n \mid l_i < x_i < u_i,\ i = 1, \ldots, n\}, \quad \text{and} \quad f : \mathbb{R}^n \to \mathbb{R}.$$
If all the bounds are infinite, the above problem becomes an unconstrained optimization problem. However, most practical optimization problems involve variables restricted to certain pre-fixed ranges of values and therefore become box-constrained problems [32].
In this paper, we propose a convergence theory based on empirical probability for HS as a box-constrained optimization algorithm. In particular, we consider convex objective functions with single and multiple variables. For this, we define a discrete sequence and prove convergence using a distance concept. This new approach can be applied to algorithms, such as HS, that maintain a candidate set storing improved solutions and iteratively update it.

2. Harmony Search Algorithm

2.1. Basic Structure of Harmony Search

The HS algorithm is an optimization method inspired by musical phenomena. It mimics the performing process that occurs when a musician searches for a better harmonic sound, as in jazz improvisation. Jazz improvisation seeks a musically pleasing harmony (a perfect state) determined by aesthetic standards, just as the optimization process seeks a global solution (a perfect state) determined by an objective function. While the pitch of each instrument determines the aesthetic quality, the value of each decision/design variable determines the solution quality measured by the objective function.
In order to optimize a problem using HS, we first define the set of all possible candidate values, called the candidate set (universal set) $\Lambda$. Let us assume that each $\Lambda_i$ includes $K_i$ candidate values; that is, $\Lambda = [\Lambda_1, \ldots, \Lambda_n]$ and $\Lambda_i = \{x_i(1), \ldots, x_i(K_i)\}$.
Here, the memory storage HM of the HS algorithm can be expressed as in Equation (2). To initialize HM, we randomly choose values from the universal set and generate as many vectors as HMS (harmony memory size, that is, the number of vectors stored in HM). The objective function value is also kept next to each solution vector.
$$\mathrm{HM} = \left[ \begin{array}{cccc|c} x_1^1 & x_2^1 & \cdots & x_n^1 & f(\mathbf{x}^1) \\ x_1^2 & x_2^2 & \cdots & x_n^2 & f(\mathbf{x}^2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_1^{HMS} & x_2^{HMS} & \cdots & x_n^{HMS} & f(\mathbf{x}^{HMS}) \end{array} \right]$$
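To make the memory layout of Equation (2) concrete, the following Python sketch initializes HM for discrete candidate sets and stores each objective value next to its vector. The helper name `initialize_hm`, the two-variable grid, and the toy convex objective are illustrative assumptions, not part of the original algorithm specification.

```python
import random

def initialize_hm(candidate_sets, hms, objective):
    """Fill the harmony memory with HMS random vectors drawn from the
    discrete candidate sets, keeping f(x) next to each solution vector."""
    hm = []
    for _ in range(hms):
        x = [random.choice(values) for values in candidate_sets]  # one value per variable
        hm.append((x, objective(x)))
    return hm

# Illustrative setup (assumption): two variables on a 0.1-spaced grid,
# minimized by a simple convex function.
candidate_sets = [[k / 10 for k in range(0, 121)] for _ in range(2)]
f = lambda x: (x[0] - 3.0) ** 2 + (x[1] - 5.0) ** 2
HM = initialize_hm(candidate_sets, hms=7, objective=f)
```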
Once HM is prepared, HM is refreshed with better solution vectors iteration by iteration. If a newly generated vector $\mathbf{x}^{New}$ is better than the worst vector $\mathbf{x}^{Worst}$ stored in HM in terms of the objective function value, the new vector replaces the worst one.
$$\mathrm{HM}^{New} = \mathrm{HM} \setminus \{\mathbf{x}^{Worst}\} \cup \{\mathbf{x}^{New}\}$$
The value of variable $x_i$ ($i = 1, 2, \ldots, n$) can be randomly selected from the set of all candidate discrete values $\Lambda_i = \{x_i(1), x_i(2), \ldots, x_i(K_i)\}$ with a probability of $P_R$ (random selection rate). Otherwise, the value of $x_i$ can be selected from the set of stored values $HM_i = \{x_i^1, x_i^2, \ldots, x_i^{HMS}\}$, which is the $i$th column of HM, with a probability of $P_M$ (the probability with which the value is selected solely from HM), or, once $x_i(k_{P_i})$ is selected from $HM_i$, it can be slightly adjusted by moving it to a neighboring value $x_i(k_{P_i}+m)$ with a probability of $P_P$ (the probability with which the value is selected using pitch adjustment) as follows:
$$x_i^{New} \leftarrow \begin{cases} x_i(k_{R_i}) \in \Lambda_i & \text{with probability } P_R, \\ x_i^h \in HM_i & \text{with probability } P_M, \\ x_i(k_{P_i}+m) \in \Lambda_i & \text{with probability } P_P, \end{cases}$$
where $i = 1, \ldots, n$; $k_{R_i}, k_{P_i} \in \{1, \ldots, K_i\}$; $h \in \{1, \ldots, HMS\}$; $P_R + P_M + P_P = 1$ (100%); $0 \le P_R, P_M, P_P \le 1$; and $m$ is a non-zero integer such that $x_i(k_{P_i}+m) \in \Lambda_i$. Here, $m$ is a predetermined adjustment parameter, that is, $m = \pm 1$ or $\pm 2$ or $\pm 3$, etc. For example, if $m = \pm 1$, then we take
$$m = \begin{cases} +1 & \text{with probability } \tfrac{1}{2}, \\ -1 & \text{with probability } \tfrac{1}{2}. \end{cases}$$
For $P_M$ and $P_P$, we first define HMCR by $\mathrm{HMCR} = 1 - P_R$. Then, after we define PAR, $P_M$ and $P_P$ are defined as $P_M = \mathrm{HMCR}\cdot(1-\mathrm{PAR})$ and $P_P = \mathrm{HMCR}\cdot\mathrm{PAR}$.
Here, for the pitch adjustment, we first select $x_i^h$ randomly from $HM_i$ using a uniform distribution between 0 and 1, i.e., $h = \mathrm{Integer}(\mathrm{Rand}(0,1) \times HMS + 0.5)$. Then, we identify $k_{P_i}$, which satisfies $x_i^h = x_i(k_{P_i})$, where $k_{P_i} \in \{1, \ldots, K_i\}$. Next, we further adjust $x_i(k_{P_i})$ to $x_i(k_{P_i}+m)$, where $x_i(k_{P_i}+m) \in \Lambda_i$.
This is the basic structure of the HS algorithm [33]. Although there are many structural variants of the HS algorithm [34], most variants basically have the above-mentioned three operations: memory consideration, pitch adjustment, and random selection.
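As a rough illustration of the three operations above, the sketch below improvises one new value per variable using random selection with probability $1-\mathrm{HMCR}$, memory consideration with probability $\mathrm{HMCR}(1-\mathrm{PAR})$, and pitch adjustment with probability $\mathrm{HMCR}\cdot\mathrm{PAR}$. It reuses the `(vector, value)` memory layout of the earlier sketch; clamping the adjusted index to the ends of $\Lambda_i$ is an assumption made here so that $x_i(k_{P_i}+m)$ always stays inside the candidate set.

```python
import random

def improvise(candidate_sets, hm, hmcr=0.9, par=0.3, m_max=1):
    """Generate one new vector with the three HS operations:
    random selection, memory consideration, and pitch adjustment."""
    new_x = []
    for i, values in enumerate(candidate_sets):
        if random.random() >= hmcr:
            # random selection from the whole candidate set Lambda_i (prob. P_R = 1 - HMCR)
            xi = random.choice(values)
        else:
            # memory consideration: take the i-th component of a stored vector (prob. HMCR)
            xi = random.choice(hm)[0][i]
            if random.random() < par:
                # pitch adjustment: move to a neighboring candidate value (prob. HMCR * PAR)
                k = values.index(xi) + random.choice([-m_max, m_max])
                xi = values[min(max(k, 0), len(values) - 1)]  # clamp to stay inside Lambda_i
        new_x.append(xi)
    return new_x
```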

2.2. Solution Formula for Harmony Search without Pitch Adjustment

Let us formulate the solution for the HS algorithm in the case without pitch adjustment.
At the first generation, $x_i^1$ ($i = 1, \ldots, n$) will be chosen with the given probabilities as follows:
$$x_i^1 = \begin{cases} x_i(k) \text{ from } \Lambda_i & \text{with prob. } P_R, \\ x_i^h \text{ from } HM_i^0 & \text{with prob. } P_M, \end{cases}$$
where $i = 1, \ldots, n$; $k = k_P \in \{1, \ldots, K_i\}$; $h \in \{1, \ldots, HMS\}$; $P_R + P_M = 1$; and $HM_i^0$ is the initial harmony memory for $x_i$. Let $x_i^g$ be the solution chosen with the given probabilities at the $g$th generation; then
$$x_i^g = \begin{cases} x_i(k) \text{ from } \Lambda_i & \text{with prob. } P_R, \\ x_i^h \text{ from } HM_i^{g-1} & \text{with prob. } P_M, \end{cases}$$
where $HM_i^{g-1}$ is the newly updated $i$th column of HM after the $(g-1)$th generation. At the first generation, $\mathbf{x}^1 = (x_1^1, x_2^1, \ldots, x_n^1)$ is obtained by two operations (random selection or memory consideration). If the newly generated vector $\mathbf{x}^1$ is better than $\mathbf{x}_{\mathrm{worst}}^0$, the worst vector in $HM^0$ in terms of the objective function value, then $\mathbf{x}_{\mathrm{worst}}^0$ will be replaced with $\mathbf{x}^1$. Otherwise, the worst vector $\mathbf{x}_{\mathrm{worst}}^0$ will stay in the memory.
The element that we get from this comparison after the first generation will be denoted by $\mathbf{x}_{\mathrm{new}}^1$. That is,
$$\mathbf{x}_{\mathrm{new}}^1 \leftarrow \begin{cases} \mathbf{x}^1 & \text{if } \mathbf{x}^1 \text{ is better than } \mathbf{x}_{\mathrm{worst}}^0, \\ \mathbf{x}_{\mathrm{worst}}^0 & \text{otherwise.} \end{cases}$$
Following a similar procedure, at the $g$th generation, if $\mathbf{x}^g$ is better than $\mathbf{x}_{\mathrm{worst}}^{g-1}$, the worst solution vector in $HM^{g-1}$, then $\mathbf{x}_{\mathrm{worst}}^{g-1}$ will be replaced with $\mathbf{x}^g$. Otherwise, the worst element $\mathbf{x}_{\mathrm{worst}}^{g-1}$ will stay in the memory. The element that we get from this comparison after the $g$th generation will be denoted by $\mathbf{x}_{\mathrm{new}}^g$. That is,
$$\mathbf{x}_{\mathrm{new}}^g \leftarrow \begin{cases} \mathbf{x}^g & \text{if } \mathbf{x}^g \text{ is better than } \mathbf{x}_{\mathrm{worst}}^{g-1}, \\ \mathbf{x}_{\mathrm{worst}}^{g-1} & \text{otherwise.} \end{cases}$$
Using an indicator function, the solution formula for $\mathbf{x}_{\mathrm{new}}^g$ in Equation (7) can be equivalently represented by:
$$\mathbf{x}_{\mathrm{new}}^g = \mathbf{x}^g \cdot I_g + \mathbf{x}_{\mathrm{worst}}^{g-1} \cdot (1 - I_g),$$
where
$$I_g = \begin{cases} 1 & \text{if } \mathbf{x}^g \text{ is better than } \mathbf{x}_{\mathrm{worst}}^{g-1}, \\ 0 & \text{otherwise.} \end{cases}$$
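The indicator-function form above amounts to a greedy replacement of the worst stored vector. A minimal sketch for the minimization case, reusing the `(vector, value)` layout of the earlier sketches, is:

```python
def update_hm(hm, new_x, objective):
    """Replace the worst vector in HM by new_x if new_x is better,
    i.e., x_new^g = x^g * I_g + x_worst^{g-1} * (1 - I_g) for minimization."""
    worst = max(range(len(hm)), key=lambda j: hm[j][1])  # index of the worst (largest f)
    new_f = objective(new_x)
    indicator = 1 if new_f < hm[worst][1] else 0         # I_g
    if indicator:
        hm[worst] = (new_x, new_f)
    return hm, indicator
```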

2.3. Solution Formula for Harmony Search with Pitch Adjustment

Next, let us formulate the solution for the HS algorithm in the case with pitch adjustment.
At the first generation, $x_i^1$ ($i = 1, \ldots, n$) will be chosen with the given probabilities as follows:
$$x_i^1 = \begin{cases} x_i(k_{R_i}) \text{ from } \Lambda_i & \text{with } P_R, \\ x_i^h \text{ from } HM_i^0 & \text{with } P_M, \\ x_i(k_{P_i}+m) \text{ from } \Lambda_i & \text{with } P_P, \end{cases}$$
where $i = 1, \ldots, n$; $k_{R_i}, k_{P_i} \in \{1, \ldots, K_i\}$; $h \in \{1, \ldots, HMS\}$; $P_R + P_M + P_P = 1$; and $HM_i^0$ is the initial harmony memory. Then, let $x_i^g$ be the solution chosen with the given probabilities at the $g$th generation, in which:
$$x_i^g = \begin{cases} x_i(k_{R_i}) \text{ from } \Lambda_i & \text{with } P_R, \\ x_i^h \text{ from } HM_i^{g-1} & \text{with } P_M, \\ x_i(k_{P_i}+m) \text{ from } \Lambda_i & \text{with } P_P, \end{cases}$$
where $HM_i^{g-1}$ is the newly updated $i$th column of HM after the $(g-1)$th generation. At the first generation, $\mathbf{x}^1 = (x_1^1, x_2^1, \ldots, x_n^1)$ is obtained by three operations (random selection, memory consideration, or pitch adjustment). If $\mathbf{x}^1$ is better than $\mathbf{x}_{\mathrm{worst}}^0$, the worst vector in $HM^0$ in terms of the objective function value, then $\mathbf{x}_{\mathrm{new}}^1 = \mathbf{x}^1$. Otherwise, $\mathbf{x}_{\mathrm{new}}^1 = \mathbf{x}_{\mathrm{worst}}^0$. That is,
$$\mathbf{x}_{\mathrm{new}}^1 \leftarrow \begin{cases} \mathbf{x}^1 & \text{if } \mathbf{x}^1 \text{ is better than } \mathbf{x}_{\mathrm{worst}}^0, \\ \mathbf{x}_{\mathrm{worst}}^0 & \text{otherwise.} \end{cases}$$
Therefore, after the first generation, $HM^1$ will be updated by substituting $\mathbf{x}_{\mathrm{worst}}^0$ with $\mathbf{x}_{\mathrm{new}}^1$. Then, after the $g$th generation,
$$\mathbf{x}_{\mathrm{new}}^g \leftarrow \begin{cases} \mathbf{x}^g & \text{if } \mathbf{x}^g \text{ performs better than } \mathbf{x}_{\mathrm{worst}}^{g-1}, \\ \mathbf{x}_{\mathrm{worst}}^{g-1} & \text{otherwise,} \end{cases}$$
where $\mathbf{x}_{\mathrm{worst}}^{g-1}$ is the solution vector that performs the worst in $HM^{g-1}$. Therefore, after the $g$th generation, $HM^g$ will be updated by substituting $\mathbf{x}_{\mathrm{worst}}^{g-1}$ with $\mathbf{x}_{\mathrm{new}}^g$.
Let $\mathbf{x}_{\mathrm{new}}^g = \hat{\mathbf{x}}^g$. Then, using an indicator function, the solution formula for $\hat{\mathbf{x}}^g$ in Equation (13) can be represented by:
$$\hat{\mathbf{x}}^g = \mathbf{x}^g \cdot I_g + \mathbf{x}_{\mathrm{worst}}^{g-1} \cdot (1 - I_g),$$
where
$$I_g = \begin{cases} 1 & \text{if } \mathbf{x}^g \text{ performs better than } \mathbf{x}_{\mathrm{worst}}^{g-1}, \\ 0 & \text{otherwise.} \end{cases}$$
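Putting the pieces together, one HS run with pitch adjustment simply repeats improvisation and the greedy update for a fixed number of generations. The sketch below reuses `initialize_hm`, `improvise`, and `update_hm` from the earlier sketches; the parameter values and the fixed generation budget are illustrative assumptions.

```python
def harmony_search(candidate_sets, objective, hms=7, hmcr=0.9, par=0.3, generations=10_000):
    """Minimal HS loop: initialize HM, then improvise and update every generation."""
    hm = initialize_hm(candidate_sets, hms, objective)
    for _ in range(generations):
        x_new = improvise(candidate_sets, hm, hmcr, par)
        hm, _ = update_hm(hm, x_new, objective)
    return min(hm, key=lambda pair: pair[1])  # best (vector, value) found

best_x, best_f = harmony_search(candidate_sets, f)  # using the toy setup defined earlier
```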

3. Empirical Convergence of Harmony Search

3.1. One Variable Case

In this section, we discuss the behavior of the solutions in HM. Let us first consider the case where $f(x)$ is a function of one discrete variable $x$. Without loss of generality, we can assume that $\Lambda = \{x(k) \in \mathbb{R}^1 \mid k = 1, \ldots, K\}$ satisfies $x(1) < x(2) < \cdots < x(k) < \cdots < x(K)$ after rearrangement, and we have $K - 1$ subintervals separated by $x(1), x(2), \ldots, x(K)$ (see Figure 1). In addition, $HM = \{x^h \in \mathbb{R}^1 \mid h = 1, \ldots, HMS\}$ satisfies $x^1 \le x^2 \le \cdots \le x^h \le \cdots \le x^{HMS}$ after rearrangement. The $h$th element of HM at the $g$th generation will be written as $x^{h,g}$.
Here, let us define the total ranges of the candidate set Λ and HM. In addition, we define the distances between neighboring endpoints, which are the lengths of subintervals as follows:
$$D = |\Lambda| = |x(K) - x(1)|$$
$$D_g = |HM_g| = |x^{HMS,g} - x^{1,g}|$$
$$\delta_k = |x(k+1) - x(k)|, \quad k = 1, \ldots, K-1$$
$$\delta h_j^{\,g} = |x^{j+1,g} - x^{j,g}|, \quad j = 1, \ldots, HMS-1 \quad (\text{at the } g\text{th generation})$$
and we define the minimum and maximum values of $\delta_k$ as follows:
$$\delta_m = \min_{1 \le k \le K-1} \delta_k \quad \text{and} \quad \delta_M = \max_{1 \le k \le K-1} \delta_k.$$
Here, note that we have an important relation as follows:
$$(K-1)\,\delta_m \le D = \sum_{k=1}^{K-1} \delta_k \le (K-1)\,\delta_M.$$
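The quantities $D$, $D_g$, $\delta_k$, $\delta_m$, and $\delta_M$ are straightforward to compute for a one-dimensional candidate set and HM; the sketch below also checks the relation $(K-1)\delta_m \le D \le (K-1)\delta_M$ on an illustrative grid (the sample data are assumptions).

```python
def one_dim_distances(candidates, hm_values):
    """Return D, D_g, delta_m, and delta_M for a 1-D candidate set and HM."""
    c, h = sorted(candidates), sorted(hm_values)
    D = c[-1] - c[0]                                  # total range of Lambda
    D_g = h[-1] - h[0]                                # current range of HM
    deltas = [c[k + 1] - c[k] for k in range(len(c) - 1)]
    return D, D_g, min(deltas), max(deltas)

candidates = [k / 10 for k in range(0, 121)]          # illustrative grid (assumption)
D, D_g, d_m, d_M = one_dim_distances(candidates, [0, 2, 3, 3.7, 6, 9, 11])
K = len(candidates)
assert (K - 1) * d_m <= D <= (K - 1) * d_M
```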
As mentioned, we deal with a convex function of discrete variables and seek its minimum; without loss of generality, the theory can equally be applied to a concave function and its maximum. If $f(x)$ has its minimum value at $x^{h^*}$, then $x^{h^*}$ can be located in four different ways, shown in Figure 2. Cases (a) and (b) show convex objective functions. In this paper, we focus on the convex cases, where $x^{h^*}$ is located in the interior of the range of HM; the same reasoning applies to the concave cases. The theory also covers monotone increasing/decreasing objective functions such as (c) and (d) as trivial cases, where $x^{h^*}$ is located at the right or left endpoint.
By Equations (5) and (7), the worst element of HM is replaced with a better element as $g$ increases. Even though not every generation replaces the worst element of HM with a new one, the worst element is eventually replaced as $g$ increases. Now, let us consider the case where HM is updated at the $g$th generation. From Figure 2, we can derive the following properties.
(a) & (c): Because $x^1 = x^{worst}$, the left-end smallest-value point $x^1$ will be replaced by some value $x$, except in the case $x = x^1$. Here, note that $x^1 \le x \le x^{HMS}$. The smallest-value element $x^1$ can be newly updated by $x$ or by $x^2$. Consider that we are at the $g$th generation. Then, the previous $x^1$ is denoted as $x^{1,g-1}$, and the newly updated smallest element of HM will be denoted as $x^{1,g}$. Here,
$$x^{1,g} = \begin{cases} x^{2,g-1} & \text{if } x^{2,g-1} < x, \\ x & \text{if } x^{2,g-1} \ge x. \end{cases}$$
It is clear that
$$|x^{HMS,g} - x^{1,g}| \le |x^{HMS,g-1} - x^{1,g-1}|.$$
(b) & (d): Because $x^{HMS} = x^{worst}$, the right-end largest-value point $x^{HMS}$ will be replaced by some value $x$ that shows better performance than $x^{HMS}$, except in the case $x = x^{HMS}$. Here, note that $x^1 \le x \le x^{HMS}$. The largest-value element $x^{HMS}$ can be updated by $x$ or by $x^{HMS-1}$. Consider that we are at the $g$th generation. Then, the previous $x^{HMS}$ is denoted as $x^{HMS,g-1}$, and the newly updated largest element of HM will be denoted as $x^{HMS,g}$. Here,
$$x^{HMS,g} = \begin{cases} x^{HMS-1,g-1} & \text{if } x^{HMS-1,g-1} > x, \\ x & \text{if } x^{HMS-1,g-1} \le x. \end{cases}$$
It is clear that
$$|x^{HMS,g} - x^{1,g}| \le |x^{HMS,g-1} - x^{1,g-1}|.$$
Now we prove the following theorem.
Theorem 1.
Assume that there exists exactly one solution in the candidate set $\Lambda$ for the objective function $f(x)$. If $D_g = |HM_g|$ is the length of HM at the $g$th generation, then
(i) $\{D_g\}$ is a monotone decreasing sequence as $g$ is increased.
(ii) Furthermore, the solutions in HM converge.
Proof. 
(i) As we have seen from Equations (21) and (23), we have
$$D_g \le D_{g-1},$$
which means the sequence $\{D_g\}$ is a monotone decreasing sequence.
Note that not every generation updates HM. Therefore, we need a new notation for the generations that update HM. If the worst element of HM is replaced by a new element $x$ at the $g_1$th, $g_2$th, $\ldots$ generations such that
$$D_{g_l} < D_{g_{l-1}},$$
then $\{g_l \mid l = 1, 2, 3, \ldots\}$ is the set of generation numbers that update HM in such a way that each update reduces $|x^{HMS} - x^1|$ in HM. It is clear that
$$\{g_l\} \subset \{g\},$$
i.e., $\{g_l\}$ is a subsequence of $\{g\}$.
Note that, if there is more than one worst solution in HM, then even when HM is updated, Equation (25) may not be satisfied. In that case, the corresponding generation number is not an element of $\{g_l\}$.
Now, note that either $D_{g_l} > 0$ or $D_{g_l} = 0$ holds; equivalently, either $D_{g_l} \ge \delta_m$ or $D_{g_l} = 0$, since in the case of $D_{g_l} > 0$ we have $D_{g_l} \ge \delta_m$. Furthermore, for the smallest subinterval length $\delta_m$, it is easily seen that:
$$D_{g_l} \le D_{g_l - 1} - \delta_m.$$
Here, we need to check the existence of $\{g_l\}$, because we are not sure whether HM is steadily updated as $g$ increases; that is, we cannot immediately guarantee that $x^* (= \arg\min f(x))$ is selected at a certain generation. This is finally guaranteed by the relation between empirical and theoretical probability: if we keep repeating the generations, $x^*$ is surely selected at some generation. Therefore, the generation numbers $g_l$ ($l = 1, 2, \ldots$) that update HM exist.
Meanwhile, we have the following property:
$$D_{g_1} \le D_{g_1 - 1} - \delta_m \quad \text{and} \quad D_{g_2} \le D_{g_2 - 1} - \delta_m.$$
HM is updated only at the $g_l$th generations ($l = 1, 2, 3, \ldots$). Therefore, $D_{g_1 - 1} = D_0$ and $D_{g_2 - 1} = D_{g_1}$, because $g_1$ is the first generation that updates HM. Thus, $HM_{g_1 - 1} = HM_{g_1 - 2} = \cdots = HM_0$, where $HM_0$ is the initial HM. Since $g_2$ is the second generation that updates HM, $HM_{g_2 - 1} = HM_{g_2 - 2} = \cdots = HM_{g_1}$.
If we repeat this procedure $l$ times, we get:
$$D_{g_l} \le D_0 - l\,\delta_m,$$
where $D_0 = |HM_0|$ is the initial range of HM. Here, $\{D_{g_l}\}$ is a subsequence of $\{D_g\}$. The equality $D_{g_l} = D_0 - l\,\delta_m$ is unrealistic from a practical point of view, so we assume that $D_{g_l} < D_0 - l\,\delta_m$. Since $\{D_g\}$ is monotone decreasing, $\{D_{g_l}\}$ is monotone decreasing as well. Furthermore, $\{D_g\}$ is bounded below by 0. For $\{D_g\}$, note that we have:
$$D_0 = \cdots = D_{g_1 - 1} > D_{g_1} = D_{g_1 + 1} = \cdots > D_{g_2} = \cdots > D_{g_3} > \cdots$$
Now, let us define $\epsilon_l$ as follows:
$$\epsilon_l = D_0 - l\,\delta_m.$$
Since $\epsilon_l > D_{g_l} \ge 0$, as $l$ increases $\epsilon_l$ can become negative, but we consider only the case $\epsilon_l > 0$. Therefore, let $\epsilon_l > 0$ for $l = 1, 2, 3, \ldots, L$. Since the $\epsilon_l$ ($l = 1, 2, 3, \ldots, L$) take discrete values, $\{\epsilon_l\}$ is a positive decreasing sequence (see Figure 3a). As $g$ increases, $g_1, g_2, \ldots$ exist empirically. Therefore, every time HM is updated, the sequence $\epsilon_l$ decreases. That is, as $l$ increases,
$$\epsilon_1 > \epsilon_2 > \cdots > \epsilon_l > \cdots > \epsilon_L > 0.$$
Assume that the solutions in HM converge after the $g_t$th generation. Then,
$$D_0 > D_{g_1} > D_{g_2} > \cdots > D_{g_t} = D_{g_{t+1}} = D_{g_{t+2}} = \cdots = D_{g_L}.$$
It is also true that:
$$D_{g_t} < D_0 - t\,\delta_m = \epsilon_t, \quad \text{but } \epsilon_t \text{ is not yet very small},$$
because $\delta_m$ is the minimum length of the subintervals. Therefore, we continue the procedure even after the solutions in HM converge, until $\epsilon_l$ is very small but still positive. We repeat the generations $s$ more times after the $g_t$th generation until we satisfy:
$$D_{g_{t+s}} < D_0 - (t+s)\,\delta_m = \epsilon_{t+s}, \quad \text{with } \epsilon_{t+s} > 0,$$
$$D_{g_{t+s+1}} < D_0 - (t+s+1)\,\delta_m = \epsilon_{t+s+1}, \quad \text{but } \epsilon_{t+s+1} < 0,$$
for some positive integer $s$. However, Equation (36) is not possible because we assumed $\epsilon_l > 0$, so we stop the procedure after $s$ more generations. Therefore, $\epsilon_{t+s} = \epsilon_L$. Here, $\epsilon_L$ is still a positive constant, as in Figure 3b, but
$$\epsilon_L < \delta_m \le \delta h_{HMS-1}.$$
Since we have either $D_{g_l} \ge \delta_m$ or $D_{g_l} = 0$, Equations (35)–(37) guarantee that $D_{g_{t+s}} = D_{g_L}$ is now 0. This means that, at the $g_L$th generation, it is guaranteed that:
$$x^{1,g_L} = x^{2,g_L} = \cdots = x^{HMS,g_L}.$$
In fact, Equation (38) has already been satisfied since the $g_t$th generation, but the convergence is now finally guaranteed mathematically by inequality (34). That is, the values in HM converge before the $g_L$th generation. Therefore, we can conclude that there exists some $t < L$, for positive integers $t$ and $L$, such that $HM_{g_t}$ is convergent.
Therefore, the values in HM converge empirically, which proves the theorem. □
Remark. 
In general probability theory, for each event $E$ of the sample space $S$, we define $n(E)$ to be the number of times the event $E$ occurs in the first $n$ repetitions of the experiment. Then $P(E)$, the probability of the event $E$, is defined as:
$$P(E) = P_E = \lim_{n \to \infty} \frac{n(E)}{n}.$$
Here, $P(E)$ is defined as the (limiting) proportion of time that $E$ occurs; it is thus the limiting frequency of $E$. $P(E) = P_E$ is called the "theoretical probability," and $n(E)/n$ is called the "empirical probability." The empirical probability approaches the theoretical probability as the number of experiments increases. This means that when the repetition number $n$ is small, the event $E$ may not occur; however, if we repeat the experiment many times, i.e., if $n$ is large enough, the ratio between $n(E)$ and $n$ converges to $P_E$. This guarantees that $E$ does occur (indeed, $n(E)$ times) if we repeat the experiment many times. Therefore, even when HM does not include the solution that minimizes $f(x)$, if $\Lambda$ includes the unique solution, say $x^*$, then it is surely selected as we repeat the iterations. That is:
$$P[x^* \text{ is selected}] = P_R \cdot \lim_{g \to \infty} \frac{n(E)}{g} = P_R \cdot \frac{1}{K},$$
where the event $E$ is the selection of $x^*$. In addition, Equation (25) guarantees that $x^*$ will be selected as $g \to \infty$.
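The remark can be illustrated numerically: if the random-selection operation fires with rate $P_R$ and then draws uniformly among $K$ candidates, the empirical frequency of hitting the unique optimum approaches $P_R/K$ as the number of generations grows. The following standalone simulation is a sketch under exactly those assumptions.

```python
import random

def empirical_selection_frequency(K, p_r, generations, target=0):
    """Estimate how often x* (stored at index `target`) is drawn by
    random selection, to be compared with the theoretical value P_R / K."""
    hits = sum(
        1
        for _ in range(generations)
        if random.random() < p_r and random.randrange(K) == target
    )
    return hits / generations

K, p_r = 121, 0.1
print(empirical_selection_frequency(K, p_r, 1_000_000), p_r / K)  # both close to ~8.3e-4
```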
Example 1.
Let us consider the situation where the minimum value of the objective function $f(x)$ is at the right end, as in Figure 2c. Define:
$$\Lambda = \{k/10 \mid k = 0, 1, 2, \ldots, 120\} \quad \text{and} \quad HM_0 = \{0,\ 2,\ 3,\ 3.7,\ 6,\ 9,\ 11\}.$$
Then, as $g$ increases, we get $g_1, g_2, \ldots$. As we see from Figure 4, we have $D_0 = 11$, and
$$\delta h_1 = 2,\ \delta h_2 = 1,\ \delta h_3 = 0.7,\ \delta h_4 = 2.3,\ \delta h_5 = 3,\ \delta h_6 = 2, \quad \text{and} \quad \delta_m = 0.7.$$
Furthermore, we get:
$$D_0 = 11,\ D_{g_1} = 9,\ D_{g_2} = 8,\ D_{g_3} = 7.3,\ D_{g_4} = 5,\ D_{g_5} = 2,\ D_{g_6} = 0.$$
Therefore, $t = 6$ from Equation (43). However, from $\delta_m = 0.7$,
$$\epsilon_0 = 11,\ \epsilon_1 = 10.3,\ \epsilon_2 = 9.6,\ \epsilon_3 = 8.9,\ \epsilon_4 = 8.2,\ \epsilon_5 = 7.5,\ \epsilon_6 = 6.8.$$
We get $\epsilon_6 = 6.8 > 0$, which is not a very small number yet. Therefore, we repeat the procedure $s$ more times until Equations (35) and (36) are satisfied. Since $\epsilon_l = 11 - 0.7\,l$ first becomes negative at $l = 16$, it is easily calculated that $s = 9$. Hence,
$$D_{g_{6+9}} < \epsilon_{6+9} = \epsilon_{15} = 0.5 < \delta_m = 0.7.$$
We have either $D_{g_l} \ge \delta_m$ or $D_{g_l} = 0$ by the definition of $D_{g_l}$. Therefore, Equation (45) guarantees that $D_{g_{15}} = 0$. Figure 4 shows the convergence of HM in Example 1. In fact, we already have $D_{g_6} = 0$, but this is not yet proven mathematically; it is finally shown after 9 more generations as follows:
$$D_{g_6} = D_{g_7} = \cdots = D_{g_{15}} = 0.$$
Therefore, there exists a positive integer $t = 6$ that is less than $L = 15$ such that the solution in $HM_{g_6}$ is convergent. Example 1 thus confirms Theorem 1.
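The bookkeeping of Example 1 can be reproduced directly from the definition $\epsilon_l = D_0 - l\,\delta_m$ and the recorded $D_{g_l}$ values; the short sketch below recovers $t$, $L$, and $s$.

```python
D0, delta_m = 11.0, 0.7
D_gl = [11, 9, 8, 7.3, 5, 2, 0]        # D_0, D_g1, ..., D_g6 from Example 1

eps, l = [], 0
while D0 - l * delta_m > 0:            # epsilon_l = D_0 - l * delta_m, kept while positive
    eps.append(round(D0 - l * delta_m, 10))
    l += 1

t = D_gl.index(0)                      # first l with D_{g_l} = 0      -> t = 6
L = len(eps) - 1                       # last l with epsilon_l > 0     -> L = 15
s = L - t                              # extra generations after g_t   -> s = 9
print(t, L, s, eps[L])                 # 6 15 9 0.5, and 0.5 < delta_m = 0.7
```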

3.2. Multiple Variable Case

Let $f(\mathbf{x})$ be a function of $n$ variables. Let $\mathbf{x}(k) = (x_1(k), \ldots, x_n(k)) \in \mathbb{R}^n$ be a candidate solution from the candidate set $\Lambda$. Let us consider the norm $\|\cdot\|$ of each solution vector as follows:
$$\|\mathbf{x}(k)\| = \left[\sum_{i=1}^{n} x_i^2(k)\right]^{1/2}.$$
Without loss of generality, we can assume that $\Lambda = \{\mathbf{x}(k) \in \mathbb{R}^n \mid k = 1, \ldots, K\}$ satisfies $\|\mathbf{x}(1)\| < \|\mathbf{x}(2)\| < \cdots < \|\mathbf{x}(k)\| < \cdots < \|\mathbf{x}(K)\|$ after rearrangement, and we have $K - 1$ distances between the $K$ vectors. The distance between two vectors $\mathbf{x}(k_1)$ and $\mathbf{x}(k_2)$ is defined by:
$$d(\mathbf{x}(k_1), \mathbf{x}(k_2)) = \|\mathbf{x}(k_1) - \mathbf{x}(k_2)\| = \left[\sum_{i=1}^{n} \big(x_i(k_1) - x_i(k_2)\big)^2\right]^{1/2}.$$
In addition, $HM = \{\mathbf{x}^h \in \mathbb{R}^n \mid h = 1, \ldots, HMS\}$ satisfies $\|\mathbf{x}^1\| \le \|\mathbf{x}^2\| \le \cdots \le \|\mathbf{x}^h\| \le \cdots \le \|\mathbf{x}^{HMS}\|$ after rearrangement. The $h$th vector of HM at the $g$th generation will be written as $\mathbf{x}^{h,g}$.
Here, let us define the total ranges of the candidate set $\Lambda$ and HM. As before, without loss of generality, we focus on the minimization case in this section, because the theory is easily applied to the maximization case. In addition, we define the distances between neighboring solution candidate vectors, which play the role of the subinterval lengths, as follows.
$$D = |\Lambda| = \|\mathbf{x}(K) - \mathbf{x}(1)\|$$
$$D_g = |HM_g| = \|\mathbf{x}^{HMS,g} - \mathbf{x}^{1,g}\|$$
$$\delta_k = \|\mathbf{x}(k+1) - \mathbf{x}(k)\|, \quad k = 1, \ldots, K-1$$
$$\delta h_j = \|\mathbf{x}^{j+1} - \mathbf{x}^{j}\|, \quad j = 1, \ldots, HMS-1$$
Then we define the minimum and maximum values of $\delta_k$ as follows:
$$\delta_m = \min_{1 \le k \le K-1} \delta_k \quad \text{and} \quad \delta_M = \max_{1 \le k \le K-1} \delta_k.$$
Here, note that we have an important relation as follows:
$$(K-1)\,\delta_m \le D = \sum_{k=1}^{K-1} \delta_k \le (K-1)\,\delta_M,$$
$$HM = \{\mathbf{x}^h \in \mathbb{R}^n \mid \|\mathbf{x}^1\| \le \cdots \le \|\mathbf{x}^h\| \le \cdots \le \|\mathbf{x}^{HMS}\|\}.$$
(i) If $\mathbf{x}^1 = \mathbf{x}^{worst}$, then $\mathbf{x}^1$ will be replaced by some $\mathbf{x}$ that shows better performance than $\mathbf{x}^1$. Here, note that $\|\mathbf{x}^1\| < \|\mathbf{x}\| < \|\mathbf{x}^{HMS}\|$. Therefore, $\mathbf{x}^1$, which has the smallest norm, will be replaced by $\mathbf{x}$. Consider that we are at the $g$th generation; then the previous $\mathbf{x}^1$ is denoted as $\mathbf{x}^{1,g-1}$, and the newly updated smallest vector of HM will be denoted as $\mathbf{x}^{1,g}$. Here,
$$\mathbf{x}^{1,g} = \begin{cases} \mathbf{x}^{2,g-1} & \text{if } \|\mathbf{x}^{2,g-1}\| < \|\mathbf{x}\|, \\ \mathbf{x} & \text{otherwise.} \end{cases}$$
It is clear that
$$\|\mathbf{x}^{HMS,g} - \mathbf{x}^{1,g}\| \le \|\mathbf{x}^{HMS,g-1} - \mathbf{x}^{1,g-1}\|.$$
(ii) If $\mathbf{x}^{HMS} = \mathbf{x}^{worst}$, then $\mathbf{x}^{HMS}$ will be replaced by another vector $\mathbf{x}$ that shows better performance than $\mathbf{x}^{HMS}$. Here, note that $\|\mathbf{x}^1\| < \|\mathbf{x}\| < \|\mathbf{x}^{HMS}\|$. Therefore, $\mathbf{x}^{HMS}$, which has the largest norm, will be replaced by $\mathbf{x}$. Consider that we are at the $g$th generation. Then the previous $\mathbf{x}^{HMS}$ is denoted as $\mathbf{x}^{HMS,g-1}$, and the newly updated largest vector of HM will be denoted as $\mathbf{x}^{HMS,g}$. Here,
$$\mathbf{x}^{HMS,g} = \begin{cases} \mathbf{x}^{HMS-1,g-1} & \text{if } \|\mathbf{x}^{HMS-1,g-1}\| > \|\mathbf{x}\|, \\ \mathbf{x} & \text{otherwise.} \end{cases}$$
It is clear that:
$$\|\mathbf{x}^{HMS,g} - \mathbf{x}^{1,g}\| \le \|\mathbf{x}^{HMS,g-1} - \mathbf{x}^{1,g-1}\|.$$
Corollary 2. Assume that there exists exactly one solution vector in the candidate set $\Lambda$ for the objective function $f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n)$. In addition, if $D_g = |HM_g| = \|\mathbf{x}^{HMS,g} - \mathbf{x}^{1,g}\|$ is the length of HM at the $g$th generation, then
(i) $\{D_g\}$ is a monotone decreasing sequence as $g$ is increased.
(ii) Furthermore, the values in HM converge.
Proof. 
Based on the notation defined in Equations (47)–(51), the proof follows directly from Theorem 1. □
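In the multi-variable case the same bookkeeping goes through Euclidean norms; a minimal sketch computing $D_g = \|\mathbf{x}^{HMS,g} - \mathbf{x}^{1,g}\|$ for a small HM of vectors is shown below, with the sample data as an illustrative assumption.

```python
import math

def norm(v):
    """Euclidean norm ||v|| = (sum_i v_i^2)^(1/2)."""
    return math.sqrt(sum(vi * vi for vi in v))

def distance(a, b):
    """d(a, b) = ||a - b||."""
    return norm([ai - bi for ai, bi in zip(a, b)])

def hm_range(hm_vectors):
    """D_g = ||x^{HMS,g} - x^{1,g}|| after ordering HM vectors by norm."""
    ordered = sorted(hm_vectors, key=norm)
    return distance(ordered[-1], ordered[0])

print(hm_range([[0.0, 1.0], [2.0, 2.0], [3.0, 0.5], [4.0, 4.0]]))  # 5.0
```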

4. Conclusions

In this communication, we employed the distance concept and proved the convergence of the HS algorithm based on empirical probability. The solution behavior of HS for one or more discrete variables was discussed, and the theorem was demonstrated with a numerical example.
In future studies, we will expand the theorem to cover non-discrete variables, multi-modal functions, and adaptive parameters [35].

Author Contributions

J.H.Y. developed the conceptualization, proved the theorems, and drafted the manuscript. Supervision, review, and editing were done by Z.W.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C1A01011131).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Geem, Z.W.; Kim, J.H.; Loganathan, G.V. A new heuristic optimization algorithm: Harmony search. Simulation 2001, 76, 60–68. [Google Scholar] [CrossRef]
  2. Lee, K.S.; Geem, Z.W. A new metaheuristic algorithm for continuous engineering optimization: Harmony search theory and practice. Comput. Methods Appl. Mech. Eng. 2004, 194, 3902–3933. [Google Scholar] [CrossRef]
  3. Geem, Z.W. Multiobjective Optimization of Time-Cost Trade-Off Using Harmony Search. J. Constr. Eng. Manag. ASCE 2010, 136, 711–716. [Google Scholar] [CrossRef]
  4. Geem, Z.W. Harmony Search Algorithms for Structural Design Optimization; Springer: Berlin, Germany, 2009. [Google Scholar]
  5. Nazari-Heris, M.; Mohammadi-Ivatloo, B.; Asadi, S.; Kim, J.-H.; Geem, Z.W. Harmony search algorithm for energy system applications: An updated review and analysis. J. Exp. Theor. Artif. Intell. 2019, 31, 723–749. [Google Scholar] [CrossRef]
  6. Moon, Y.Y.; Geem, Z.W.; Han, G.-T. Vanishing point detection for self-driving car using harmony search algorithm. Swarm Evol. Comput. 2018, 41, 111–119. [Google Scholar] [CrossRef]
  7. Geem, Z.W. Can Music Supplant Math in Environmental Planning? Leonardo 2015, 48, 147–150. [Google Scholar] [CrossRef]
  8. Lee, W.-Y.; Ko, K.-E.; Geem, Z.W.; Sim, K.-B. Method that determining the Hyperparameter of CNN using HS Algorithm. J. Korean Inst. Intell. Syst. 2017, 27, 22–28. [Google Scholar] [CrossRef] [Green Version]
  9. Tuo, S.H. A Modified Harmony Search Algorithm for Portfolio Optimization Problems. Econ. Comput. Econ. Cybern. Stud. Res. 2016, 50, 311–326. [Google Scholar]
  10. Daliri, S. Using Harmony Search Algorithm in Neural Networks to Improve Fraud Detection in Banking System. Comput. Intell. Neurosci. 2020, 2020, 6503459. [Google Scholar] [CrossRef] [PubMed]
  11. Shih, P.-C.; Chiu, C.-Y.; Chou, C.-H. Using Dynamic Adjusting NGHS-ANN for Predicting the Recidivism Rate of Commuted Prisoners. Mathematics 2019, 7, 1187. [Google Scholar] [CrossRef] [Green Version]
  12. Fairchild, G.; Hickmann, K.S.; Mniszewski, S.M.; Del Valle, S.Y.; Hyman, J.M. Optimizing human activity patterns using global sensitivity analysis. Comput. Math. Organ Theory 2014, 20, 394–416. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  13. Elyasigomari, V.; Lee, D.A.; Screen, H.R.C.; Shaheed, M.H. Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification. J. Biomed. Inform. 2017, 67, 11–20. [Google Scholar] [CrossRef] [PubMed]
  14. Deeg, H.J.; Moutou, C.; Erikson, A.; Csizmadia, S.; Tingley, B.; Barge, P.; Bruntt, H.; Havel, M.; Aigrain, S.; Almenara, J.M.; et al. A transiting giant planet with a temperature between 250 K and 430 K. Nature 2010, 464, 384–387. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Geem, Z.W.; Choi, J.Y. Music Composition Using Harmony Search Algorithm. Lect. Notes Comput. Sci. 2007, 4448, 593–600. [Google Scholar]
  16. Navarro, M.; Corchado, J.M.; Demazeau, Y. MUSIC-MAS: Modeling a harmonic composition system with virtual organizations to assist novice composers. Expert Syst. Appl. 2016, 57, 345–355. [Google Scholar] [CrossRef]
  17. Koenderink, J.; van Doorn, A.; Wagemans, J. Picasso in the mind’s eye of the beholder: Three-dimensional filling-in of ambiguous line drawings. Cognition 2012, 125, 394–412. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  18. Geem, Z.W. Harmony Search Algorithm for Solving Sudoku. Lect. Notes Comput. Sci. 2007, 4692, 371–378. [Google Scholar]
  19. Geem, Z.W. Music-Inspired Harmony Search Algorithm: Theory and Applications; Springer: New York, NY, USA, 2009. [Google Scholar]
  20. Manjarres, D.; Landa-Torres, I.; Gil-Lopez, S.; Del Ser, J.; Bilbao, M.N.; Salcedo-Sanz, S.; Geem, Z.W. A Survey on Applications of the Harmony Search Algorithm. Eng. Appl. Artif. Intell. 2013, 26, 1818–1831. [Google Scholar] [CrossRef]
  21. Askarzadeh, A. Solving electrical power system problems by harmony search: A review. Artif. Intell. Rev. 2017, 47, 217–251. [Google Scholar] [CrossRef]
  22. Yi, J.; Lu, C.; Li, G. A literature review on latest developments of Harmony Search and its applications to intelligent manufacturing. Math. Biosci. Eng. 2019, 16, 2086–2117. [Google Scholar] [CrossRef]
  23. Ala’a, A.; Alsewari, A.A.; Alamri, H.S.; Zamli, K.Z. Comprehensive Review of the Development of the Harmony Search Algorithm and Its Applications. IEEE Access 2019, 7, 14233–14245. [Google Scholar]
  24. Alia, M.; Mandava, R. The variants of the harmony search algorithm: An overview. Artif. Intell. Rev. 2011, 36, 49–68. [Google Scholar] [CrossRef]
  25. Gao, X.Z.; Govindasamy, V.; Xu, H.; Wang, X.; Zenger, K. Harmony Search Method: Theory and Applications. Comput. Intell. Neurosci. 2015, 2015, 258491. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  26. Beyer, H.-G. On the dynamics of EAs without selection. In Foundations of Genetic Algorithms; Banzhaf, W., Reeves, C., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1999; Volume 5, pp. 5–26. [Google Scholar]
  27. Das, S.; Mukhopadhyay, A.; Roy, A.; Abraham, A. Exploratory Power of the Harmony Search Algorithm: Analysis and Improvements for Global Numerical Optimization. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2011, 41, 89–106. [Google Scholar] [CrossRef]
  28. Wu, C.F.J. On the convergence properties of the EM algorithm. Ann. Stat. 1983, 11, 95–103. [Google Scholar] [CrossRef]
  29. Bull, A.D. Convergence Rates of Efficient Global Optimization Algorithms. J. Mach. Learn. Res. 2011, 12, 2879–2904. [Google Scholar]
  30. Trelea, I.C. The particle swarm optimization algorithm: Convergence analysis and parameter selection. Inf. Process. Lett. 2003, 85, 317–325. [Google Scholar] [CrossRef]
  31. Zhang, X.; Zheng, X.; Cheng, R.; Qiu, J.; Jin, Y. A competitive mechanism based multi-objective particle swarm optimizer with fast convergence. Inf. Sci. 2018, 427, 63–76. [Google Scholar] [CrossRef]
  32. Facchinei, F.; Júdice, J.; Soares, J. Generating Box-Constrained Optimization Problems. ACM Trans. Math. Softw. 1997, 23, 443–447. [Google Scholar] [CrossRef]
  33. Geem, Z.W. Novel derivative of harmony search algorithm for discrete design variables. Appl. Math. Comp. 2008, 199, 223–230. [Google Scholar] [CrossRef]
  34. Zhang, T.; Geem, Z.W. Review of Harmony Search with Respect to Algorithm Structure. Swarm Evol. Comput. 2019, 48, 31–43. [Google Scholar] [CrossRef]
  35. Almeida, F.; Giménez, D.; López-Espín, J.J.; Pérez-Pérez, M. Parameterized Schemes of Metaheuristics: Basic Ideas and Applications with Genetic Algorithms, Scatter Search, and GRASP. IEEE Trans. Syst. Man Cybern. Syst. 2013, 43, 570–586. [Google Scholar] [CrossRef]
Figure 1. Elements of the candidate set and the subintervals.
Figure 2. Candidate solutions and the worst element of one-variable functions.
Figure 3. Convergence process when $x^{worst}$ is located at the left end side.
Figure 4. Convergence process in Example 1.