1. Introduction
The task of locating the global minimum of a function
f can be defined as:
$${x}^{*}=arg\underset{x\in S}{min}f\left(x\right)$$
with S a bounded subset of ${R}^{n}$:
$$S=\left[{a}_{1},{b}_{1}\right]\otimes \left[{a}_{2},{b}_{2}\right]\otimes \cdots \otimes \left[{a}_{n},{b}_{n}\right]$$
This task finds application in a variety of real-world problems, such as problems from physics [
1,
2,
3], chemistry [
4,
5,
6], economics [
7,
8], medicine [
9,
10], etc. The methods aimed at finding the global minimum are divided into two major categories: deterministic methods and stochastic methods. The most frequently encountered techniques of the first category are interval techniques [
11,
12], which partition the initial domain of the objective function until a promising subset is found to find the global minimum. The second category includes the vast majority of methods, and in its ranks, one can find methods such as controlled random search methods [
13,
14,
15], simulated annealing methods [
16,
17,
18], differential evolution methods [
19,
20], particle swarm optimization (PSO) methods [
21,
22,
23], ant colony optimization methods [
24,
25], etc. Furthermore, a variety of hybrid techniques have been proposed, such as hybrid Multistart methods [
26,
27], hybrid PSO techniques [
28,
29,
30], etc. Also, many parallel optimization methods [
31,
32] have appeared during the past few years or methods that take advantage of the modern graphics processing units (GPUs) [
33,
34].
One of the basic techniques included in the area of stochastic techniques is genetic algorithms, initially proposed by John Holland [
35]. The operation of genetic algorithms is inspired by biology, and for this reason, they utilize the idea of evolution through genetic mutation, natural selection, and crossover [
36,
37,
38].
Genetic algorithms can be combined with machine learning to solve complex problems and optimize models. More specifically, the genetic algorithm has been applied in many machine learning applications, such as in the article by Ansari et al., which deals with the recognition of digital modulation signals. In this article, the genetic algorithm is used to optimize machine learning models by adjusting their features and parameters to achieve better signal recognition accuracy [
39]. Additionally, in the study by Ji et al., a methodology is proposed that uses machine learning models to predict amplitude deviation in hot rolling, while genetic algorithms are employed to optimize the machine learning models and select features to improve prediction accuracy [
40]. Furthermore, in the article by Santana, Alonso, and Nieto, which focuses on the design and optimization of 5G networks in indoor environments, the use of genetic algorithms and machine learning models is identified for estimating path loss, which is critical for determining signal strength and coverage indoors [
41].
Another interesting article is by Liu et al., who discuss the use of genetic algorithms in robotics [
42]. The authors propose a methodology that utilizes genetic algorithms to optimize the trajectory and motion of digital twin robots. A similar study was presented by Nonoyama et al. [
43], where the research focused on optimizing energy consumption during the motion planning of a dual-arm industrial robot. The goal of the research is to minimize energy consumption during the process of object retrieval and placement. To achieve this, both genetic algorithms and particle swarm optimization algorithms are used to adjust the robot’s motion trajectory, thereby increasing its energy efficiency.
The use of genetic algorithms is also prevalent in industry and energy applications. In the article by Liu et al. [
44], the application of genetic algorithms to optimize energy conservation in a high-speed spark-ignition engine fueled with methanol and gasoline blends is discussed. In this study, genetic algorithms are used as an optimization technique to find the best operating conditions for the engine, such as the air–fuel ratio, ignition timing, and other engine control variables, aiming to reduce energy consumption and emissions. In another study, the optimization of the placement of electric vehicle charging stations is carried out [
45]. Furthermore, in the study by Chen and Hu [
46], the design of an intelligent system for agricultural greenhouses using genetic algorithms is presented to provide multiple energy sources. Similarly, in the research by Min, Song, Chen, Wang, and Zhang [
47], an optimized energymanagement strategy for hybrid electric vehicles is introduced using a genetic algorithm based on fuel cells in a neural network under startup conditions.
Moreover, genetic algorithms are extremely useful in the field of medicine, as they are employed in therapy optimization, medical personnel training, genetic diagnosis, and genomic research. More specifically, in the study by Doewes, Nair, and Sharma [
48], data from blood analyses and other biological samples are used to extract characteristics related to the presence of the SARS-CoV-2 virus that causes COVID-19. In this article, genetic algorithms are used for data analysis and processing to extract significant characteristics that can aid in the effective diagnosis of COVID-19. Additionally, there are studies that present the design of dental implants for patients using artificial neural networks and genetic algorithms [
49,
50]. Lastly, the contribution of genetic algorithms is significant in both implant techniques [
51,
52] and surgeries [
53,
54].
The current work aims to improve the efficiency of the genetic algorithm in global optimization problems by introducing a new way of initializing the population’s chromosomes. In the new initialization technique, the k-means [
55] method is used to find initial values of the chromosomes that lead to the global minimum more quickly and efficiently than chromosomes generated by a random distribution. Also, the proposed technique discards chromosomes that, after the application of the k-means technique, are close to each other.
During the past few years, many researchers have proposed variations for the initialization of genetic algorithms, such as the work of Maaranen et al. [
56], where they discuss the usage of quasi-random sequences in the initial population of a genetic algorithm. Similarly, Paul et al. [
57] propose initializing the population of genetic algorithms using a vari-begin and vari-diversity (VV) population seeding technique. Also, in the same direction of research, Li et al. [
58] propose a knowledge-based technique to initialize genetic algorithms used mainly in discrete problems. Recently, Hassanat et al. [
59] suggested the incorporation of regression techniques for the initialization of genetic algorithms.
The rest of this article is organized as follows: in
Section 2, the proposed method is discussed in detail; in
Section 3, the test functions used as well as the experimental results are fully outlined; and finally, in
Section 4, some conclusions and future guidelines are listed.
2. The Proposed Method
The fundamental operation of a genetic algorithm mimics the process of natural evolution. The algorithm begins by creating an initial population of solutions, called chromosomes, each of which represents a potential solution to the objective problem. The genetic algorithm operates by reproducing and evolving populations of solutions through iterative steps. Following the analogy to natural evolution, the genetic algorithm allows optimal solutions to “evolve” through successive generations. The main steps of the genetic algorithm used are described below:
Initialization step:
 (a)
Set ${N}_{c}$ as the number of chromosomes.
 (b)
Set ${N}_{g}$ as the maximum number of allowed generations.
 (c)
Initialize randomly the
${N}_{c}$ chromosomes in
S. In most implementations of genetic algorithms, the chromosomes will be selected using some random number distribution. In the present work, the chromosomes will be selected using the sampling technique described in
Section 2.3.
 (d)
Set ${p}_{s}$ as the selection rate of the algorithm, with ${p}_{s}\le 1$.
 (e)
Set ${p}_{m}$ as the mutation rate, with ${p}_{m}\le 1$.
 (f)
Set iter = 0.
Fitness calculation step: For every chromosome ${g}_{i},\phantom{\rule{4pt}{0ex}}i=1,\dots ,{N}_{c}$, calculate the fitness ${f}_{i}=f\left({g}_{i}\right)$ of chromosome ${g}_{i}$.
Genetic operations step:
 (a)
Selection procedure: The chromosomes are sorted according to their fitness values. Denote ${N}_{b}$ as the integer part of $\left(1-{p}_{s}\right)\times {N}_{c}$; the ${N}_{b}$ chromosomes with the lowest fitness values are transferred intact to the next generation. The remaining chromosomes are substituted by offspring created in the crossover procedure. During the selection process, for each offspring, two parents are selected from the population using tournament selection.
 (b)
Crossover procedure: For every pair
$(z,w)$ of selected parents, two additional chromosomes
$\tilde{z}$ and
$\tilde{w}$ are produced using the following equations:
$${\tilde{z}}_{i}={a}_{i}{z}_{i}+\left(1-{a}_{i}\right){w}_{i}$$
$${\tilde{w}}_{i}={a}_{i}{w}_{i}+\left(1-{a}_{i}\right){z}_{i}$$
where
$i=1,\dots ,n$. The values
${a}_{i}$ are uniformly distributed random numbers, with
${a}_{i}\in [-0.5,\phantom{\rule{3.33333pt}{0ex}}1.5]$ [
60].
 (c)
Replacement procedure:
 i.
For $i={N}_{b}+1$ to ${N}_{c}$, do: replace the chromosome ${g}_{i}$ with the corresponding offspring created in the crossover procedure.
 ii.
EndFor
 (d)
Mutation procedure:
 i.
For every chromosome ${g}_{i},\phantom{\rule{4pt}{0ex}}i=1,\dots ,{N}_{c}$, do:
For each element $\phantom{\rule{4pt}{0ex}}j=1,\dots ,n$ of ${g}_{i}$, a uniformly distributed random number $r\in \left[0,1\right]$ is drawn. The element is altered randomly if $r\le {p}_{m}$.
 ii.
EndFor
Termination check step:
 (a)
Set $iter=iter+1$.
 (b)
If $iter\ge {N}_{g}$ or the proposed stopping rule of Tsoulos [
61] holds, then go to the local search step; otherwise, go to the genetic operations step.
Local search step: Apply a local search procedure to the chromosome of the population with the lowest fitness value, and report the obtained minimum. In the current work, the BFGS variant of Powell [
62] was used as a local search procedure.
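The steps above can be sketched compactly in Python. The following is a minimal illustration, assuming NumPy; the function name genetic_minimize and all default parameter values are choices made for this sketch, and the k-means-based initialization, the stopping rule, and the final local search step are omitted for brevity:

```python
import numpy as np

def genetic_minimize(f, lo, hi, n_c=50, n_g=100, p_s=0.1, p_m=0.05, seed=0):
    """Sketch of the genetic algorithm: elitist selection, blend
    crossover with a_i in [-0.5, 1.5], and uniform random mutation."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    n = lo.size
    pop = rng.uniform(lo, hi, size=(n_c, n))   # initial chromosomes in S
    n_b = int((1.0 - p_s) * n_c)               # chromosomes kept intact
    for _ in range(n_g):
        fit = np.array([f(g) for g in pop])
        pop = pop[np.argsort(fit)]             # best (lowest fitness) first
        # crossover: offspring replace the worst n_c - n_b chromosomes
        children = []
        while len(children) < n_c - n_b:
            # tournament selection: smallest index wins in a sorted population
            z = pop[rng.integers(0, n_c, 4).min()]
            w = pop[rng.integers(0, n_c, 4).min()]
            a = rng.uniform(-0.5, 1.5, size=n)  # blend coefficients
            children.append(np.clip(a * z + (1.0 - a) * w, lo, hi))
            children.append(np.clip(a * w + (1.0 - a) * z, lo, hi))
        pop[n_b:] = np.asarray(children)[: n_c - n_b]
        # mutation: each element is altered randomly with probability p_m
        mask = rng.random(pop.shape) < p_m
        pop = np.where(mask, rng.uniform(lo, hi, size=pop.shape), pop)
    fit = np.array([f(g) for g in pop])
    return pop[np.argmin(fit)]
```

For example, applying the sketch to the two-dimensional sphere function on $[-5,5]^2$ should return a point near the origin.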
The current work proposes a novel method to initialize the chromosomes that utilizes the well-known k-means technique. The initial distribution of candidate solutions plays an essential role across various optimization domains and techniques. Apart from genetic algorithms, the initial distribution impacts other optimization methods such as particle swarm optimization (PSO) [
21], evolution strategies [
63], and neural networks [
64]. The initial distribution defines the starting solutions that will evolve and improve throughout the algorithm. If the initial population contains solutions close to the optimum, it increases the likelihood of evolved solutions being in proximity to the optimal solution. Conversely, if the initial population is distant from the optimum, the algorithm might need more iterations to reach the optimal solution or even get stuck in a suboptimal solution. In conclusion, the initial distribution influences the stability, convergence speed, and quality of optimization algorithm outcomes. Thus, selecting a suitable initial distribution is crucial for the algorithm’s efficiency and the discovery of the optimal solution in a reasonable time [
65,
66].
2.1. Proposed Initialization Distribution
The present work replaces the randomness of the initialization of the chromosomes by using the k-means technique. More specifically, the method takes a series of samples from the objective function, and then the k-means method is used to locate the centers of these points. These centers are then used as the chromosomes of the genetic algorithm.
The k-means algorithm originated in 1957 with Stuart Lloyd, in the form of Lloyd’s algorithm [
67], although the concept of clustering based on distance had been introduced earlier. The name “k-means” was introduced around 1967 by James MacQueen [
68]. The k-means algorithm is a clustering algorithm widely used in data analysis and machine learning. Its primary objective is to partition a dataset into k clusters, where data points within the same cluster are similar to each other and differ from data points in other clusters. Specifically, k-means seeks cluster centers and assigns samples to clusters so as to minimize the distance within clusters and maximize the distance between cluster centers [
69]. The algorithm steps are presented in Algorithm 1.
The algorithm terminates when there is no change in cluster centers between consecutive iterations, implying that the clusters have stabilized in their final form [
70,
71].
2.2. Chromosome Rejection Rule
An additional technique is applied to discard chromosomes that are similar or close to each other. Specifically, each chromosome is compared to all the other chromosomes, and pairs with a very small or negligible Euclidean distance between them, implying similarity, are sought. Subsequently, the algorithm discards one chromosome from each such pair of similar chromosomes, while chromosomes that are not similar to any other are incorporated into the final initial population.
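As a sketch, the rejection rule can be implemented as a greedy pairwise filter. The helper name reject_similar and the threshold eps are illustrative choices, not part of the original description, and NumPy is assumed:

```python
import numpy as np

def reject_similar(chromosomes, eps=1e-6):
    """Greedy filter: keep a chromosome only if its Euclidean distance
    to every chromosome already kept exceeds eps."""
    kept = []
    for c in np.asarray(chromosomes, dtype=float):
        if all(np.linalg.norm(c - q) > eps for q in kept):
            kept.append(c)
    return np.array(kept)
```

With this formulation, one representative of each group of near-identical chromosomes survives, while all distinct chromosomes are retained.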
2.3. The Proposed Sampling Procedure
The proposed sampling procedure has the following major steps:
Take ${N}_{m}$ random samples from the domain S of the objective function using a uniform distribution.
Calculate the k centers of the ${N}_{m}$ points using the kmeans algorithm provided in Algorithm 1.
Remove from the set of centers C any points that are close to each other, using the rejection rule of Section 2.2.
Return the set of centers C as the set of chromosomes.
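The four steps above might be sketched as follows, assuming NumPy; the function name, the fixed iteration cap on the k-means loop, and the threshold eps are illustrative choices, and a library k-means implementation could equally be used:

```python
import numpy as np

def kmeans_init_population(lo, hi, n_m=200, k=50, eps=1e-4, seed=0):
    """Sketch of the proposed sampling: uniform samples in S, k-means
    centers of the samples, then rejection of centers lying too close."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    x = rng.uniform(lo, hi, size=(n_m, lo.size))   # N_m uniform samples in S
    # Lloyd-style k-means on the samples
    centers = x[rng.choice(n_m, size=k, replace=False)]
    for _ in range(100):
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)              # nearest center per sample
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    # rejection rule: keep only centers far enough from those already kept
    kept = []
    for c in centers:
        if all(np.linalg.norm(c - q) > eps for q in kept):
            kept.append(c)
    return np.array(kept)
```

The returned centers serve as the initial chromosomes of the genetic algorithm; their number may be smaller than k because of the rejection step.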
Algorithm 1 The k-means algorithm. 
Set the number of clusters k. The input of the algorithm is the ${N}_{m}$ initial points ${x}_{i},\phantom{\rule{4pt}{0ex}}i=1,\dots ,{N}_{m}$. For the current algorithm, the points ${x}_{i}$ are randomly selected samples in S.
For every point ${x}_{i},\phantom{\rule{4pt}{0ex}}i=1,\dots ,{N}_{m}$, assign randomly the point ${x}_{i}$ to a cluster ${S}_{j}$.
For every center ${c}_{j},\phantom{\rule{4pt}{0ex}}j=1,\dots ,k$, do:
 (a)
Set ${M}_{j}$ as the number of points in ${S}_{j}$.
 (b)
Set ${c}_{j}=\frac{1}{{M}_{j}}{\sum }_{{x}_{i}\in {S}_{j}}{x}_{i}$.
 (c)
EndFor.
Repeat.
 (a)
Set ${S}_{j}=\left\{\right\},\phantom{\rule{4pt}{0ex}}j=1,\dots ,k$.
 (b)
For every point ${x}_{i},\phantom{\rule{4pt}{0ex}}i=1,\dots ,{N}_{m}$, do: i. Set ${j}^{*}={\mathrm{argmin}}_{m=1}^{k}\left\{D\left({x}_{i},{c}_{m}\right)\right\}$, where $D(x,y)$ is the Euclidean distance of $(x,y)$. ii. Set ${S}_{{j}^{*}}={S}_{{j}^{*}}\cup \left\{{x}_{i}\right\}$.
 (c)
EndFor.
 (d)
For every center ${c}_{j},\phantom{\rule{4pt}{0ex}}j=1,\dots ,k$, do: i. Set ${M}_{j}$ as the number of points in ${S}_{j}$. ii. Set ${c}_{j}=\frac{1}{{M}_{j}}{\sum }_{{x}_{i}\in {S}_{j}}{x}_{i}$.
 (e)
EndFor.
Stop the algorithm if there is no change in the centers ${c}_{j}$.
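A direct transcription of Algorithm 1 into Python might look as follows. This is a sketch assuming NumPy; the max_iter cap and the re-seeding of empty clusters are safeguards added here, not part of the algorithm as stated:

```python
import numpy as np

def kmeans(x, k, seed=0, max_iter=100):
    """k-means following Algorithm 1: random initial cluster assignment,
    then alternate mean updates and nearest-center reassignment until
    the assignment stops changing."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    labels = rng.integers(0, k, size=len(x))   # random initial assignment
    centers = np.zeros((k, x.shape[1]))
    for _ in range(max_iter):
        # update each center as the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
            else:
                # safeguard: re-seed an empty cluster with a random point
                centers[j] = x[rng.integers(0, len(x))]
        # reassign every point to its nearest center (Euclidean distance)
        new_labels = np.argmin(
            np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2), axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return centers, labels
```

On well-separated data the procedure recovers the natural clusters; for example, two tight groups of points are assigned two distinct labels.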
