
Multi-Swarm Algorithm for Extreme Learning Machine Optimization

Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11010 Belgrade, Serbia
Romanian Institute of Science and Technology, 400022 Cluj-Napoca, Romania
College of Academic Studies “Dositej”, Bulevar Vojvode Putnika 7, 11000 Belgrade, Serbia
Author to whom correspondence should be addressed.
Sensors 2022, 22(11), 4204;
Submission received: 10 April 2022 / Revised: 19 May 2022 / Accepted: 26 May 2022 / Published: 31 May 2022


Many machine learning approaches are available and commonly used today; however, the extreme learning machine is regarded as one of the fastest and, additionally, relatively efficient models. Its main benefit is speed, which makes it suitable for integration within products that require models to make rapid decisions. Nevertheless, despite their large potential, extreme learning machines have not yet been exploited enough, according to the recent literature, and they still face several challenges that need to be addressed. The most significant downside is that the performance of the model heavily depends on the weights and biases allocated within the hidden layer. Finding appropriate values for them for practical tasks represents an NP-hard continuous optimization challenge. The research proposed in this study focuses on determining optimal or near-optimal weights and biases in the hidden layer for specific tasks. To address this task, a multi-swarm hybrid optimization approach is proposed, based on three swarm intelligence meta-heuristics, namely the artificial bee colony, the firefly algorithm and the sine–cosine algorithm. The proposed method has been thoroughly validated on seven well-known classification benchmark datasets, and the obtained results are compared to other similar cutting-edge approaches from the recent literature. The simulation results indicate that the suggested multi-swarm technique is capable of obtaining better generalization performance than the rest of the approaches included in the comparative analysis in terms of accuracy, precision, recall, and F1-score. Moreover, to show that combining two algorithms is not as effective as joining three approaches, additional hybrids generated by pairing every two of the methods employed in the proposed multi-swarm approach were also implemented and validated against four challenging datasets. The findings from these experiments also show superior performance of the proposed multi-swarm algorithm. Sample code from the devised ELM tuning framework is available on GitHub.

1. Introduction

The extreme learning machine (ELM) represents one of the recent and promising approaches that can be applied to single-hidden-layer feed-forward artificial neural networks (SLFNs). This approach was initially proposed in [1], and it introduced the concept that the input weight and bias values in the hidden layer are allocated in a random fashion, while the output weight values are computed by utilizing the Moore–Penrose (MP) pseudo-inverse [2]. ELMs have shown excellent generalization capabilities [3], and they are known to be very fast and efficient because they do not require traditional training, which is one of the most time-consuming tasks when dealing with other types of neural networks. By this, we mean that ELM models learn without tuning the hidden parameters over several iterations; the only parameters that need to be determined are the weights between the hidden layer and the output layer, which are computed using the MP pseudo-inverse, as mentioned above.
ELMs require an adequate number of neurons in the hidden layer in order to obtain good performance and fast convergence. The difference between ELMs and other traditional machine learning (ML) models that typically utilize gradient-descent-based algorithms is that ELMs use randomly allocated input weight and bias values that do not change during the learning process. This approach avoids some of the issues that commonly accompany gradient-descent methods, such as iterative tuning of the weight and bias values, getting trapped in local minima, and slow convergence. Nevertheless, the appropriate number of neurons in the hidden layer remains one of the open questions that ELMs face.
Several enhanced ELM variants were proposed subsequently, most of which deal with the appropriate number of hidden neurons. The authors in [4] introduced the pruned extreme learning machine (P-ELM) and used it for classifying patterns. P-ELM starts with a large number of neurons in the hidden layer, and utilizes statistical methods for determining the relevance of the neurons based on the class labels. The neurons that have been determined to be irrelevant are removed from the network, thus narrowing down the total number of neurons. The evolutionary ELM (E-ELM) proposed in [5] optimizes the input weight and bias values by applying the differential evolution method, and calculates the output weights with the MP generalized inverse. The enhanced variant of the E-ELM suggested in [6], i.e., the self-adaptive evolutionary ELM (SaE-ELM), uses a self-adaptive differential evolution algorithm for optimizing the hidden parameters, and determines the output weight values analytically. The optimally pruned ELM approach (OP-ELM), developed in [7], tackles the problem of a large number of hidden neurons by introducing neuron ranking. OP-ELM also begins with a large number of neurons, as in the standard ELM approach, and narrows it down by utilizing the multi-response sparse regression algorithm (MRSR) for calculating the ranks of neurons, and the leave-one-out (LOO) validation technique for determining the optimal number of neurons.
Another approach, named incremental ELM (I-ELM), was suggested in [8]; it adds neurons to the hidden layer one at a time and calculates the training rate after every added neuron. The process halts either when the maximum number of neurons is reached, or when the training rate starts decreasing. More recent research published in [2] proposed two swarm intelligence meta-heuristics to optimize the ELM, namely ELM-ABC (using the artificial bee colony meta-heuristic) and ELM-IWO (based on the invasive weed optimization method). Swarm meta-heuristics are used for tuning the input weight and bias parameters, while the ELM calculates the output weight values in the standard, analytical way.
As previously stated, the ELM performance mostly depends on the number of neurons in the hidden layer and the initialized weights between input features and each neuron in the hidden layer. The extensive literature survey showed that meta-heuristics-based approaches to ELM optimization are scarce and insufficiently exploited in this domain, despite the very promising results obtained in other ML domains. This paper proposes weights and biases optimization in the ELM model by a multi-swarm algorithm. A straightforward heuristic is used to determine the promising number of neurons in the hidden layer.
The proposed approach utilizes a novel multi-swarm hybrid algorithm that combines three well-known swarm intelligence meta-heuristics, namely the artificial bee colony algorithm (ABC), the firefly algorithm (FA), and the sine–cosine algorithm (SCA). The algorithm is a high- and low-level combination of hybridized algorithms that exploits their strengths and overcomes the deficiencies of each individual one. The main motivation behind this research lies in the fact that ELMs are efficient and fast and do not require traditional training, yet they have not been exploited sufficiently, especially in combination with meta-heuristics. The obtained results are comparable with the results of other ML methods that require training and significantly more time for execution. Inspired by the experiments given in [2], the proposed method has been tested on seven benchmark datasets in order to provide a fair comparison of the results.
Moreover, to show that combining two algorithms is not as effective as joining three approaches, an additional set of experiments was conducted in which hybrids generated by pairing every two of the methods employed in the proposed multi-swarm approach were also implemented and validated against four challenging imbalanced datasets.
The rest of the manuscript is structured in the following way. Section 2 provides a literature survey on ELM and swarm intelligence meta-heuristics. The description of the proposed multi-swarm approach is provided in Section 3. Section 4 describes the conducted experiments and exhibits the simulation findings on seven datasets together with the comparative analysis with similar approaches. Lastly, Section 5 delivers final observations, proposes future directions in this area, and concludes the paper.

2. Background

This section first introduces the ELM as one of the ML models. After that, a brief survey of swarm intelligence meta-heuristics is provided, together with the most common applications. Finally, an overview of the ELM models optimized with swarm intelligence meta-heuristic algorithms is given.

2.1. Extreme Learning Machine

The extreme learning machine (ELM) was proposed by Huang et al. [1] for single-hidden-layer feed-forward neural networks (SLFNs). The algorithm randomly chooses the input weights and analytically determines the output weights of SLFNs. After the input weights and the hidden layer biases are chosen arbitrarily, SLFNs can be simply considered as a linear system, and the output weights of SLFNs can be analytically determined through a simple generalized inverse operation on the hidden layer output matrix. This algorithm provides a faster learning speed than traditional feed-forward network learning algorithms, while obtaining better generalization performance. Additionally, ELM tends to reach the smallest training error and the smallest norm of weights. The output weights are computed using the Moore–Penrose (MP) generalized inverse [9]. As shown in [10], the learning speed of ELM can be thousands of times faster than that of conventional learning algorithms, with better generalization performance than the gradient-based learning models. Unlike the classical gradient-based learning algorithms, which only work for differentiable activation functions, the ELM learning algorithm can be used to train SLFNs with many non-differentiable activation functions.
For a set of training samples $\{(x_j, t_j)\}_{j=1}^{N}$ with $N$ samples and $m$ classes, the SLFN with $L$ hidden nodes and activation function $g(x)$ is expressed as in (1) [1], where $w_i = [w_{i1}, \ldots, w_{in}]^T$ is the input weight, $b_i$ is the bias of the $i$th hidden node, $\beta_i = [\beta_{i1}, \ldots, \beta_{im}]^T$ is the weight vector connecting the $i$th hidden node and the output nodes, $w_i \cdot x_j$ is the inner product of $w_i$ and $x_j$, and $t_j$ is the network output with respect to input $x_j$.
$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, N \tag{1}$$
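Equation (1) can be written as:
$$H \beta = T \tag{2}$$
$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m} \tag{3}$$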
In this equation, H is the hidden layer output matrix of the neural network as explained in [11], while β is the output weight matrix.
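To make the linear-system view concrete, the following is a minimal sketch of ELM training, not the implementation used in this study. It assumes a sigmoid activation, hidden parameters drawn uniformly from [-1, 1], and one-hot targets; the names `hidden_output`, `elm_fit`, and `elm_predict` and the toy data are illustrative only. The output weights are obtained with NumPy's `numpy.linalg.pinv`, which computes the MP pseudo-inverse.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden layer output matrix H (N x L) with a sigmoid activation g (assumed)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, T, n_hidden, rng):
    """Draw random hidden parameters, then solve H beta = T via the MP pseudo-inverse."""
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # input weights (assumed range)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases (assumed range)
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T                             # output weights, L x m
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network output H beta for new inputs."""
    return hidden_output(X, W, b) @ beta

# Toy two-class problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
T = np.eye(2)[labels]                                        # one-hot targets, N x m
W, b, beta = elm_fit(X, T, n_hidden=20, rng=rng)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
train_accuracy = float((pred == labels).mean())
```

In this sketch, `W` and `b` are exactly the quantities that a meta-heuristic would tune, while `beta` is always recomputed analytically.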
The ELM has been successfully used in solving many practical problems, such as text categorization [12], face recognition [13], image classification [14], different medical diagnostics [15,16], and so on. Over time, researchers have presented various improvements to the original ELM. In [4], the authors propose a pruned ELM algorithm as a systematic and automated approach for designing an ELM classifier network. By considering the relevance of the hidden nodes to the class labels, the algorithm removes the irrelevant nodes from the initial large set of hidden nodes. Zhu et al. [5] presented an improved ELM, which uses the differential evolution algorithm to tune the input weights and the MP generalized inverse to determine the output weights. Experimental results show that this approach provides good generalization performance with more compact networks. Adopting this idea, the authors in [17] introduced a new kind of evolutionary algorithm based on PSO which, using the concepts of ELM, can train the network more suitably for some prediction problems. In order to deal with data with imbalanced class distribution, a weighted ELM is proposed in [18], which is also able to generalize to balanced data. Recently, Alshamiri et al. [2] presented a model for tuning the ELM by using two SI-based techniques: ABC and invasive weed optimization (IWO) [19]. In this approach, the input weights and hidden biases are selected using ABC and IWO, and the output weights are computed using the MP generalized inverse.

2.2. Swarm Intelligence

Swarm intelligence (SI) is an artificial intelligence approach that is inspired by natural behavior and is used to solve optimization problems [20]. Over time, many different SI algorithms were developed, including ant colony optimization (ACO) [21], particle swarm optimization (PSO) [22], artificial bee colony (ABC) [23], the firefly algorithm (FA) [24], cuckoo search (CS) [25], the bat algorithm (BA) [26], the whale optimization algorithm (WOA) [27], elephant herding optimization (EHO) [28], and many others [29,30,31]. More recent but successful approaches include monarch butterfly optimization (MBO) [32], the slime mould algorithm (SMA) [33], the moth search algorithm (MSA) [34], hunger games search (HGS) [35], and the colony predation algorithm (CPA) [36]. We do not claim that the list above is exhaustive.
The algorithms from this group have been applied to a wide spectrum of NP-hard challenges from the computer science field. These applications include the problem of global numerical optimization [37], scheduling of tasks in the cloud-edge environments [38,39,40], health care systems and pollution prediction [41], the problems of wireless sensor networks including localization and lifetime maximization [42,43,44], artificial neural networks optimization [45,46,47,48,49,50,51,52,53,54,55,56,57], feature selection in general [58,59], text document clustering [48], cryptocurrency values prediction [60], computer-aided medical diagnostics [61,62,63,64], and, finally, the ongoing COVID-19 pandemic related applications [65,66,67].

2.3. ELM Tuning by Swarm Intelligence Meta-Heuristics

An extensive literature survey indicates that swarm intelligence meta-heuristics have not been sufficiently exploited for the optimization of the ELM. In addition to the already mentioned paper [2] that inspired this research, just a few approaches that combine ELM and meta-heuristics were published in the past several years. The research published in [68] proposed a hybrid PSO-ELM model, and used it for flash flood prediction. The algorithm was tested on the geospatial database of a typhoon area, and compared to traditional ML models. The obtained results have shown that the PSO-ELM model was superior to other ML models. ELM optimized by PSO was also used in [69], where the authors used ELM to derive hydropower reservoir operation rules. The proposed method was named class-based evolutionary extreme learning machine (CEELM); it combines k-means clustering, which is used to separate the influential factors into clusters with simpler patterns, with the subsequent application of the ELM optimized by PSO for identifying the complex input–output relationships for every cluster. According to the authors, CEELM showed excellent generalization capabilities.
Faris et al. [70] discussed the application of the salp swarm algorithm (SSA) for optimizing ELM and improving the accuracy. The proposed approach was tested against ten benchmark datasets and compared to other popular training techniques. They concluded that ELM hybridized with SSA outperforms other approaches in achieved accuracy, and obtains satisfactory prediction stability. Finally, an improved bacterial foraging optimization algorithm (BFO) has been proposed in [71] and applied to the ELM optimization task. The obtained results once again indicated that it is possible to achieve similar or even better performance than other ML methods, in a reduced amount of time.

3. Proposed Hybrid Meta-Heuristics

This section first introduces the basic implementations of the three algorithms used for the proposed research, namely ABC, FA, and SCA. Since each algorithm has specific deficiencies, a novel multi-swarm algorithm has been proposed, that combines the strengths of the individual algorithms and overcomes their individual flaws, by creating synergy and achieving a complementary effect.

3.1. Original Algorithms

3.1.1. The Original ABC Algorithm

The artificial bee colony (ABC) algorithm was designed for continuous optimization problems, and it was inspired by the foraging behavior of honey bees [23,72]. ABC uses three control parameters and utilizes three classes of artificial bees: employed bees, onlookers, and scouts. Employed bees make up half of the colony. In this model, a food source represents a possible solution to the problem. There is only one employed bee per food source. The employed bee performs the search process by examining the solution’s neighborhood. Onlookers choose a food source for exploitation based on the information that they gain from employed bees. If a food source does not improve for a predetermined number of cycles, the scouts replace that food source with a new one, which is chosen randomly. The limit parameter controls this process [73].
The ABC algorithm, as an iterative algorithm, starts by associating each employed bee with a randomly generated food source. Each bee $x_i$ ($i = 1, 2, \ldots, N$) is a $D$-dimensional vector, where $N$ denotes the size of the population. The initial population of candidate solutions is created using expression (4), where $x_{i,j}$ is the $j$-th parameter of the $i$-th bee in the population, $rand(0,1)$ is a random real number between 0 and 1, and $ub_j$ and $lb_j$ are the upper and lower bounds of the $j$-th parameter, respectively. Naturally, $x$ here represents a different element than the training samples from Equation (1).
$$x_{i,j} = lb_j + rand(0,1) \cdot (ub_j - lb_j) \tag{4}$$
There are many formulations of the fitness function, but in most implementations, for maximization problems, fitness is simply proportional to the value of the objective function. If the problem to be solved targets the minimization of a function, denoted here by $objFun$, the task is converted to maximization using a modification such as (5).
$$fitness_i = \begin{cases} \dfrac{1}{objFun_i}, & \text{if } objFun_i > 0 \\ 1 + |objFun_i|, & \text{otherwise} \end{cases} \tag{5}$$
Each employed bee discovers a food source in its neighborhood and evaluates its fitness. The discovery of a new neighborhood solution is simulated with expression (6), where $x_{i,j}$ is the $j$-th parameter of the old solution $i$, $x_{k,j}$ is the $j$-th parameter of a neighbor solution $k$, $\phi$ is a random number between 0 and 1, and $MR$ is the modification rate, against which the value $R_j$ is compared for each parameter $j$. $MR$ is a control parameter of the ABC algorithm.
$$v_{i,j} = \begin{cases} x_{i,j} + \phi \, (x_{i,j} - x_{k,j}), & R_j < MR \\ x_{i,j}, & \text{otherwise} \end{cases} \tag{6}$$
If the fitness of the new solution is higher than the fitness of the old one, the employed bee continues the exploitation process with the new food source, otherwise it retains the old one. Employed bees share information about the fitness of a food source with onlookers, and onlookers select a food source i with a probability that is proportional to the solution’s fitness:
$$p_i = \frac{fitness_i}{\sum_{i=1}^{N} fitness_i} \tag{7}$$
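The ABC building blocks in Equations (4)–(7) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this study: the sphere function stands in for the real objective, $R_j$ is assumed to be a uniform random number in [0, 1), and the function names and parameter values are illustrative.

```python
import random

def sphere(x):
    """Toy minimization objective (assumption; stands in for the real objFun)."""
    return sum(v * v for v in x)

def fitness_from_objective(obj):
    """Eq. (5): convert a minimization objective value into a fitness to maximize."""
    return 1.0 / obj if obj > 0 else 1.0 + abs(obj)

def neighbor(x_i, x_k, mr, rng):
    """Eq. (6): modify parameter j of x_i only when R_j < MR."""
    v = list(x_i)
    for j in range(len(x_i)):
        if rng.random() < mr:                  # R_j < MR (R_j assumed uniform in [0, 1))
            phi = rng.random()                 # phi between 0 and 1, as stated in the text
            v[j] = x_i[j] + phi * (x_i[j] - x_k[j])
    return v

def selection_probabilities(fitness):
    """Eq. (7): onlooker selection probability proportional to fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]

rng = random.Random(1)
lb, ub, dim, n = -5.0, 5.0, 3, 6
# Eq. (4): random initial food sources within [lb, ub].
population = [[lb + rng.random() * (ub - lb) for _ in range(dim)] for _ in range(n)]
fitness = [fitness_from_objective(sphere(x)) for x in population]
probs = selection_probabilities(fitness)

# One employed-bee step with greedy selection between old and new food source.
candidate = neighbor(population[0], population[1], mr=0.8, rng=rng)
if fitness_from_objective(sphere(candidate)) > fitness[0]:
    population[0] = candidate
```

The scout phase and the limit counter are omitted here to keep the sketch short.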

3.1.2. The Original Firefly Algorithm

The firefly algorithm (FA) was introduced by Yang [24]. The proposed model uses the brightness and attractiveness of fireflies. Brightness is determined by the objective function value, while attractiveness depends on the brightness. This is expressed with Equation (8) [24], where $I(x)$ represents attractiveness and $f(x)$ denotes the value of the objective function at location $x$. Again, note that $x$ in the current subsection should not be mistaken for the representations in the previous subsections.
$$I(x) = \begin{cases} \dfrac{1}{f(x)}, & \text{if } f(x) > 0 \\ 1 + f(x), & \text{otherwise} \end{cases} \tag{8}$$
The attractiveness of the firefly decreases as the distance from the light source increases [24]:
$$I(r) = \frac{I_0}{1 + \gamma r^2} \tag{9}$$
where $I(r)$ represents the light intensity at distance $r$, and $I_0$ stands for the light intensity at the source. To model a real natural system, where the light is partially absorbed by its surroundings, the FA uses the $\gamma$ parameter, which represents the light absorption coefficient. The combined effect of the inverse square law for distance and the $\gamma$ coefficient is approximated with the following Gaussian form [24]:
$$I(r) = I_0 \cdot e^{-\gamma r^2} \tag{10}$$
Moreover, each firefly individual utilizes attractiveness $\beta$, which is directly proportional to the light intensity of a given firefly and also depends on the distance, as shown in Equation (11).
$$\beta(r) = \beta_0 \cdot e^{-\gamma r^2} \tag{11}$$
where parameter $\beta_0$ designates the attractiveness at distance $r = 0$. It should be noted that, in practice, Equation (11) is often replaced by Equation (12) [24]:
$$\beta(r) = \frac{\beta_0}{1 + \gamma r^2} \tag{12}$$
Based on the above, the basic FA search equation for a random individual $i$, which moves in iteration $t+1$ to a new location $x_i$ towards individual $j$ with greater fitness, is given as [24]:
$$x_i^{t+1} = x_i^t + \beta_0 \cdot e^{-\gamma r_{i,j}^2} (x_j^t - x_i^t) + \alpha^t (\kappa - 0.5) \tag{13}$$
where $\alpha$ stands for the randomization parameter, $\kappa$ denotes a random number drawn from a Gaussian or a uniform distribution, and $r_{i,j}$ represents the distance between the two observed fireflies $i$ and $j$. Typical values that yield satisfying results for most problems are 1 for $\beta_0$ and values from $[0, 1]$ for $\alpha$.
The distance $r_{i,j}$ is the Cartesian distance, which is calculated by using Equation (14).
$$r_{i,j} = \| x_i - x_j \| = \sqrt{\sum_{k=1}^{D} (x_{i,k} - x_{j,k})^2} \tag{14}$$
where $D$ marks the number of problem-specific parameters.
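A single FA move following Equations (11), (13), and (14) might be sketched as below. The sketch assumes that $\kappa$ is drawn uniformly from [0, 1) independently for every component and that the caller supplies the current value of $\alpha^t$; the function names and default parameter values are illustrative and are not taken from the original study.

```python
import math
import random

def distance(x_i, x_j):
    """Eq. (14): Cartesian distance between two fireflies."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

def attractiveness(r, beta0=1.0, gamma=1.0):
    """Eq. (11): attractiveness decays with the squared distance."""
    return beta0 * math.exp(-gamma * r ** 2)

def move_firefly(x_i, x_j, alpha, rng, beta0=1.0, gamma=1.0):
    """Eq. (13): move firefly i towards the brighter firefly j."""
    beta = attractiveness(distance(x_i, x_j), beta0, gamma)
    # kappa is drawn uniformly per component (assumption).
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(x_i, x_j)]

rng = random.Random(7)
x_i, x_j = [0.0, 0.0], [1.0, 1.0]
moved = move_firefly(x_i, x_j, alpha=0.0, rng=rng)   # alpha = 0 removes the random term
```

With `alpha=0.0` the move is purely attractive, so `moved` lies on the segment between `x_i` and `x_j`; a positive `alpha` adds the random perturbation of Equation (13).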

3.1.3. The Original SCA Method

The sine–cosine algorithm (SCA), proposed by Mirjalili [74], is based on a mathematical model of the sine and cosine trigonometric functions. The solutions’ positions in the population are updated based on the outputs of the sine and cosine functions, which makes them oscillate around the best solution. The return values of these functions lie between −1 and +1, which is the mechanism that keeps the solutions fluctuating. The algorithm starts by generating a set of random candidate solutions within the boundaries of the search space in the initialization phase. Exploration and exploitation are controlled throughout the execution by random adaptive variables.
The solutions’ position update process is performed in each iteration by using Equations (15) and (16), where $X_i^t$ and $X_i^{t+1}$ are the current solution’s positions in the $i$-th dimension at the $t$-th and $(t+1)$-th iteration, respectively, $r_1$, $r_2$, and $r_3$ are pseudo-randomly generated numbers, $P_i^{*}$ denotes the destination point’s position (the current best approximation of an optimum) in the $i$-th dimension, and the symbol $|\cdot|$ represents the absolute value. The same notation as in the original manuscript where the method was initially proposed [74] is used in this manuscript.
$$X_i^{t+1} = X_i^t + r_1 \cdot \sin(r_2) \cdot | r_3 \cdot P_i^{*t} - X_i^t | \tag{15}$$
$$X_i^{t+1} = X_i^t + r_1 \cdot \cos(r_2) \cdot | r_3 \cdot P_i^{*t} - X_i^t | \tag{16}$$
These two equations are used in combination by using control parameter r 4 :
$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \cdot \sin(r_2) \cdot | r_3 \cdot P_i^{*t} - X_i^t |, & r_4 < 0.5 \\ X_i^t + r_1 \cdot \cos(r_2) \cdot | r_3 \cdot P_i^{*t} - X_i^t |, & r_4 \geq 0.5 \end{cases} \tag{17}$$
where $r_4$ represents a randomly generated number between 0 and 1.
It is noted that, for every component of each solution in the population, new values for the pseudo-random parameters $r_1$ through $r_4$ are generated.
The algorithm’s search process is controlled by four random parameters, which influence the positions of the current and the best solution. In order to converge towards the global optimum, a balance between exploration and exploitation is required. This is achieved by changing the range of the sine and cosine functions in an ad hoc manner. Exploitation is guaranteed by the fact that the sine and cosine functions exhibit cyclic patterns, which allow a solution to be repositioned around the destination. Changes in the ranges of the sine and cosine functions allow the algorithm to search outside of the corresponding destinations. Furthermore, the solution requires its position not to overlap with the areas of other solutions.
For better quality of randomness, the values of parameter $r_2$ are generated within the range $[0, 2\pi]$, which guarantees exploration. The balance between diversification and exploitation is controlled as shown in Equation (18).
$$r_1 = a - t \frac{a}{T} \tag{18}$$
where $t$ is the current iteration, $T$ represents the maximum number of iterations in a run, and $a$ is a constant.
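The SCA update in Equations (17) and (18) can be illustrated with the short sketch below. Several choices here are assumptions rather than values stated in this text: the constant $a = 2$, the range [0, 2] for $r_3$, and the function names. In this sketch $r_1$ follows the schedule of Equation (18), while $r_2$, $r_3$, and $r_4$ are redrawn for every component.

```python
import math
import random

def r1_schedule(t, max_iter, a=2.0):
    """Eq. (18): r1 decreases linearly from a to 0 over the run (a = 2 is an assumption)."""
    return a - t * a / max_iter

def sca_update(x, best, t, max_iter, rng, a=2.0):
    """Eq. (17): move each component with a sine or cosine step around the destination."""
    r1 = r1_schedule(t, max_iter, a)
    new_x = []
    for x_i, p_i in zip(x, best):
        r2 = rng.uniform(0.0, 2.0 * math.pi)   # range [0, 2*pi], as stated in the text
        r3 = rng.uniform(0.0, 2.0)             # random weight for the destination (assumed range)
        r4 = rng.random()                      # switches between the sine and cosine branch
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_x.append(x_i + r1 * trig * abs(r3 * p_i - x_i))
    return new_x

rng = random.Random(3)
position = [1.0, -2.0, 0.5]
destination = [0.0, 0.0, 0.0]
early = sca_update(position, destination, t=0, max_iter=100, rng=rng)    # large r1, wide steps
late = sca_update(position, destination, t=100, max_iter=100, rng=rng)   # r1 = 0, no movement
```

Because $r_1$ shrinks to zero, the step size decays over the run, which is how the sketch shifts from diversification to exploitation.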

3.2. Proposed Multi-Swarm Meta-Heuristics Algorithm

3.2.1. Motivation and Preliminaries

The effectiveness of a meta-heuristic in the optimization process largely depends on the efficiency of, and the balance between, exploitation and exploration, which direct the search towards optimum (or sub-optimum) solutions. Additionally, according to the no free lunch (NFL) theorem, an optimizer without flaws does not exist, nor is there one that can render satisfying solutions for all kinds of NP-hard challenges. Therefore, every meta-heuristic suffers from some deficiencies, and each one also has distinctive advantages over others.
One of the promising techniques that can be used to combine different meta-heuristics is hybridization [75,76]. If the right meta-heuristics are chosen as components of a hybrid method, the strengths of one approach compensate for the weaknesses of the other, and vice versa. Hybrid meta-heuristics have proven to be efficient optimizers, and they have been validated against different problems [56,59,77,78,79].
In the modern literature, many taxonomies of hybrid meta-heuristics can be found; however, one of the most widely adopted is the one provided by Talbi [76]. According to [76], by using the notion of hierarchical classification, hybrid algorithms can be divided into low-level hybrids (LLH) and high-level hybrids (HLH). In the case of LLH, the search function of one method is replaced with one that belongs to another optimization method. Conversely, HLH approaches are self-contained [76].
Further, both LLH and HLH can be executed in relay or teamwork mode [76]. The relay mode executes in a pipeline manner, where the output of the first meta-heuristic is used as the input for the second, while in teamwork mode, the meta-heuristics evolve in parallel, cooperatively exploring and exploiting the search space.
The approach developed for the purpose of this research represents a combination of LLH and HLH, and encompasses the well-known ABC, FA, and SCA meta-heuristics. These three meta-heuristics were chosen due to their complementary weaknesses and strengths, which make them promising candidates for hybridization.
Based on previous findings, the ABC algorithm has an efficient exploration mechanism, which discards individuals that cannot be improved within a predefined number of iterations; however, it suffers from poor exploitation [73]. Conversely, both the FA and SCA meta-heuristics exhibit above-average intensification abilities, but they do not employ an explicit exploration mechanism, which leads to lower diversification capabilities [67,80]. The dynamic FA implementation controls the exploitation–exploration balance by shrinking parameter $\alpha$ throughout the iterations, while the SCA also uses the dynamic parameter $r_1$. However, if the initially generated population is far away from the optimum, the dynamic parameters would only allow a search around the current solutions (novel solutions from other regions of the search space will not be generated), and when the termination condition is reached, in most cases only local optimum solutions will be rendered. Additionally, regardless of the good intensification of FA and SCA, the search can be further boosted by combining their search expressions. This stems from the fact that FA and SCA employ different search equations: the FA uses the notion of distance between solutions, while the SCA employs trigonometric functions.
Motivated by the facts provided above, the proposed hybrid meta-heuristic first combines the FA and SCA algorithms in the form of an LLH with teamwork mode; afterwards, this approach is hybridized with the ABC meta-heuristic, forming an HLH teamwork mode optimizer. The method proposed for the purpose of this research is therefore named multi-swarm-ABC-FA-SCA (MS-AFS).

3.2.2. Overview of MS-AFS

In addition to combining the ABC, FA, and SCA meta-heuristics, the proposed MS-AFS also employs the following mechanisms:
  • Chaotic and quasi-reflection-based learning (QRL) population initialization, in order to boost the search by redirecting solutions towards more favorable parts of the domain;
  • An efficient learning mechanism between swarms, with the goal of combining the weaknesses and strengths of the different approaches more efficiently.
The concept of employing chaotic maps in meta-heuristic methods was first proposed by Caponetto et al. in [81]. The stochastic nature of the majority of meta-heuristic methods relies on random number generators. Nevertheless, several recent studies suggest that the search procedure could be improved if it were grounded in chaotic sequences [82,83].
Numerous chaotic maps exist, including circle, Chebyshev, logistic, sine, sinusoidal, tent, and many others. Extensive simulations conducted for the purpose of the current research, as well as previous research [63], with all the above-mentioned maps led to the conclusion that the best results can be obtained by applying the logistic map, which was therefore selected for implementation.
To establish chaotic-based population initialization, a pseudo-random number $\theta_0$ is generated as the seed for the chaotic sequence $\theta$ created by the logistic mapping:
$$\theta_{i+1} = \mu \theta_i \times (1 - \theta_i), \quad i = 1, 2, \ldots, N - 1 \tag{19}$$
where $N$ denotes the population size, $i$ is the sequence number, and $\mu$ is the chaotic sequence control parameter. The value of $\mu$ was set to 4, as suggested in [84], while $0 < \theta_0 < 1$ and $\theta_0 \notin \{0.25, 0.5, 0.75, 1\}$.
Every parameter $j$ of each solution $i$ is mapped to the rendered chaotic sequences by the following equation:
$$X_i^c = \theta_i X_i \tag{20}$$
where $X_i^c$ is the new position of individual $i$ after chaotic perturbations.
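As an illustration of Equations (19) and (20), the sketch below generates a logistic sequence and uses it to perturb a few solutions. The way the sequence is indexed (the seed is treated as the first element) and the function names are assumptions made for this example only.

```python
import random

def logistic_sequence(theta0, length, mu=4.0):
    """Eq. (19): logistic map; the seed theta0 is taken as the first element (assumption)."""
    seq = [theta0]
    for _ in range(length - 1):
        seq.append(mu * seq[-1] * (1.0 - seq[-1]))
    return seq

def chaotic_solutions(population, theta):
    """Eq. (20): scale every parameter of solution i by its chaotic value theta_i."""
    return [[t * v for v in x] for x, t in zip(population, theta)]

rng = random.Random(5)
theta0 = rng.random()
while theta0 in (0.0, 0.25, 0.5, 0.75):        # excluded seed values
    theta0 = rng.random()

population = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(4)]
theta = logistic_sequence(theta0, len(population))
perturbed = chaotic_solutions(population, theta)
```

Since every $\theta_i$ lies in the unit interval, each perturbed parameter is pulled from its original value towards zero, which stays within bounds only when the search domain contains zero.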
The QRL procedure was initially proposed in [85]. This approach implies the generation of the quasi-reflexive-opposite solutions following the logic that if the original individual is positioned at a large distance from the optimum, a decent chance exists that the opposite solution could be located much nearer to the optimum.
When utilizing the QRL procedure described above, the quasi-reflexive-opposite individual $X^{qr}$ of the solution $X$ is created by applying the following expression for every component $j$ of solution $X$:
$$X^{qr} = \text{rnd}\left(\frac{LB + UB}{2}, X\right) \tag{21}$$
where $\text{rnd}\left(\frac{LB + UB}{2}, X\right)$ is used to generate an arbitrary number from the uniform distribution within $\left[\frac{LB + UB}{2}, X\right]$, and $LB$ and $UB$ are the lower and upper search boundaries, respectively. This strategy is executed for each parameter of the observed solution $X$ in $D$ dimensions.
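Equation (21) can be sketched per component as follows. The function name and the example bounds are illustrative; the only assumption beyond the text is that the interval endpoints are ordered before sampling, since $X$ may lie on either side of the domain center.

```python
import random

def quasi_reflected(x, lb, ub, rng):
    """Eq. (21): draw each component uniformly between the domain center and x_j."""
    out = []
    for x_j, lb_j, ub_j in zip(x, lb, ub):
        center = (lb_j + ub_j) / 2.0
        low, high = min(center, x_j), max(center, x_j)   # order the interval endpoints
        out.append(rng.uniform(low, high))
    return out

rng = random.Random(11)
lb, ub = [-5.0, 0.0], [5.0, 10.0]
x = [4.0, 1.0]
x_qr = quasi_reflected(x, lb, ub, rng)
```

For the example above, the first component of `x_qr` falls between 0 and 4 and the second between 1 and 5.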
Taking everything into account, the population initialization of the proposed MS-AFS is summarized in Algorithm 1.
As can be observed from Algorithm 1, the size of the starting population $P_{start}$ is $N/2$ individuals. In this way, the fitness function evaluations ($FFEs$) in the initialization phase are executed only $N$ times, and no additional load, in terms of computational requirements, is imposed on the MS-AFS complexity.
After the initialization of population $P$ by Algorithm 1, the $N/2$ worse solutions are chosen as the initial population ($P_1$) for the first swarm ($s_1$), while the remaining individuals ($P_2$) are delegated to the second swarm ($s_2$). The $s_2$ swarm is created by establishing an LLH with teamwork mode between the FA and SCA algorithms, while $s_1$ is executed only by the ABC meta-heuristic. Because the ABC exhibits better exploration abilities and has a greater chance of hitting the favorable regions of the search domain, the worse $N/2$ individuals are chosen as the initial population for $s_1$.
Algorithm 1 Pseudo-code for chaotic and QRL population initialization
  • Step 1: Generate starting population $P_{start}$ of $N/2$ solutions with standard initialization expression: $X_i = LB + (UB - LB) \cdot rand(0, 1)$, $i = 1, \ldots, N/2$, where $rand(0, 1)$ represents a pseudo-random number drawn from the range $[0, 1]$.
  • Step 2: Randomly select 2 subsets of $N/4$ from $P_{start}$ for chaotic and QRL initialization, denoted as $P_c$ and $P_{qrl}$, respectively.
  • Step 3: Extend $P_c$ by applying chaotic sequences to each individual in $P_c$ using expressions (19) and (20). The size of $P_c$ after extension is $N/2$.
  • Step 4: Extend $P_{qrl}$ by applying QRL mechanism to each individual in $P_{qrl}$ using expression (21). The size of $P_{qrl}$ after extension is $N/2$.
  • Step 5: Calculate fitness of all individuals from $P_c$ and $P_{qrl}$.
  • Step 6: Sort all solutions from $P_c \cup P_{qrl}$ according to fitness.
  • Step 7: Select $N$ best solutions as the initial population $P$.
The $s_2$ simply combines the search expressions of the FA and SCA algorithms, Equations (13) and (17), respectively, and in each iteration every individual is evolved by performing either the FA or the SCA search. Finally, the $s_1$ and $s_2$ execute independently, where each swarm evolves its own population of candidate solutions.
The $s_1$ and $s_2$ search processes are shown in Algorithms 2 and 3, respectively.
Algorithm 2 Search process of $s_1$: ABC algorithm
  • for each solution $X_i$ do
  •    perform employed bee phase according to Equation (6)
  •    perform onlooker bee phase according to expressions (6) and (7)
  • end for
  • perform scout bee phase (explicit exploration) according to expression (4)
Algorithm 3 Search process of $s_2$: LLH between FA and SCA
  • for each solution $X_i$ do
  •    if $rand(0, 1) > 0.5$ then
  •      Evolve $X_i$ by FA search, expression (13)
  •    else
  •      Evolve $X_i$ by SCA search, expression (17)
  •    end if
  • end for
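The low-level hybrid of Algorithm 3 amounts to a per-individual coin flip between two update rules. The sketch below shows only that dispatch logic. The `fa_step` and `sca_step` callables are hypothetical placeholders for the FA and SCA search expressions, which are not reproduced here, and the function name is likewise an assumption.

```python
import random


def evolve_s2(population, fa_step, sca_step, rng=random):
    """One iteration of swarm s2: each individual is moved by FA or by SCA."""
    new_population = []
    for individual in population:
        if rng.random() > 0.5:
            # FA branch (placeholder for the FA search expression).
            new_population.append(fa_step(individual, population))
        else:
            # SCA branch (placeholder for the SCA search expression).
            new_population.append(sca_step(individual, population))
    return new_population


if __name__ == "__main__":
    # Toy stand-ins that only tag which branch handled each individual.
    pop = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    result = evolve_s2(pop,
                       fa_step=lambda x, p: ("FA", x),
                       sca_step=lambda x, p: ("SCA", x),
                       rng=random.Random(0))
    print(result)
```

Passing the update rules as callables keeps the dispatch independent of the concrete search expressions, so either rule can be swapped without touching the loop.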
However, as noted above, in order to facilitate the search, the knowledge exchange mechanism ($KEM$), which shares information about the search region between $s_1$ and $s_2$, is triggered after $\psi$ iterations. From then on, it is executed in every iteration in the following way: if $rand(0, 1) > kef$, one worst solution from $s_1$ ($X_{w,s_1}$) is replaced with the best individual from $s_2$ ($X_{b,s_2}$), and vice versa. This mechanism may also cause some problems. If the exchange of solutions between swarms is triggered too early and/or too frequently, the diversity of the swarms may be lost and locally optimal solutions may be returned. This scenario is mitigated by two additional control parameters: $\psi$ and $kef$. The $kef$ (knowledge exchange frequency) controls the frequency of $KEM$ triggering after the condition $t > \psi$, where $t$ is the current iteration counter, has been satisfied.
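The exchange rule can be illustrated with a short Python sketch. It assumes that fitness is minimized (for example, a classification error rate) and that swarms are stored as lists of solutions. All function and parameter names are hypothetical. A practical implementation would reuse cached fitness values instead of re-evaluating them, so that no extra fitness function evaluations are spent.

```python
import random


def knowledge_exchange(swarm1, swarm2, fitness, minimize=True):
    """Overwrite the worst solution of each swarm with the best of the other swarm."""
    pick_best = min if minimize else max
    pick_worst = max if minimize else min

    def index_of(pick, swarm):
        return pick(range(len(swarm)), key=lambda i: fitness(swarm[i]))

    best1, worst1 = index_of(pick_best, swarm1), index_of(pick_worst, swarm1)
    best2, worst2 = index_of(pick_best, swarm2), index_of(pick_worst, swarm2)
    # Copy the best individuals first so both swaps use pre-exchange solutions.
    best1_copy, best2_copy = list(swarm1[best1]), list(swarm2[best2])
    swarm1[worst1] = best2_copy
    swarm2[worst2] = best1_copy


def maybe_exchange(t, psi, kef, swarm1, swarm2, fitness, rng=random):
    """Trigger the exchange only when t > psi and rand(0, 1) > kef."""
    if t > psi and rng.random() > kef:
        knowledge_exchange(swarm1, swarm2, fitness)
        return True
    return False


if __name__ == "__main__":
    s1 = [[1.0], [5.0]]
    s2 = [[2.0], [9.0]]
    knowledge_exchange(s1, s2, fitness=lambda s: s[0])
    print(s1, s2)
```

A larger `kef` makes the condition `rand(0, 1) > kef` harder to satisfy, so exchanges become rarer and the two swarms stay more diverse.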
High-level inner workings of proposed MS-AFS are described in Algorithm 4.
Algorithm 4 High-level MS-AFS pseudo-code
  • Initialize global parameters: $t = 0$, $T$, and $N$.
  • Initialize: control parameters of ABC, FA, and SCA meta-heuristics.
  • Generate initial population $P$ according to Algorithm 1.
  • Determine populations for $s_1$ and $s_2$: $P_1$ and $P_2$, respectively.
  • while $t \le T$ do
  •    Execute $s_1$ according to Algorithm 2
  •    Execute $s_2$ according to Algorithm 3
  •    if $t > \psi$ then
  •      if $rand(0, 1) > kef$ then
  •         Trigger $KEM$ mechanism
  •      end if
  •    end if
  • end while
  • Return $X_{best}$
  • Results analysis, performance metrics generation and visualization

3.2.3. Computational Complexity, MS-AFS Solutions’ Encoding for ELM Tuning and Flow-Chart

Because the most computationally costly portion of a swarm intelligence algorithm is the objective evaluation [86], the number of $FFEs$ may be used to assess the complexity of the method.
The proposed MS-AFS does not impose additional $FFEs$, not even in the initialization phase. Therefore, in terms of $FFEs$, its complexity is given as:
$$O(\text{MS-AFS}) = O(N) + O(N \cdot T) \tag{22}$$
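As a rough worked example, the bound can be read as an approximate evaluation count per run. Treating the asymptotic terms as exact counts is a simplifying assumption made only for illustration. With the values $N = 20$ and $T = 20$ used for the simulations in Section 4.3, this gives:

```latex
\mathrm{FFEs} \approx N + N \cdot T = 20 + 20 \cdot 20 = 420
```

The initialization term contributes only 20 of these evaluations, so the iterative search dominates the cost.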
However, there is always a trade-off, and the proposed MS-AFS also exhibits some limitations. The major drawback of the MS-AFS method is that the algorithm requires more control parameters. All three components of the MS-AFS, namely the ABC, FA, and SCA, have to be tuned with their respective control parameters. Nevertheless, the proposed MS-AFS is significantly more efficient than the individual algorithms, which justifies the requirement for more control parameters, as shown in Section 4.
The plain ELM model is based on a random initial set of input weights and biases and is consequently vulnerable to several performance drawbacks. More specifically, the plain ELM frequently requires a large number of hidden neurons, some of which may be unnecessary and/or sub-optimal. This increase in the number of neurons in the hidden layer can slow down the ELM response when previously unseen data are presented to the network inputs, rendering it impractical for numerous practical applications.
The proposed hybrid multi-swarm meta-heuristics and ELM model framework utilizes MS-AFS meta-heuristics to optimize the input weights and biases of the ELM model, while the number of neurons in the hidden layer was determined by a simple grid search. The MP generalized inverse has been used to obtain the output weights. Therefore, the proposed hybrid technique is named ELM-MS-AFS.
Each MS-AFS solution consists of $nn \cdot fs + nn$ parameters, where $nn$ and $fs$ denote the number of neurons in the hidden layer and the size of the input feature vector, respectively. For the sake of clarity, a flow-chart of the proposed ELM-MS-AFS is given in Figure 1.
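To make the encoding concrete, the following NumPy sketch decodes one flat solution vector into ELM input weights and biases and evaluates it. Several details are assumptions made for illustration only: the weights-then-biases layout of the vector, the sigmoid activation, the one-hot target matrix, and all function names. The output weights are obtained with the Moore-Penrose pseudo-inverse, as stated above.

```python
import numpy as np


def decode_solution(solution, nn, fs):
    """Split a flat vector of nn * fs + nn values into input weights and biases."""
    solution = np.asarray(solution, dtype=float)
    if solution.size != nn * fs + nn:
        raise ValueError("expected nn * fs + nn parameters")
    weights = solution[: nn * fs].reshape(nn, fs)  # assumed layout: weights first
    biases = solution[nn * fs:]                    # then one bias per hidden neuron
    return weights, biases


def hidden_output(X, weights, biases):
    """Hidden-layer activations; the sigmoid is an assumption of this sketch."""
    return 1.0 / (1.0 + np.exp(-(X @ weights.T + biases)))


def fit_output_weights(X, T, weights, biases):
    """Output weights via the Moore-Penrose generalized inverse of H."""
    H = hidden_output(X, weights, biases)
    return np.linalg.pinv(H) @ T


def error_rate(solution, nn, X_train, T_train, X_test, y_test):
    """Classification error of the ELM encoded by `solution` (lower is better)."""
    weights, biases = decode_solution(solution, nn, X_train.shape[1])
    beta = fit_output_weights(X_train, T_train, weights, biases)
    predicted = np.argmax(hidden_output(X_test, weights, biases) @ beta, axis=1)
    return float(np.mean(predicted != y_test))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 3))          # toy data: 12 samples, fs = 3
    y = np.array([0, 1] * 6)              # toy labels for 2 classes
    T = np.eye(2)[y]                      # one-hot targets
    candidate = rng.uniform(-1, 1, size=4 * 3 + 4)  # nn = 4 hidden neurons
    print(error_rate(candidate, 4, X, T, X, y))
```

A function such as `error_rate` could serve as the objective that a meta-heuristic minimizes, with each individual in the population being one candidate vector.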

4. Experiments

This section first describes the datasets used in the experiments, followed by the metrics that were used to evaluate the results. Finally, this section provides the obtained results and their comparative analysis with other similar cutting-edge methods.

4.1. Datasets

The experiments in this research were performed on seven well-known UCI (University of California, Irvine) benchmark datasets, namely Diabetes, Heart Disease, Iris, Wine, Wine Quality, Satellite and Shuttle, which can be retrieved from the UCI repository (accessed on 15 May 2022).
Their characteristics are summarized in Table 1. The Pima Indians Diabetes dataset is utilized in diabetes diagnostics, to determine whether the patient is positive or not. The dataset comprises 768 patterns belonging to two distinct classes. The Heart Disease dataset comprises 270 patterns, with 13 attributes and two classes that indicate whether the patient has a heart disease or not. The third dataset, namely the Fisher Iris dataset, consists of measurements of three flower species (viz. Setosa, Virginica, and Versicolor). The Iris dataset comprises three classes, and every class has fifty samples. The Wine dataset comprises 178 samples belonging to three sorts of wines. The Wine dataset was created from the chemical analyses performed on wines produced from grapes grown in the same region in Italy, but derived from three different cultivars.
The fifth dataset used, Wine Quality, deals with the sorts of the Portuguese “Vinho Verde” wines. The quality of the wines is modeled by the results obtained with physicochemical testing. The Satellite image dataset comprises the multi-spectral pixel values located in 3 × 3 neighbourhood areas of the satellite images. This dataset is also available on the UCI repository (accessed on 15 May 2022), where it is stated that it has seven classes. However, it actually has just six classes, as reported in Table 1. Finally, the seventh dataset, Shuttle, relates to the placement of radiators on board the Space Shuttle, and it comprises 58,000 samples with nine attributes, separated into seven classes.
All datasets have been divided into training and testing groups. The Satellite and Shuttle datasets are available with predetermined train and test subsets, and they were used accordingly. The Diabetes, Disease, Iris, Wine, and Wine Quality datasets do not have predetermined training and testing subsets, as each of them comes in the form of a single dataset. Therefore, these five datasets were separated into training and testing subsets by utilizing 70% of the data for training and 30% for testing. Since most of the datasets are imbalanced, the data are split in a stratified fashion to maintain the same proportions of class labels in the training and testing subsets as in the input dataset.
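A stratified 70/30 split can be sketched with the Python standard library alone. The tool actually used for the split is not named in the text, so the function below is a hypothetical illustration of the idea: indices are shuffled within each class and cut at the same fraction, which keeps class proportions approximately equal in both subsets.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices) that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Cut every class at the same fraction of its own size.
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Toy imbalanced label vector: 50 samples of class 0, 100 of class 1.
    y = [0] * 50 + [1] * 100
    train_idx, test_idx = stratified_split(y)
    print(len(train_idx), len(test_idx))
```

Fixing the `seed` makes the split reproducible across runs, which matters when several optimizers are compared on the same partition.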
The class distributions in the employed datasets are visualized in Figure 2 for the Diabetes, Disease, Iris, Wine, and Wine Quality datasets before the split into training and testing subsets, and in Figure 3 for the Satellite and Shuttle datasets with their predetermined training and testing groups.

4.2. Metrics

To evaluate the performance of the proposed MS-AFS, it must be measured accurately and precisely. The common approach to evaluating machine learning models is based on the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), from which the classification accuracy is obtained by the general formula given in Equation (23):
$$ACC = \frac{TP + TN}{TP + FP + TN + FN} \tag{23}$$
By utilizing TP, TN, FP, and FN, the model’s precision, sensitivity (recall) and F-measure can easily be determined by applying the formulas given in Equations (24)–(26):
$$Precision = \frac{TP}{TP + FP} \tag{24}$$
$$Recall\ (sensitivity) = \frac{TP}{TP + FN} \tag{25}$$
$$F\text{-}measure = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \tag{26}$$
The precision and recall measurements are very important for the imbalanced datasets.
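The four indicators translate directly into code. The sketch below also includes a macro-averaging helper for the multi-class case, in which each class is treated in turn as the positive class and the per-class scores are averaged with equal weight. Returning 0.0 when a denominator is zero is a convention assumed here, and the function names are illustrative.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def f_measure(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_scores(confusion):
    """Macro-averaged (precision, recall, F-measure) from a square confusion matrix.

    confusion[i][j] counts the samples of true class i predicted as class j.
    """
    n = len(confusion)
    per_class = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[row][c] for row in range(n)) - tp
        fn = sum(confusion[c]) - tp
        p, r = precision(tp, fp), recall(tp, fn)
        per_class.append((p, r, f_measure(p, r)))
    return tuple(sum(values) / n for values in zip(*per_class))


if __name__ == "__main__":
    # Illustrative counts, not taken from the reported experiments.
    print(accuracy(tp=40, tn=45, fp=5, fn=10))
    print(macro_scores([[5, 0], [1, 4]]))
```

Because macro averaging weights every class equally, a poorly recognized minority class lowers the score even when overall accuracy remains high.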

4.3. Experimental Results and Comparative Analysis with Other Cutting-Edge Meta-Heuristics

The performance of the suggested method has been evaluated with an experimental setup similar to the one proposed in [2]. The proposed method has been validated and compared against the basic versions of the algorithms that were used to create the multi-swarm method, namely ABC [23], FA [24], and SCA [74]. Additionally, it has been compared to the bat algorithm (BA) [87], Harris hawk optimization (HHO) [88], the whale optimization algorithm (WOA) [27], and invasive weed optimization (IWO) [19], which were also used in [2]. It is important to note that all meta-heuristics included in the experiments were independently implemented by the authors, and the results of these implementations are the ones reported in the tables. Additionally, to emphasize that the meta-heuristics were applied to ELM tuning, each approach is shown with the prefix ‘ELM’.
All meta-heuristics included in the comparative analysis were tested with the optimal (sub-optimal) parameters suggested in their original papers. Values of the MS-AFS-specific parameters were determined empirically and were set as follows for all simulations: $\psi = T/5$ and $kef = 0.6$.
In [2], simulations were executed with 20 solutions in the population ($N = 20$) and the termination condition was limited to 100 iterations ($T = 100$). However, in this research, a lower number of neurons in the hidden ELM layer was employed for all observed datasets.
In the proposed research, a simple grid search was applied to determine the optimal (sub-optimal) number of neurons on average across all datasets. The search was performed over 10–200 neurons with a step size of 10, and it was observed that, on average for all datasets, the best performance was obtained with 30, 60, and 90 neurons. Therefore, in this research, simulations with 30, 60, and 90 neurons are conducted to evaluate the performance of the proposed ELM-MS-AFS model.
Moreover, all methods were tested with a substantially lower number of iterations than in [2]. All methods were tested with $N = 20$ and $T = 20$ in 50 independent runs, and the best, worst, and mean accuracy, along with the standard deviation, are reported in Table 2, Table 3 and Table 4 for 30, 60, and 90 neurons, respectively. The basic ELM was also tested on each dataset in 50 independent runs.
The findings from Table 2, Table 3 and Table 4 demonstrate the superior performance of the meta-heuristics-based ELMs over the basic ELM. The plain ELM exhibited high standard deviations on all datasets for 30, 60, and 90 neurons, which was expected because its weights are initialized in a random fashion, without any kind of “intelligence”. Among the meta-heuristics-based ELMs, the proposed ELM-MS-AFS approach produced the best results by far. In the case of 30 neurons in the hidden layer (Table 2), the ELM-MS-AFS obtained the best results in terms of best, worst, and mean accuracies on five datasets (Diabetes, Disease, Wine Quality, Satellite, and Shuttle), and was tied for first place on two occasions (Iris and Wine). Similar trends are observed in the case of 60 neurons (Table 3), where the ELM-MS-AFS achieved the best results in terms of best, worst, and mean accuracies on four datasets (Diabetes, Disease, Wine Quality, and Satellite), and was tied for first place on the Iris and Wine datasets. The ELM-MS-AFS also obtained the highest best accuracy on the Shuttle dataset. Finally, in the experiments with 90 neurons in the hidden layer (Table 4), the proposed ELM-MS-AFS obtained the best results on five datasets (Diabetes, Disease, Wine Quality, Satellite, and Shuttle), and was tied for first place on the Wine dataset.
Another interesting conclusion can be derived from the performance obtained with different numbers of neurons in the hidden layer. For the ELM-MS-AFS approach, performance rises with the number of neurons on some datasets. This can be seen for the Disease dataset, where the ELM-MS-AFS achieved an average accuracy of 91.80% with 30 neurons, 92.86% with 60 neurons, and 95.70% with 90 neurons. Similar patterns can be observed for the Satellite dataset. On the other hand, on the Diabetes dataset, the ELM-MS-AFS achieved its best average accuracy of 84.63% with 30 neurons, dropped to 81.60% with 60 neurons, and rose again to 83.55% with 90 neurons. Finally, for the Wine Quality dataset, the ELM-MS-AFS achieved its best performance, an accuracy of 67.60%, with 60 neurons in the hidden layer. A further increase in the number of neurons did not improve the accuracy, as the average accuracy drops to 67.40% when the network is extended to 90 neurons. This behavior is consistent with over-fitting, where increasing the number of neurons reduces the generalization capability of the model, so that the network learns the training data too well and under-performs on the test data.
As already noted above, for imbalanced datasets the accuracy metric alone is not enough to gain insights into the classification results. Therefore, Table 5, Table 6 and Table 7 show the macro-averaged precision, recall, and f1-score metrics obtained in the best run by the ELM tuned with the meta-heuristics approaches, for the experiments with 30, 60, and 90 neurons, respectively. All of these metrics were extracted from the classification report.
In order to better visualize the performance and the convergence speed of the classification error rate for the proposed ELM-MS-AFS method, convergence graphs for all seven datasets, for the cases of 30, 60, and 90 neurons, are shown in Figure 4. The compared algorithms are also plotted in Figure 4. The graphs show that the proposed method converges much faster than the other approaches for most of the datasets. Additionally, it can be observed that the proposed MS-AFS has an initial advantage due to the chaotic and QRL initialization.
Finally, the obtained metrics are further visualized in Figure 5, which shows the confusion matrices and precision-recall (PR) curves generated for some simulations by the proposed ELM-MS-AFS.

Statistical Tests

In this section, the findings of the statistical tests conducted for the simulations shown in Section 4.3 are presented, with the goal of establishing whether or not the performance improvements of the proposed ELM-MS-AFS over other state-of-the-art meta-heuristics are statistically significant.
All statistical tests were performed by taking the best values of all methods obtained in all three simulations, that is, with 30, 60, and 90 neurons in the hidden layer. In order to determine whether the generated improvements are statistically significant, a Friedman Aligned test [89,90] and two-way variance analysis by ranks have been employed. By analyzing the test results, it can be concluded whether there is a significant difference between the results of the proposed ELM-MS-AFS and those of the other methods encompassed by the comparison. The Friedman Aligned test results for the eight compared algorithms on seven datasets are presented in Table 8.
The results presented in Table 8 statistically indicate that the proposed ELM-MS-AFS algorithm has superior performance when compared to the other seven algorithms, with an average rank value of 9.5. The second best performance was achieved by the ELM-HHO algorithm, which scored an average rank of 24.36, while the ELM-IWO algorithm obtained an average rank of 27.64 and took third place. The basic ELM-ABC, ELM-FA and ELM-SCA meta-heuristics obtained average ranks of 34.21, 31.07, and 32.5, respectively. Additionally, the Friedman Aligned statistic ($\chi_r^2 = 18.49$) is greater than the $\chi^2$ critical value with seven degrees of freedom ($14.07$) at significance level $\alpha = 0.05$. As a result, the null hypothesis ($H_0$) can be rejected, and it can be stated that the suggested ELM-MS-AFS achieved results that are significantly different from those of the other seven algorithms.
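For readers unfamiliar with aligned ranks, the following sketch shows one common way to compute them. It is an illustration under stated assumptions rather than the procedure actually run by the authors: the statistic follows the widely used aligned-ranks formulation, ties receive average ranks without a tie correction, and the function names and toy scores are hypothetical.

```python
def average_ranks(values):
    """Rank values ascending from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def friedman_aligned(scores, higher_is_better=True):
    """scores[i][j] is the result of algorithm j on dataset i.

    Returns (average aligned rank per algorithm, test statistic).
    """
    n, k = len(scores), len(scores[0])
    aligned = []
    for row in scores:
        location = sum(row) / k  # per-dataset mean removed before ranking
        aligned.extend(value - location for value in row)
    # All n * k aligned observations are ranked together; rank 1 is the best.
    keyed = [-v for v in aligned] if higher_is_better else aligned
    flat = average_ranks(keyed)
    ranks = [flat[i * k:(i + 1) * k] for i in range(n)]
    column_totals = [sum(ranks[i][j] for i in range(n)) for j in range(k)]
    row_totals = [sum(row) for row in ranks]
    numerator = (k - 1) * (sum(t * t for t in column_totals)
                           - (k * n * n / 4.0) * (k * n + 1) ** 2)
    denominator = (k * n * (k * n + 1) * (2 * k * n + 1) / 6.0
                   - sum(t * t for t in row_totals) / k)
    return [t / n for t in column_totals], numerator / denominator


if __name__ == "__main__":
    # Toy accuracies for 2 datasets x 2 algorithms (made-up values).
    print(friedman_aligned([[4.0, 2.0], [3.0, 1.0]]))
```

The resulting statistic is compared against the $\chi^2$ critical value with $k - 1$ degrees of freedom, which for eight algorithms corresponds to the seven degrees of freedom mentioned above.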
Finally, a non-parametric post-hoc procedure, Holm’s step-down procedure, was also conducted, and its results are presented in Table 9. In this procedure, all methods are sorted according to their $p$ value and compared with $\alpha / (k - i)$, where $k$ represents the degree of freedom (in this work $k = 10$) and $i$ is the algorithm number after sorting according to the $p$ value in ascending order (which corresponds to rank). In this study, $\alpha$ is set to 0.05 and 0.1. Additionally, the $p$-value results are provided in scientific notation.
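Holm's step-down rule can be sketched as follows. This illustration makes two assumptions that differ from or go beyond the text: $k$ is taken as the number of compared methods and the index $i$ starts at 0, and the $p$ values in the usage example are made up. The function name is hypothetical.

```python
def holm_step_down(p_values, alpha=0.05):
    """Return {name: (p, threshold, rejected)} following Holm's step-down rule.

    p_values maps each compared method to its unadjusted p value.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    k = len(ordered)
    results = {}
    still_rejecting = True
    for i, (name, p) in enumerate(ordered):
        threshold = alpha / (k - i)
        # Once one hypothesis is retained, all later ones are retained too.
        still_rejecting = still_rejecting and p < threshold
        results[name] = (p, threshold, still_rejecting)
    return results


if __name__ == "__main__":
    # Made-up p values for three hypothetical competitors.
    made_up = {"method_a": 0.001, "method_b": 0.03, "method_c": 0.04}
    for name, outcome in holm_step_down(made_up).items():
        print(name, outcome)
```

In the usage example, `method_c` is not rejected even though 0.04 is below 0.05, because the procedure stops at the first retained hypothesis (`method_b`).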
The results given in Table 9 suggest that the proposed algorithm significantly outperformed all opponent algorithms at both significance levels, $\alpha = 0.1$ and $\alpha = 0.05$.

4.4. Hybridization by Pairs

Although the reasons for combining the ABC, FA and SCA meta-heuristics in a multi-swarm approach are elaborated in Section 3.2.1, additional methods were implemented for the purpose of this research to show that combining two algorithms is not as effective as joining three approaches. Therefore, the following HLH teamwork mode optimizers were implemented: ABC-FA, ABC-SCA, and FA-SCA.
All methods have the same properties as the MS-AFS meta-heuristics: they employ chaotic and QRL population initialization and the $KEM$ procedure controlled by the $kef$ and $\psi$ control parameters (for more details please refer to Section 3.2.2). During the initialization phase, the $N/2$ worse individuals are included in population $s_1$, which is guided by the ABC algorithm in the case of the ABC-FA and ABC-SCA approaches, and by the FA algorithm in the case of the FA-SCA method. It is also worth mentioning that all three additional hybrid methods have the same computational complexity as the MS-AFS.
It needs to be noted that the hybrid between FA and SCA is established as the HLH, not as the LLH which is the case of the MS-AFS, because only in this way two populations controlled by different methods can be generated. Alternatively, establishing LLH between FA and SCA would not render a fair comparison with the MS-AFS, because the K E M procedure could not be implemented. Naturally, the three methods above can be combined in various different ways, but performing all hybridization possibilities would go far beyond the scope of our research.
The same experimental ELM’s tuning setup as in the basic experiment (Section 4.3) was established and the same control parameters’ values as for ELM-MS-AFS were used for ELM-ABC-FA, ELM-ABC-SCA, and ELM-FA-SCA. The additionally implemented methods were validated only for three more challenging datasets from the previous experiment: Wine Quality, Satellite, and Shuttle with 30, 60, and 90 neurons.
However, with the aim of gaining more insights into the performance of proposed ELM-MS-AFS, one more challenging dataset was included for the current comparison. The newly utilized NSL-KDD dataset is an improved version of the KDD’99 dataset for network intrusion detection and it has been widely used in the modern literature [91,92,93]. However, according to authors’ findings, the ELM has never been applied to this dataset before.
Predefined training and testing sets for the NSL-KDD dataset, as well as its description, are publicly available (accessed on 15 May 2022). The dataset includes 148,517 instances in total, with 41 mixed numerical and categorical features and five classes. Class 0 represents normal network traffic (no intrusion), while the other four classes denote malicious types of network traffic (Probe, DoS, U2R, and R2L). For training the ELM, all categorical features were transformed into integer indicator columns using the one-hot encoding (OHE) scheme, resulting in a dataset with 122 attributes. The other features are normalized. It should also be emphasized that the NSL-KDD dataset is highly imbalanced (Figure 6) and that it was used as such in the conducted experiments.
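The preprocessing described above can be sketched with plain Python. The normalization method is not specified in the text, so min-max scaling is an assumption of this sketch, as are the function names and the toy rows, which only mimic the mixed numerical and categorical layout.

```python
def one_hot_columns(rows, categorical):
    """Expand the categorical columns of `rows` into 0/1 indicator columns."""
    categories = {c: sorted({row[c] for row in rows}) for c in categorical}
    encoded = []
    for row in rows:
        out = []
        for c, value in enumerate(row):
            if c in categories:
                # One indicator per distinct category value of this column.
                out.extend(1.0 if value == cat else 0.0 for cat in categories[c])
            else:
                out.append(float(value))
        encoded.append(out)
    return encoded


def min_max_normalize(rows, columns):
    """Rescale the given numeric columns to [0, 1]; constant columns become 0."""
    scaled = [list(row) for row in rows]
    for c in columns:
        values = [row[c] for row in rows]
        low, span = min(values), max(values) - min(values)
        for row in scaled:
            row[c] = (row[c] - low) / span if span else 0.0
    return scaled


if __name__ == "__main__":
    # Toy records: numeric, categorical, numeric (illustrative values only).
    records = [[0.5, "tcp", 10], [1.5, "udp", 20], [1.0, "tcp", 30]]
    encoded = one_hot_columns(records, categorical=[1])
    print(min_max_normalize(encoded, columns=[0, 3]))
```

Each categorical column is replaced by as many indicator columns as it has distinct values, which is why the attribute count grows after encoding.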
Following the setup from the previous experiments, all hybrids are tested with $N = 20$ and $T = 20$ in 50 independent runs, and the best, mean, and worst accuracy, along with the standard deviation, for all four datasets with 30, 60, and 90 ELM neurons are captured and reported in Table 10. Detailed performance indicators for the best run in terms of macro-averaged precision, recall, and f1-score are shown in Table 11. In both tables, the best achieved results are denoted with bold style.
Convergence speed graphs for all additional simulations are shown in Figure 7.
From provided simulation results, as well as from convergence graphs, it can clearly be stated that the proposed ELM-MS-AFS on average exhibits superior performance over ELM-ABC-FA, ELM-ABC-SCA, and ELM-FA-SCA hybrid meta-heuristics, therefore the assumption that combining three approaches renders better performance than joining two methods is justified. It is also interesting to notice that, on average, when all simulations are taken into account, the ELM-ABC-FA and ELM-FA-SCA are close in terms of performance and that the ELM-ABC-SCA achieves slightly worse results. Finally, by comparing with the metrics established by other state-of-the-art swarm approaches, shown in the tables from Section 4.3, all three hybrid meta-heuristics on average proved to be more efficient and robust optimizers than standard, non-hybridized algorithms.
Additionally, since the NSL-KDD dataset is highly imbalanced, the PR curves for all four hybrid methods for the simulations with 30, 60, and 90 ELM neurons are shown in Figure 8. From this visualization, it can also be concluded that the ELM-MS-AFS on average classifies the minority classes better.

5. Conclusions

This paper proposes a novel approach to ELM optimization by swarm intelligence meta-heuristics. For this purpose, a novel multi-swarm algorithm has been implemented by combining three well-known algorithms: ABC, FA, and SCA. The goal of this hybrid method was to combine the strengths of each individual algorithm and to compensate for their weaknesses. The new multi-swarm meta-heuristics has been named MS-AFS, and it was used to optimize the weights and biases in the ELM model. The number of hidden neurons in the ELM was not subjected to optimization, as a simple grid search was employed to determine the optimal number of neurons.
To validate the new ELM-MS-AFS technique, thorough simulations were conducted with seven UCI benchmark datasets, with 30, 60, and 90 neurons in the hidden layer. The results have been compared to the basic ELM, and to seven other cutting-edge meta-heuristics-based ELMs. The proposed ELM-MS-AFS method has proven to be superior to other methods included in the analysis, as it was confirmed with statistical tests employed to determine the significance of the improvements of the proposed method.
Additionally, to prove that combining two algorithms is not as effective as joining three approaches, hybrids generated by pairing each two methods employed in the proposed multi-swarm approach, were also implemented and validated against four challenging datasets. From obtained simulation results, it was concluded that the proposed ELM-MS-AFS on average exhibits superior performance over ELM-ABC-FA, ELM-ABC-SCA, and ELM-FA-SCA hybrid meta-heuristics, therefore the assumption that combining three approaches renders better performance than joining two methods is justified.
The future research in this area will include extensive testing of the proposed ELM-MS-AFS approach on other benchmark and real-life datasets, and employing it in various application domains. Additionally, the number of neurons in the hidden layer will also be subjected to the optimization process. Finally, the proposed MS-AFS meta-heuristics will be tested and employed for solving NP-hard tasks for other domains, such as wireless sensor networks and cloud-based systems.

Author Contributions

Conceptualization, M.Z., N.B. and C.S.; methodology, N.B., C.S. and D.J.; software, N.B. and M.Z.; validation, M.A., D.J. and D.M.; formal analysis, M.Z.; investigation, D.J., D.M. and N.B.; resources, D.J., D.M., M.A. and C.S.; data curation, M.Z., C.S. and N.B.; writing—original draft preparation, D.J., M.A. and D.M.; writing—review and editing, C.S., M.Z. and N.B.; visualization, N.B., M.A. and M.Z.; supervision, N.B.; project administration, M.Z. and N.B.; funding acquisition, N.B. and C.S. All authors have read and agreed to the published version of the manuscript.


This research was funded by the Romanian Ministry of Research and Innovation, CCCDI—UEFISCDI, project number 178PCE/2021, PN-III-P4-ID-PCE-2020-0788, within PNCDI III.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets used in this study are public and available on the UCI repository (accessed on 15 May 2022). Preprocessed datasets, along with the accompanying code, are available on GitHub (accessed on 15 May 2022).


Catalin Stoean acknowledges the support of a grant of the Romanian Ministry of Research and Innovation, CCCDI—UEFISCDI, project number 178PCE/2021, PN-III-P4-ID-PCE-2020-0788, within PNCDI III.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), Budapest, Hungary, 25–29 July 2004; Volume 2, pp. 985–990.
  2. Alshamiri, A.K.; Singh, A.; Surampudi, B.R. Two swarm intelligence approaches for tuning extreme learning machine. Int. J. Mach. Learn. Cybern. 2018, 9, 1271–1283.
  3. Wang, J.; Lu, S.; Wang, S.; Zhang, Y.D. A review on extreme learning machine. Multimed. Tools Appl. 2021, 1–50.
  4. Rong, H.J.; Ong, Y.S.; Tan, A.H.; Zhu, Z. A fast pruned-extreme learning machine for classification problem. Neurocomputing 2008, 72, 359–366.
  5. Zhu, Q.Y.; Qin, A.; Suganthan, P.; Huang, G.B. Evolutionary extreme learning machine. Pattern Recognit. 2005, 38, 1759–1763.
  6. Cao, J.; Lin, Z.; Huang, G.B. Self-adaptive evolutionary extreme learning machine. Neural Process. Lett. 2012, 36, 285–305.
  7. Miche, Y.; Sorjamaa, A.; Bas, P.; Simula, O.; Jutten, C.; Lendasse, A. OP-ELM: Optimally pruned extreme learning machine. IEEE Trans. Neural Netw. 2009, 21, 158–162.
  8. Huang, G.B.; Chen, L.; Siew, C.K. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 2006, 17, 879–892.
  9. Serre, D. Matrices: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2002.
  10. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
  11. Huang, G.B. Learning capability and storage capacity of two-hidden-layer feedforward networks. IEEE Trans. Neural Netw. 2003, 14, 274–281.
  12. Zheng, W.; Qian, Y.; Lu, H. Text categorization based on regularization extreme learning machine. Neural Comput. Appl. 2013, 22, 447–456.
  13. Zong, W.; Huang, G.B. Face recognition based on extreme learning machine. Neurocomputing 2011, 74, 2541–2551.
  14. Cao, F.; Liu, B.; Park, D.S. Image classification based on effective extreme learning machine. Neurocomputing 2013, 102, 90–97.
  15. Wang, Z.; Yu, G.; Kang, Y.; Zhao, Y.; Qu, Q. Breast tumor detection in digital mammography based on extreme learning machine. Neurocomputing 2014, 128, 175–184.
  16. Kaya, Y.; Uyar, M. A hybrid decision support system based on rough set and extreme learning machine for diagnosis of hepatitis disease. Appl. Soft Comput. 2013, 13, 3429–3438.
  17. Xu, Y.; Shu, Y. Evolutionary extreme learning machine - Based on particle swarm optimization. In Advances in Neural Networks - ISNN 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 644–652.
  18. Zong, W.; Huang, G.B.; Chen, Y. Weighted extreme learning machine for imbalance learning. Neurocomputing 2013, 101, 229–242.
  19. Mehrabian, A.; Lucas, C. A novel numerical optimization algorithm inspired from weed colonization. Ecol. Informatics 2006, 1, 355–366.
  20. Raslan, A.F.; Ali, A.F.; Darwish, A. 1 - Swarm intelligence algorithms and their applications in Internet of Things. In Swarm Intelligence for Resource Management in Internet of Things; Intelligent Data-Centric Systems; Academic Press: Cambridge, MA, USA, 2020; pp. 1–19.
  21. Dorigo, M.; Birattari, M. Ant Colony Optimization. In Encyclopedia of Machine Learning; Springer: Boston, MA, USA, 2010; pp. 36–39.
  22. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95 - International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
  23. Karaboga, D.; Basturk, B. On the performance of artificial bee colony (ABC) algorithm. Appl. Soft Comput. 2008, 8, 687–697.
  24. Yang, X.S. Firefly algorithms for multimodal optimization. In Stochastic Algorithms: Foundations and Applications; Watanabe, O., Zeugmann, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 169–178.
  25. Gandomi, A.H.; Yang, X.S.; Alavi, A.H. Cuckoo search algorithm: A metaheuristic approach to solve structural optimization problems. Eng. Comput. 2013, 29, 17–35.
  26. Yang, X.; Gandomi, A.H. Bat algorithm: A novel approach for global engineering optimization. Eng. Comput. 2012, 29, 464–483.
  27. Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
  28. Wang, G.G.; Deb, S.; Coelho, L.d.S. Elephant Herding Optimization. In Proceedings of the 2015 3rd International Symposium on Computational and Business Intelligence (ISCBI), Bali, Indonesia, 7–9 December 2015; pp. 1–5.
  29. Mucherino, A.; Seref, O. Monkey search: A novel metaheuristic search for global optimization. AIP Conf. Proc. 2007, 953, 162–173.
  30. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
  31. Yang, X.S. Flower pollination algorithm for global optimization. In Unconventional Computation and Natural Computation; Springer: Berlin/Heidelberg, Germany, 2012; pp. 240–249.
  32. Feng, Y.; Deb, S.; Wang, G.G.; Alavi, A.H. Monarch butterfly optimization: A comprehensive review. Expert Syst. Appl. 2021, 168, 114418.
  33. Li, S.; Chen, H.; Wang, M.; Heidari, A.A.; Mirjalili, S. Slime mould algorithm: A new method for stochastic optimization. Future Gener. Comput. Syst. 2020, 111, 300–323.
  34. Wang, G.G. Moth search algorithm: A bio-inspired metaheuristic algorithm for global optimization problems. Memetic Comput. 2018, 10, 151–164.
  35. Yang, Y.; Chen, H.; Heidari, A.A.; Gandomi, A.H. Hunger games search: Visions, conception, implementation, deep analysis, perspectives, and towards performance shifts. Expert Syst. Appl. 2021, 177, 114864.
  36. Tu, J.; Chen, H.; Wang, M.; Gandomi, A.H. The Colony Predation Algorithm. J. Bionic Eng. 2021, 18, 674–710. [Google Scholar] [CrossRef]
  37. Bezdan, T.; Petrovic, A.; Zivkovic, M.; Strumberger, I.; Devi, V.K.; Bacanin, N. Current Best Opposition-Based Learning Salp Swarm Algorithm for Global Numerical Optimization. In Proceedings of the 2021 Zooming Innovation in Consumer Technologies Conference (ZINC), Novi Sad, Serbia, 26–27 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 5–10. [Google Scholar]
  38. Bezdan, T.; Zivkovic, M.; Tuba, E.; Strumberger, I.; Bacanin, N.; Tuba, M. Multi-objective Task Scheduling in Cloud Computing Environment by Hybridized Bat Algorithm. In Proceedings of the International Conference on Intelligent and Fuzzy Systems, Istanbul, Turkey, 21–23 July 2020; Springer: Cham, Switzerland, 2020; pp. 718–725. [Google Scholar]
  39. Bacanin, N.; Bezdan, T.; Tuba, E.; Strumberger, I.; Tuba, M.; Zivkovic, M. Task scheduling in cloud computing environment by grey wolf optimizer. In Proceedings of the 2019 27th Telecommunications Forum (TELFOR), Belgrade, Serbia, 26–27 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–4. [Google Scholar]
  40. Bacanin, N.; Zivkovic, M.; Bezdan, T.; Venkatachalam, K.; Abouhawwash, M. Modified firefly algorithm for workflow scheduling in cloud-edge environment. Neural Comput. Appl. 2022, 34, 9043–9068. [Google Scholar] [CrossRef]
  41. Bacanin, N.; Sarac, M.; Budimirovic, N.; Zivkovic, M.; AlZubi, A.A.; Bashir, A.K. Smart wireless health care system using graph LSTM pollution prediction and dragonfly node localization. Sustain. Comput. Infor. Syst. 2022, 35, 100711. [Google Scholar] [CrossRef]
  42. Zivkovic, M.; Bacanin, N.; Tuba, E.; Strumberger, I.; Bezdan, T.; Tuba, M. Wireless Sensor Networks Life Time Optimization Based on the Improved Firefly Algorithm. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1176–1181. [Google Scholar]
  43. Bacanin, N.; Tuba, E.; Zivkovic, M.; Strumberger, I.; Tuba, M. Whale Optimization Algorithm with Exploratory Move for Wireless Sensor Networks Localization. In Proceedings of the International Conference on Hybrid Intelligent Systems, Bhopal, India, 10–12 December 2019; Springer: Cham, Switzerland, 2019; pp. 328–338. [Google Scholar]
  44. Zivkovic, M.; Bacanin, N.; Zivkovic, T.; Strumberger, I.; Tuba, E.; Tuba, M. Enhanced Grey Wolf Algorithm for Energy Efficient Wireless Sensor Networks. In Proceedings of the 2020 Zooming Innovation in Consumer Technologies Conference (ZINC), Novi Sad, Serbia, 26–27 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 87–92. [Google Scholar]
  45. Bacanin, N.; Stoean, R.; Zivkovic, M.; Petrovic, A.; Rashid, T.A.; Bezdan, T. Performance of a Novel Chaotic Firefly Algorithm with Enhanced Exploration for Tackling Global Optimization Problems: Application for Dropout Regularization. Mathematics 2021, 9, 2705. [Google Scholar] [CrossRef]
  46. Strumberger, I.; Tuba, E.; Bacanin, N.; Zivkovic, M.; Beko, M.; Tuba, M. Designing convolutional neural network architecture by the firefly algorithm. In Proceedings of the 2019 International Young Engineers Forum (YEF-ECE), Costa da Caparica, Portugal, 10 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 59–65. [Google Scholar]
  47. Milosevic, S.; Bezdan, T.; Zivkovic, M.; Bacanin, N.; Strumberger, I.; Tuba, M. Feed-Forward Neural Network Training by Hybrid Bat Algorithm. In Modelling and Development of Intelligent Systems, Proceedings of the 7th International Conference, MDIS 2020, Sibiu, Romania, 22–24 October 2020; Revised Selected Papers 7; Springer: Cham, Switzerland, 2021; pp. 52–66. [Google Scholar]
  48. Bezdan, T.; Stoean, C.; Naamany, A.A.; Bacanin, N.; Rashid, T.A.; Zivkovic, M.; Venkatachalam, K. Hybrid Fruit-Fly Optimization Algorithm with K-Means for Text Document Clustering. Mathematics 2021, 9, 1929. [Google Scholar] [CrossRef]
  49. Cuk, A.; Bezdan, T.; Bacanin, N.; Zivkovic, M.; Venkatachalam, K.; Rashid, T.A.; Devi, V.K. Feedforward multi-layer perceptron training by hybridized method between genetic algorithm and artificial bee colony. In Data Science and Data Analytics: Opportunities and Challenges; CRC Press: Boca Raton, FL, USA, 2021; p. 279. [Google Scholar]
  50. Stoean, R. Analysis on the potential of an EA–surrogate modelling tandem for deep learning parametrization: An example for cancer classification from medical images. Neural Comput. Appl. 2020, 32, 313–322. [Google Scholar] [CrossRef]
  51. Bacanin, N.; Bezdan, T.; Zivkovic, M.; Chhabra, A. Weight optimization in artificial neural network training by improved monarch butterfly algorithm. In Mobile Computing and Sustainable Informatics; Springer: Cham, Switzerland, 2022; pp. 397–409. [Google Scholar]
  52. Gajic, L.; Cvetnic, D.; Zivkovic, M.; Bezdan, T.; Bacanin, N.; Milosevic, S. Multi-layer perceptron training using hybridized bat algorithm. In Computational Vision and Bio-Inspired Computing; Springer: Cham, Switzerland, 2021; pp. 689–705. [Google Scholar]
  53. Bacanin, N.; Alhazmi, K.; Zivkovic, M.; Venkatachalam, K.; Bezdan, T.; Nebhen, J. Training Multi-Layer Perceptron with Enhanced Brain Storm Optimization Metaheuristics. Comput. Mater. Contin. 2022, 70, 4199–4215. [Google Scholar] [CrossRef]
  54. Jnr, E.O.N.; Ziggah, Y.Y.; Relvas, S. Hybrid ensemble intelligent model based on wavelet transform, swarm intelligence and artificial neural network for electricity demand forecasting. Sustain. Cities Soc. 2021, 66, 102679. [Google Scholar]
  55. Bacanin, N.; Bezdan, T.; Venkatachalam, K.; Zivkovic, M.; Strumberger, I.; Abouhawwash, M.; Ahmed, A. Artificial Neural Networks Hidden Unit and Weight Connection Optimization by Quasi-Refection-Based Learning Artificial Bee Colony Algorithm. IEEE Access 2021, 9, 169135–169155. [Google Scholar] [CrossRef]
  56. Bacanin, N.; Zivkovic, M.; Bezdan, T.; Cvetnic, D.; Gajic, L. Dimensionality Reduction Using Hybrid Brainstorm Optimization Algorithm. In Proceedings of the International Conference on Data Science and Applications, Kolkata, India, 26–27 March 2022; Springer: Cham, Switzerland, 2022; pp. 679–692. [Google Scholar]
  57. Latha, R.S.; Saravana Balaji, B.; Bacanin, N.; Strumberger, I.; Zivkovic, M.; Kabiljo, M. Feature Selection Using Grey Wolf Optimization with Random Differential Grouping. Comput. Syst. Sci. Eng. 2022, 43, 317–332. [Google Scholar] [CrossRef]
  58. Zivkovic, M.; Stoean, C.; Chhabra, A.; Budimirovic, N.; Petrovic, A.; Bacanin, N. Novel Improved Salp Swarm Algorithm: An Application for Feature Selection. Sensors 2022, 22, 1711. [Google Scholar] [CrossRef]
  59. Bacanin, N.; Petrovic, A.; Zivkovic, M.; Bezdan, T.; Antonijevic, M. Feature Selection in Machine Learning by Hybrid Sine Cosine Metaheuristics. In Proceedings of the International Conference on Advances in Computing and Data Sciences, Nashik, India, 23–24 April 2021; Springer: Cham, Switzerland, 2021; pp. 604–616. [Google Scholar]
  60. Salb, M.; Zivkovic, M.; Bacanin, N.; Chhabra, A.; Suresh, M. Support vector machine performance improvements for cryptocurrency value forecasting by enhanced sine cosine algorithm. In Computer Vision and Robotics; Springer: Berlin/Heidelberg, Germany, 2022; pp. 527–536. [Google Scholar]
  61. Bezdan, T.; Zivkovic, M.; Tuba, E.; Strumberger, I.; Bacanin, N.; Tuba, M. Glioma Brain Tumor Grade Classification from MRI Using Convolutional Neural Networks Designed by Modified FA. In Proceedings of the International Conference on Intelligent and Fuzzy Systems, Izmir, Turkey, 21–23 July 2020; Springer: Cham, Switzerland, 2020; pp. 955–963. [Google Scholar]
  62. Bezdan, T.; Milosevic, S.; Venkatachalam, K.; Zivkovic, M.; Bacanin, N.; Strumberger, I. Optimizing Convolutional Neural Network by Hybridized Elephant Herding Optimization Algorithm for Magnetic Resonance Image Classification of Glioma Brain Tumor Grade. In Proceedings of the 2021 Zooming Innovation in Consumer Technologies Conference (ZINC), Novi Sad, Serbia, 26–27 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 171–176. [Google Scholar]
  63. Basha, J.; Bacanin, N.; Vukobrat, N.; Zivkovic, M.; Venkatachalam, K.; Hubálovskỳ, S.; Trojovskỳ, P. Chaotic Harris Hawks Optimization with Quasi-Reflection-Based Learning: An Application to Enhance CNN Design. Sensors 2021, 21, 6654. [Google Scholar] [CrossRef]
  64. Tair, M.; Bacanin, N.; Zivkovic, M.; Venkatachalam, K. A Chaotic Oppositional Whale Optimisation Algorithm with Firefly Search for Medical Diagnostics. Comput. Mater. Contin. 2022, 72, 959–982. [Google Scholar] [CrossRef]
  65. Zivkovic, M.; Bacanin, N.; Venkatachalam, K.; Nayyar, A.; Djordjevic, A.; Strumberger, I.; Al-Turjman, F. COVID-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain. Cities Soc. 2021, 66, 102669. [Google Scholar] [CrossRef]
  66. Zivkovic, M.; Venkatachalam, K.; Bacanin, N.; Djordjevic, A.; Antonijevic, M.; Strumberger, I.; Rashid, T.A. Hybrid Genetic Algorithm and Machine Learning Method for COVID-19 Cases Prediction. In Proceedings of the International Conference on Sustainable Expert Systems: ICSES 2020, Lalitpur, Nepal, 28–29 September 2020; Springer: Gateway East, Singapore, 2021; Volume 176, p. 169. [Google Scholar]
  67. Zivkovic, M.; Jovanovic, L.; Ivanovic, M.; Krdzic, A.; Bacanin, N.; Strumberger, I. Feature selection using modified sine cosine algorithm with COVID-19 dataset. In Evolutionary Computing and Mobile Sustainable Networks; Springer: Gateway East, Singapore, 2022; pp. 15–31. [Google Scholar]
  68. Bui, D.T.; Ngo, P.T.T.; Pham, T.D.; Jaafari, A.; Minh, N.Q.; Hoa, P.V.; Samui, P. A novel hybrid approach based on a swarm intelligence optimized extreme learning machine for flash flood susceptibility mapping. Catena 2019, 179, 184–196. [Google Scholar] [CrossRef]
  69. Feng, Z.k.; Niu, W.j.; Zhang, R.; Wang, S.; Cheng, C.T. Operation rule derivation of hydropower reservoir by k-means clustering method and extreme learning machine based on particle swarm optimization. J. Hydrol. 2019, 576, 229–238. [Google Scholar] [CrossRef]
  70. Faris, H.; Mirjalili, S.; Aljarah, I.; Mafarja, M.; Heidari, A.A. Salp swarm algorithm: Theory, literature review, and application in extreme learning machines. In Nature-Inspired Optimizers; Springer: Cham, Switzerland, 2020; pp. 185–199. [Google Scholar]
  71. Chen, H.; Zhang, Q.; Luo, J.; Xu, Y.; Zhang, X. An enhanced bacterial foraging optimization and its application for training kernel extreme learning machine. Appl. Soft Comput. 2020, 86, 105884. [Google Scholar] [CrossRef]
  72. Karaboga, D. An Idea Based on Honey Bee Swarm for Numerical Optimization; Technical Report; Erciyes University: Kayseri, Turkey, 2005. [Google Scholar]
  73. Tuba, M.; Bacanin, N. Artificial Bee Colony Algorithm Hybridized with Firefly Algorithm for Cardinality Constrained Mean-Variance Portfolio Selection Problem. Appl. Math. Inf. Sci. 2014, 8, 2831–2844. [Google Scholar] [CrossRef]
  74. Mirjalili, S. SCA: A Sine Cosine Algorithm for solving optimization problems. Knowl. Based Syst. 2016, 96, 120–133. [Google Scholar] [CrossRef]
  75. Bačanin Dzakula, N. Unapređenje Hibridizacijom Metaheuristika Inteligencije Rojeva za Resavanje Problema Globalne Optimizacije. Ph.D. Thesis, Univerzitet u Beogradu-Matematički fakultet, Beograd, Serbia, 2015. [Google Scholar]
  76. Talbi, E.G. Combining metaheuristics with mathematical programming, constraint programming and machine learning. Ann. Oper. Res. 2016, 240, 171–215. [Google Scholar] [CrossRef]
  77. Bacanin, N.; Tuba, M.; Strumberger, I. RFID Network Planning by ABC Algorithm Hybridized with Heuristic for Initial Number and Locations of Readers. In Proceedings of the 2015 17th UKSim-AMSS International Conference on Modelling and Simulation (UKSim), Cambridge, UK, 25–27 March 2015; pp. 39–44. [Google Scholar] [CrossRef]
  78. Attiya, I.; Abd Elaziz, M.; Abualigah, L.; Nguyen, T.N.; Abd El-Latif, A.A. An Improved Hybrid Swarm Intelligence for Scheduling IoT Application Tasks in the Cloud. IEEE Trans. Ind. Infor. 2022. [Google Scholar] [CrossRef]
  79. Wu, X.; Li, R.; Chu, C.H.; Amoasi, R.; Liu, S. Managing pharmaceuticals delivery service using a hybrid particle swarm intelligence approach. Ann. Oper. Res. 2022, 308, 653–684. [Google Scholar] [CrossRef]
  80. Bezdan, T.; Cvetnic, D.; Gajic, L.; Zivkovic, M.; Strumberger, I.; Bacanin, N. Feature Selection by Firefly Algorithm with Improved Initialization Strategy. In Proceedings of the 7th Conference on the Engineering of Computer Based Systems, Novi Sad, Serbia, 26–27 May 2021; pp. 1–8. [Google Scholar]
  81. Caponetto, R.; Fortuna, L.; Fazzino, S.; Xibilia, M.G. Chaotic sequences to improve the performance of evolutionary algorithms. IEEE Trans. Evol. Comput. 2003, 7, 289–304. [Google Scholar] [CrossRef]
  82. Wang, M.; Chen, H. Chaotic multi-swarm whale optimizer boosted support vector machine for medical diagnosis. Appl. Soft Comput. 2020, 88, 105946. [Google Scholar] [CrossRef]
  83. Kose, U. An ant-lion optimizer-trained artificial neural network system for chaotic electroencephalogram (EEG) prediction. Appl. Sci. 2018, 8, 1613. [Google Scholar] [CrossRef] [Green Version]
  84. Yu, H.; Zhao, N.; Wang, P.; Chen, H.; Li, C. Chaos-enhanced synchronized bat optimizer. Appl. Math. Model. 2020, 77, 1201–1215. [Google Scholar] [CrossRef]
  85. Rahnamayan, S.; Tizhoosh, H.R.; Salama, M.M.A. Quasi-oppositional Differential Evolution. In Proceedings of the 2007 IEEE Congress on Evolutionary Computation, Singapore, 25–28 September 2007; pp. 2229–2236. [Google Scholar] [CrossRef]
  86. Yang, X.S.; He, X. Firefly algorithm: Recent advances and applications. Int. J. Swarm Intell. 2013, 1, 36–50. [Google Scholar] [CrossRef] [Green Version]
  87. Yang, X.S. Bat algorithm for multi-objective optimisation. Int. J. Bio-Inspired Comput. 2011, 3, 267–274. [Google Scholar] [CrossRef]
  88. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872. [Google Scholar] [CrossRef]
  89. Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 1937, 32, 675–701. [Google Scholar] [CrossRef]
  90. Friedman, M. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat. 1940, 11, 86–92. [Google Scholar] [CrossRef]
  91. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–6. [Google Scholar]
  92. Dhanabal, L.; Shantharajah, S. A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 2015, 4, 446–452. [Google Scholar]
  93. Protić, D.D. Review of KDD Cup’99, NSL-KDD and Kyoto 2006+ datasets. Vojnoteh. Glas. 2018, 66, 580–596. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Overview for the proposed ELM-MS-AFS approach.
Figure 2. Distribution of classes in Diabetes, Disease, Iris, Wine, and Wine Quality datasets before split.
Figure 3. Distribution of classes in Satellite and Shuttle datasets with predetermined training and testing subsets.
Figure 4. Graphs for convergence speed evaluation on seven observed datasets for 30, 60, and 90 neurons, for the proposed method vs. other approaches.
Figure 5. Generated confusion matrices and PR curves for some datasets by ELM-MS-AFS.
Figure 6. Distribution of classes in NSL-KDD dataset with predetermined training and testing subsets.
Figure 7. Graphs for convergence speed evaluation on four observed datasets for 30, 60, and 90 neurons, for the proposed ELM-MS-AFS vs. other hybrid approaches.
Figure 8. Generated PR curves for NSL-KDD dataset by hybrid methods.
Table 1. Datasets used in the conducted experiments.
| Dataset | Samples | Training Data | Testing Data | Attributes | Classes |
|---|---|---|---|---|---|
| Wine Quality | 1599 | 1120 | 479 | 11 | 6 |
Table 2. Accuracy comparative analysis for simulations with 30 neurons.
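| Method | Metric | Diabetes | Disease | Iris | Wine | Wine Quality | Satellite | Shuttle |
|---|---|---|---|---|---|---|---|---|
| ELM | best (%) | 73.59 | 87.99 | 80 | 98.15 | 60 | 77.24 | 84.12 |
| ELM | worst (%) | 61.90 | 79.87 | 66.67 | 83.33 | 54.37 | 68.48 | 10.39 |
| ELM | mean (%) | 71.22 | 84.67 | 72.18 | 92.52 | 57 | 73.68 | 55.44 |
| ELM-IWO | best (%) | 80.95 | 89.94 | 100.00 | 100.00 | 62.92 | 82.04 | 93.13 |
| ELM-IWO | worst (%) | 80.09 | 88.96 | 100.00 | 100.00 | 62.29 | 81.59 | 91.14 |
| ELM-IWO | mean (%) | 80.52 | 89.61 | 100.00 | 100.00 | 62.76 | 81.79 | 92.20 |
| ELM-WOA | best (%) | 80.52 | 89.61 | 100.00 | 98.15 | 62.92 | 81.59 | 92.90 |
| ELM-WOA | worst (%) | 79.22 | 87.01 | 100.00 | 96.30 | 61.67 | 80.34 | 88.43 |
| ELM-WOA | mean (%) | 79.87 | 88.72 | 100.00 | 97.69 | 62.29 | 80.94 | 90.93 |
| ELM-HHO | best (%) | 79.65 | 89.29 | 100.00 | 100.00 | 62.71 | 82.09 | 93.85 |
| ELM-HHO | worst (%) | 79.22 | 87.34 | 100.00 | 94.44 | 61.25 | 80.89 | 88.08 |
| ELM-HHO | mean (%) | 79.44 | 88.56 | 100.00 | 98.15 | 61.82 | 81.44 | 91.24 |
| ELM-BA | best (%) | 80.52 | 88.31 | 100.00 | 100.00 | 63.75 | 81.44 | 90.98 |
| ELM-BA | worst (%) | 80.09 | 87.66 | 100.00 | 100.00 | 61.67 | 80.64 | 89.86 |
| ELM-BA | mean (%) | 80.30 | 88.07 | 100.00 | 100.00 | 62.66 | 81.09 | 90.39 |
| ELM-SCA | best (%) | 81.39 | 89.94 | 100.00 | 100.00 | 63.13 | 81.89 | 92.97 |
| ELM-SCA | worst (%) | 80.95 | 88.31 | 100.00 | 100.00 | 61.67 | 81.49 | 89.75 |
| ELM-SCA | mean (%) | 81.17 | 89.12 | 100.00 | 100.00 | 62.50 | 81.72 | 91.83 |
| ELM-FA | best (%) | 80.95 | 89.61 | 100.00 | 98.15 | 62.50 | 81.24 | 91.97 |
| ELM-FA | worst (%) | 80.95 | 87.66 | 100.00 | 96.30 | 61.46 | 80.54 | 89.95 |
| ELM-FA | mean (%) | 80.95 | 88.72 | 100.00 | 97.22 | 61.98 | 80.90 | 91.11 |
| ELM-ABC | best (%) | 81.39 | 90.58 | 100.00 | 100.00 | 62.50 | 81.99 | 92.28 |
| ELM-ABC | worst (%) | 80.09 | 88.64 | 100.00 | 100.00 | 60.21 | 81.29 | 88.82 |
| ELM-ABC | mean (%) | 80.74 | 89.37 | 100.00 | 100.00 | 61.72 | 81.73 | 90.65 |
| ELM-MS-AFS | best (%) | 86.15 | 92.53 | 100.00 | 100.00 | 66.25 | 84.59 | 98.67 |
| ELM-MS-AFS | worst (%) | 83.12 | 90.91 | 100.00 | 100.00 | 65.63 | 82.99 | 97.71 |
| ELM-MS-AFS | mean (%) | 84.63 | 91.80 | 100.00 | 100.00 | 65.99 | 83.67 | 98.21 |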
Table 3. Accuracy comparative analysis for simulations with 60 neurons.
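| Method | Metric | Diabetes | Disease | Iris | Wine | Wine Quality | Satellite | Shuttle |
|---|---|---|---|---|---|---|---|---|
| ELM | best (%) | 69.69 | 89.93 | 71.11 | 94.44 | 58.96 | 80.09 | 90.00 |
| ELM | worst (%) | 55.84 | 83.12 | 40 | 81.48 | 52.29 | 74.59 | 3.04 |
| ELM | mean (%) | 64.85 | 86.52 | 54.4 | 89.48 | 55.87 | 78.11 | 42.43 |
| ELM-IWO | best (%) | 77.49 | 89.94 | 100.00 | 100.00 | 62.92 | 83.59 | 94.64 |
| ELM-IWO | worst (%) | 77.49 | 88.96 | 100.00 | 100.00 | 61.67 | 83.34 | 88.22 |
| ELM-IWO | mean (%) | 77.49 | 89.45 | 100.00 | 100.00 | 62.24 | 83.47 | 91.42 |
| ELM-WOA | best (%) | 79.22 | 88.96 | 100.00 | 100.00 | 62.29 | 83.34 | 94.45 |
| ELM-WOA | worst (%) | 77.49 | 88.31 | 100.00 | 98.15 | 61.88 | 82.64 | 91.52 |
| ELM-WOA | mean (%) | 78.35 | 88.64 | 100.00 | 99.07 | 62.08 | 82.99 | 92.67 |
| ELM-HHO | best (%) | 79.22 | 90.26 | 100.00 | 100.00 | 63.54 | 83.34 | 93.43 |
| ELM-HHO | worst (%) | 77.92 | 87.99 | 100.00 | 98.15 | 62.08 | 83.34 | 89.32 |
| ELM-HHO | mean (%) | 78.57 | 89.29 | 100.00 | 99.07 | 63.07 | 83.34 | 91.90 |
| ELM-BA | best (%) | 79.65 | 88.96 | 100.00 | 100.00 | 63.54 | 83.39 | 90.61 |
| ELM-BA | worst (%) | 78.35 | 88.31 | 97.78 | 98.15 | 62.29 | 82.84 | 87.46 |
| ELM-BA | mean (%) | 79.00 | 88.56 | 97.78 | 99.54 | 62.76 | 83.12 | 88.79 |
| ELM-SCA | best (%) | 79.22 | 89.61 | 100.00 | 100.00 | 62.92 | 83.84 | 90.83 |
| ELM-SCA | worst (%) | 77.49 | 88.64 | 100.00 | 98.15 | 61.25 | 83.54 | 89.46 |
| ELM-SCA | mean (%) | 78.35 | 89.29 | 100.00 | 99.54 | 62.19 | 83.69 | 90.10 |
| ELM-FA | best (%) | 78.79 | 89.61 | 100.00 | 100.00 | 63.33 | 82.64 | 91.59 |
| ELM-FA | worst (%) | 77.49 | 88.64 | 100.00 | 100.00 | 62.08 | 82.54 | 85.79 |
| ELM-FA | mean (%) | 78.14 | 89.29 | 100.00 | 100.00 | 62.45 | 82.59 | 88.67 |
| ELM-ABC | best (%) | 79.22 | 89.94 | 100.00 | 100.00 | 62.71 | 83.54 | 96.77 |
| ELM-ABC | worst (%) | 79.22 | 89.29 | 100.00 | 100.00 | 62.29 | 83.34 | 87.72 |
| ELM-ABC | mean (%) | 79.22 | 89.61 | 100.00 | 100.00 | 62.55 | 83.44 | 91.83 |
| ELM-MS-AFS | best (%) | 82.68 | 94.16 | 100.00 | 100.00 | 68.13 | 86.89 | 97.68 |
| ELM-MS-AFS | worst (%) | 80.52 | 91.88 | 100.00 | 100.00 | 66.88 | 85.89 | 84.74 |
| ELM-MS-AFS | mean (%) | 81.60 | 92.86 | 100.00 | 100.00 | 67.60 | 86.39 | 91.62 |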
Table 4. Accuracy comparative analysis for simulations with 90 neurons.
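| Method | Metric | Diabetes | Disease | Iris | Wine | Wine Quality | Satellite | Shuttle |
|---|---|---|---|---|---|---|---|---|
| ELM | best (%) | 70.56 | 92.53 | 75.55 | 90.74 | 61.04 | 80.64 | 80.42 |
| ELM | worst (%) | 61.04 | 83.44 | 57.78 | 40.74 | 46.04 | 77.59 | 4.70 |
| ELM | mean (%) | 65.56 | 87.83 | 67.29 | 71.48 | 53.79 | 79.43 | 44.12 |
| ELM-IWO | best (%) | 80.09 | 94.48 | 97.78 | 100.00 | 62.92 | 85.39 | 91.84 |
| ELM-IWO | worst (%) | 79.65 | 92.86 | 97.78 | 100.00 | 61.25 | 84.84 | 89.19 |
| ELM-IWO | mean (%) | 79.87 | 93.34 | 97.78 | 100.00 | 62.24 | 85.12 | 90.29 |
| ELM-WOA | best (%) | 79.65 | 93.51 | 97.78 | 100.00 | 62.71 | 84.84 | 92.49 |
| ELM-WOA | worst (%) | 79.22 | 92.53 | 97.78 | 100.00 | 60.21 | 83.94 | 89.05 |
| ELM-WOA | mean (%) | 79.44 | 93.18 | 97.78 | 100.00 | 61.61 | 84.39 | 90.27 |
| ELM-HHO | best (%) | 80.52 | 93.51 | 100.00 | 100.00 | 64.17 | 84.34 | 93.00 |
| ELM-HHO | worst (%) | 77.22 | 92.53 | 97.78 | 100.00 | 61.25 | 84.04 | 84.74 |
| ELM-HHO | mean (%) | 79.87 | 92.86 | 98.33 | 100.00 | 61.60 | 84.19 | 87.72 |
| ELM-BA | best (%) | 79.22 | 94.48 | 97.78 | 100.00 | 62.08 | 85.39 | 91.63 |
| ELM-BA | worst (%) | 78.35 | 88.31 | 97.78 | 100.00 | 61.67 | 82.84 | 88.12 |
| ELM-BA | mean (%) | 79.00 | 88.56 | 97.78 | 100.00 | 62.66 | 83.12 | 90.29 |
| ELM-SCA | best (%) | 79.65 | 93.51 | 97.78 | 100.00 | 63.33 | 85.49 | 89.29 |
| ELM-SCA | worst (%) | 79.65 | 92.53 | 97.78 | 100.00 | 61.88 | 84.59 | 85.52 |
| ELM-SCA | mean (%) | 79.65 | 93.02 | 97.78 | 100.00 | 62.34 | 85.04 | 86.87 |
| ELM-FA | best (%) | 80.95 | 94.16 | 100.00 | 100.00 | 61.88 | 84.44 | 92.39 |
| ELM-FA | worst (%) | 79.65 | 92.86 | 97.78 | 100.00 | 60.42 | 82.54 | 90.99 |
| ELM-FA | mean (%) | 80.30 | 93.59 | 98.33 | 100.00 | 61.09 | 82.59 | 91.70 |
| ELM-ABC | best (%) | 79.65 | 92.86 | 97.78 | 100.00 | 62.29 | 84.64 | 96.47 |
| ELM-ABC | worst (%) | 78.35 | 89.29 | 97.78 | 100.00 | 61.04 | 83.34 | 89.68 |
| ELM-ABC | mean (%) | 79.00 | 89.61 | 97.78 | 100.00 | 61.56 | 83.44 | 92.61 |
| ELM-MS-AFS | best (%) | 84.42 | 96.75 | 97.78 | 100.00 | 68.33 | 88.19 | 97.62 |
| ELM-MS-AFS | worst (%) | 82.68 | 95.13 | 97.78 | 100.00 | 66.25 | 87.74 | 93.13 |
| ELM-MS-AFS | mean (%) | 83.55 | 95.70 | 97.78 | 100.00 | 67.40 | 87.97 | 94.70 |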
Table 5. Precision, recall, and f1-score comparative analysis for simulations with 30 neurons.
| Method | Metric | Diabetes | Disease | Iris | Wine | Wine Quality | Satellite | Shuttle |
|---|---|---|---|---|---|---|---|---|
| ELM-IWO | accuracy (%) | 80.95 | 89.94 | 100.00 | 100.00 | 62.92 | 82.04 | 93.13 |
| ELM-WOA | accuracy (%) | 80.52 | 89.61 | 100.00 | 98.15 | 62.92 | 81.59 | 92.90 |
| ELM-HHO | accuracy (%) | 79.65 | 89.29 | 100.00 | 100.00 | 62.71 | 82.09 | 93.85 |
| ELM-BA | accuracy (%) | 80.52 | 88.31 | 100.00 | 100.00 | 63.75 | 81.44 | 90.98 |
| ELM-SCA | accuracy (%) | 81.39 | 89.94 | 100.00 | 100.00 | 63.13 | 81.89 | 92.97 |
| ELM-FA | accuracy (%) | 80.95 | 89.61 | 100.00 | 98.15 | 62.50 | 81.24 | 91.97 |
| ELM-ABC | accuracy (%) | 81.39 | 90.58 | 100.00 | 100.00 | 62.50 | 81.99 | 92.28 |
| ELM-MS-AFS | accuracy (%) | 86.15 | 92.53 | 100.00 | 100.00 | 66.25 | 84.59 | 98.67 |
Table 6. Precision, recall, and f1-score comparative analysis for simulations with 60 neurons.
| Method | Metric | Diabetes | Disease | Iris | Wine | Wine Quality | Satellite | Shuttle |
|---|---|---|---|---|---|---|---|---|
| ELM-IWO | accuracy (%) | 77.49 | 89.94 | 100.00 | 100.00 | 62.92 | 83.59 | 94.64 |
| ELM-WOA | accuracy (%) | 79.22 | 88.96 | 100.00 | 100.00 | 62.29 | 83.34 | 94.45 |
| ELM-HHO | accuracy (%) | 79.22 | 90.26 | 100.00 | 100.00 | 63.54 | 83.34 | 93.43 |
| ELM-BA | accuracy (%) | 79.65 | 88.96 | 100.00 | 100.00 | 63.54 | 83.39 | 90.61 |
| ELM-SCA | accuracy (%) | 79.22 | 89.61 | 100.00 | 100.00 | 62.92 | 83.84 | 90.83 |
| ELM-FA | accuracy (%) | 78.79 | 89.61 | 100.00 | 100.00 | 63.33 | 82.64 | 91.59 |
| ELM-ABC | accuracy (%) | 79.22 | 89.94 | 100.00 | 100.00 | 62.71 | 83.54 | 96.77 |
| ELM-MS-AFS | accuracy (%) | 82.68 | 94.16 | 100.00 | 100.00 | 68.13 | 86.89 | 97.68 |
Table 7. Precision, recall and f1-score comparative analysis for simulations with 90 neurons.
| Method | Metric | Diabetes | Disease | Iris | Wine | Wine Quality | Satellite | Shuttle |
|---|---|---|---|---|---|---|---|---|
| ELM-IWO | accuracy (%) | 80.09 | 94.48 | 97.78 | 100.00 | 62.92 | 85.39 | 91.84 |
| ELM-WOA | accuracy (%) | 79.65 | 93.51 | 97.78 | 100.00 | 62.71 | 84.84 | 92.49 |
| ELM-HHO | accuracy (%) | 80.52 | 93.51 | 100.00 | 100.00 | 64.17 | 84.34 | 93.00 |
| ELM-BA | accuracy (%) | 79.22 | 94.48 | 97.78 | 100.00 | 62.08 | 85.39 | 91.63 |
| ELM-SCA | accuracy (%) | 79.65 | 93.51 | 97.78 | 100.00 | 63.33 | 85.49 | 89.29 |
| ELM-FA | accuracy (%) | 80.95 | 94.16 | 100.00 | 100.00 | 61.88 | 84.44 | 92.39 |
| ELM-ABC | accuracy (%) | 79.65 | 92.86 | 95.56 | 100.00 | 62.29 | 84.64 | 96.47 |
| ELM-MS-AFS | accuracy (%) | 84.42 | 96.75 | 97.78 | 100.00 | 68.33 | 88.19 | 97.62 |
Table 8. Friedman Aligned test ranks for the compared algorithms.
Wine Quality49552752842351
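As a rough illustration of how ranks like those in Table 8 can be produced, the following minimal sketch computes Friedman aligned ranks. It assumes the common formulation: each dataset's mean result is subtracted from that dataset's results, and all aligned values are then ranked jointly, with tied values sharing an average rank. The function name `friedman_aligned_ranks` and the toy scores are hypothetical and are not taken from the article.

```python
def friedman_aligned_ranks(scores, higher_is_better=True):
    """scores[d][a] is the result of algorithm a on dataset d.

    Returns a list of lists with the same shape holding the aligned ranks.
    """
    aligned = []
    for d, row in enumerate(scores):
        location = sum(row) / len(row)  # per-dataset mean used for alignment
        for a, value in enumerate(row):
            aligned.append((value - location, d, a))

    # Rank 1 goes to the best aligned observation across ALL datasets.
    aligned.sort(key=lambda item: item[0], reverse=higher_is_better)

    ranks = [[0.0] * len(row) for row in scores]
    i = 0
    while i < len(aligned):
        j = i
        # Extend j over a run of tied aligned values.
        while j + 1 < len(aligned) and aligned[j + 1][0] == aligned[i][0]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2.0
        for _, d, a in aligned[i:j + 1]:
            ranks[d][a] = average_rank
        i = j + 1
    return ranks


if __name__ == "__main__":
    # Hypothetical accuracies: 2 datasets (rows) x 2 algorithms (columns).
    toy_scores = [[1.0, 3.0], [2.0, 6.0]]
    print(friedman_aligned_ranks(toy_scores))
```

Unlike the ordinary Friedman test, which ranks algorithms within each dataset separately, the aligned variant ranks all dataset/algorithm pairs together, so the ranks run from 1 to the number of datasets times the number of algorithms.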
Table 9. Results of the Holm’s step-down procedure.
| Comparison | p-Value | Rank | 0.05/(k − i) | 0.1/(k − i) |
|---|---|---|---|---|
| MS-AFS vs. ABC | $1.92 \times 10^{-3}$ | 0 | 0.007143 | 0.014286 |
| MS-AFS vs. BA | $1.92 \times 10^{-3}$ | 1 | 0.008333 | 0.016667 |
| MS-AFS vs. FA | $3.76 \times 10^{-3}$ | 2 | 0.01 | 0.02 |
| MS-AFS vs. WOA | $3.76 \times 10^{-3}$ | 3 | 0.0125 | 0.025 |
| MS-AFS vs. SCA | $5.17 \times 10^{-3}$ | 4 | 0.016667 | 0.033333 |
| MS-AFS vs. IWO | $2.17 \times 10^{-2}$ | 5 | 0.025 | 0.05 |
| MS-AFS vs. HHO | $4.04 \times 10^{-2}$ | 6 | 0.05 | 0.1 |
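The threshold columns of Table 9 are consistent with Holm's step-down rule, where the i-th smallest p-value (i starting at 0) is compared against α/(k − i) for k = 7 comparisons. The sketch below reproduces those thresholds. The function name `holm_step_down`, the strict `<` comparison, and reading the garbled exponents as negative powers of ten are assumptions made for illustration, not details confirmed by the article.

```python
def holm_step_down(p_values, alpha=0.05):
    """Apply Holm's step-down procedure.

    p_values maps a comparison label to its p-value. Returns a list of
    (label, p_value, threshold, rejected) tuples in ascending p-value order.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    k = len(ordered)
    results = []
    still_rejecting = True
    for i, (label, p) in enumerate(ordered):
        threshold = alpha / (k - i)
        # Once one hypothesis is retained, every later one is retained too.
        still_rejecting = still_rejecting and p < threshold
        results.append((label, p, threshold, still_rejecting))
    return results


# p-values as listed in Table 9 (MS-AFS vs. each competitor).
TABLE9_P = {
    "ABC": 1.92e-3, "BA": 1.92e-3, "FA": 3.76e-3, "WOA": 3.76e-3,
    "SCA": 5.17e-3, "IWO": 2.17e-2, "HHO": 4.04e-2,
}

if __name__ == "__main__":
    for label, p, threshold, rejected in holm_step_down(TABLE9_P, alpha=0.05):
        print(f"MS-AFS vs. {label}: p={p:.2e} threshold={threshold:.6f} rejected={rejected}")
```

Under these assumptions every listed p-value falls below its threshold at α = 0.05, so all seven null hypotheses would be rejected.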
Table 10. Accuracy comparative analysis—ELM-MS-AFS vs. hybrids.
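| Neurons | Method | Metric | Wine Quality | Satellite | Shuttle | NSL-KDD |
|---|---|---|---|---|---|---|
| 30 | ELM-ABC-FA | best (%) | 65.21 | 83.04 | 97.08 | 77.43 |
| 30 | ELM-ABC-FA | worst (%) | 63.54 | 82.69 | 92.96 | 73.96 |
| 30 | ELM-ABC-FA | mean (%) | 64.17 | 82.87 | 95.32 | 75.26 |
| 30 | ELM-ABC-SCA | best (%) | 63.33 | 82.94 | 97.17 | 77.24 |
| 30 | ELM-ABC-SCA | worst (%) | 62.29 | 82.19 | 84.72 | 72.90 |
| 30 | ELM-ABC-SCA | mean (%) | 62.85 | 82.51 | 91.09 | 75.41 |
| 30 | ELM-FA-SCA | best (%) | 65.21 | 83.14 | 97.88 | 75.60 |
| 30 | ELM-FA-SCA | worst (%) | 62.50 | 82.52 | 96.81 | 75.14 |
| 30 | ELM-FA-SCA | mean (%) | 63.61 | 82.91 | 97.40 | 75.45 |
| 30 | ELM-MS-AFS | best (%) | 66.25 | 84.59 | 98.67 | 79.66 |
| 30 | ELM-MS-AFS | worst (%) | 65.63 | 82.99 | 97.71 | 76.59 |
| 30 | ELM-MS-AFS | mean (%) | 65.99 | 83.67 | 98.21 | 77.74 |
| 60 | ELM-ABC-FA | best (%) | 65.21 | 86.69 | 91.62 | 77.16 |
| 60 | ELM-ABC-FA | worst (%) | 60.83 | 84.69 | 85.57 | 74.53 |
| 60 | ELM-ABC-FA | mean (%) | 63.65 | 85.54 | 89.53 | 75.72 |
| 60 | ELM-ABC-SCA | best (%) | 65.63 | 85.59 | 96.15 | 73.07 |
| 60 | ELM-ABC-SCA | worst (%) | 62.08 | 84.54 | 92.15 | 71.77 |
| 60 | ELM-ABC-SCA | mean (%) | 63.54 | 84.94 | 93.88 | 72.35 |
| 60 | ELM-FA-SCA | best (%) | 66.04 | 86.34 | 96.51 | 78.88 |
| 60 | ELM-FA-SCA | worst (%) | 61.67 | 84.84 | 91.70 | 75.11 |
| 60 | ELM-FA-SCA | mean (%) | 64.22 | 85.83 | 93.97 | 76.84 |
| 60 | ELM-MS-AFS | best (%) | 68.13 | 86.89 | 97.68 | 80.29 |
| 60 | ELM-MS-AFS | worst (%) | 66.88 | 85.89 | 84.74 | 75.55 |
| 60 | ELM-MS-AFS | mean (%) | 67.60 | 86.39 | 91.62 | 78.42 |
| 90 | ELM-ABC-FA | best (%) | 68.13 | 87.04 | 95.21 | 75.96 |
| 90 | ELM-ABC-FA | worst (%) | 63.96 | 85.19 | 90.96 | 73.59 |
| 90 | ELM-ABC-FA | mean (%) | 66.30 | 86.03 | 92.80 | 74.95 |
| 90 | ELM-ABC-SCA | best (%) | 66.46 | 86.19 | 97.52 | 71.58 |
| 90 | ELM-ABC-SCA | worst (%) | 64.38 | 85.44 | 90.66 | 69.47 |
| 90 | ELM-ABC-SCA | mean (%) | 65.57 | 85.78 | 95.01 | 70.87 |
| 90 | ELM-FA-SCA | best (%) | 67.71 | 87.34 | 93.17 | 76.16 |
| 90 | ELM-FA-SCA | worst (%) | 66.46 | 87.19 | 83.27 | 74.68 |
| 90 | ELM-FA-SCA | mean (%) | 67.03 | 87.26 | 84.73 | 75.62 |
| 90 | ELM-MS-AFS | best (%) | 68.33 | 88.19 | 97.62 | 79.52 |
| 90 | ELM-MS-AFS | worst (%) | 66.25 | 87.74 | 93.13 | 75.34 |
| 90 | ELM-MS-AFS | mean (%) | 67.40 | 87.97 | 94.70 | 77.43 |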
Table 11. Precision, recall, and f1-score comparative analysis—ELM-MS-AFS vs. hybrids.
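| Neurons | Method | Metric | Wine Quality | Satellite | Shuttle | NSL-KDD |
|---|---|---|---|---|---|---|
| 30 | ELM-ABC-FA | accuracy (%) | 65.21 | 83.04 | 97.08 | 77.43 |
| 30 | ELM-ABC-FA | precision | 0.327 | 0.819 | 0.408 | 0.473 |
| 30 | ELM-ABC-FA | recall | 0.326 | 0.764 | 0.420 | 0.483 |
| 30 | ELM-ABC-SCA | accuracy (%) | 63.33 | 82.94 | 97.17 | 77.24 |
| 30 | ELM-ABC-SCA | precision | 0.320 | 0.873 | 0.404 | 0.470 |
| 30 | ELM-ABC-SCA | recall | 0.319 | 0.765 | 0.417 | 0.511 |
| 30 | ELM-FA-SCA | accuracy (%) | 65.21 | 83.14 | 97.88 | 75.60 |
| 30 | ELM-FA-SCA | precision | 0.328 | 0.882 | 0.413 | 0.453 |
| 30 | ELM-FA-SCA | recall | 0.324 | 0.768 | 0.422 | 0.490 |
| 30 | ELM-MS-AFS | accuracy (%) | 66.25 | 84.59 | 98.67 | 79.66 |
| 30 | ELM-MS-AFS | precision | 0.527 | 0.841 | 0.564 | 0.492 |
| 30 | ELM-MS-AFS | recall | 0.370 | 0.793 | 0.436 | 0.518 |
| 60 | ELM-ABC-FA | accuracy (%) | 65.21 | 86.69 | 91.62 | 77.16 |
| 60 | ELM-ABC-FA | precision | 0.328 | 0.863 | 0.537 | 0.485 |
| 60 | ELM-ABC-FA | recall | 0.305 | 0.835 | 0.357 | 0.499 |
| 60 | ELM-ABC-SCA | accuracy (%) | 65.63 | 85.59 | 96.15 | 73.07 |
| 60 | ELM-ABC-SCA | precision | 0.405 | 0.853 | 0.403 | 0.461 |
| 60 | ELM-ABC-SCA | recall | 0.305 | 0.808 | 0.414 | 0.477 |
| 60 | ELM-FA-SCA | accuracy (%) | 66.04 | 86.34 | 96.51 | 78.88 |
| 60 | ELM-FA-SCA | precision | 0.495 | 0.855 | 0.412 | 0.485 |
| 60 | ELM-FA-SCA | recall | 0.311 | 0.825 | 0.383 | 0.521 |
| 60 | ELM-MS-AFS | accuracy (%) | 68.13 | 86.89 | 97.68 | 80.29 |
| 60 | ELM-MS-AFS | precision | 0.550 | 0.866 | 0.412 | 0.486 |
| 60 | ELM-MS-AFS | recall | 0.357 | 0.829 | 0.420 | 0.540 |
| 90 | ELM-ABC-FA | accuracy (%) | 68.13 | 87.04 | 95.21 | 75.96 |
| 90 | ELM-ABC-FA | precision | 0.377 | 0.861 | 0.389 | 0.495 |
| 90 | ELM-ABC-FA | recall | 0.342 | 0.834 | 0.370 | 0.498 |
| 90 | ELM-ABC-SCA | accuracy (%) | 66.46 | 86.19 | 97.52 | 71.58 |
| 90 | ELM-ABC-SCA | precision | 0.370 | 0.879 | 0.411 | 0.475 |
| 90 | ELM-ABC-SCA | recall | 0.340 | 0.810 | 0.418 | 0.452 |
| 90 | ELM-FA-SCA | accuracy (%) | 67.71 | 87.34 | 93.17 | 76.16 |
| 90 | ELM-FA-SCA | precision | 0.486 | 0.865 | 0.415 | 0.495 |
| 90 | ELM-FA-SCA | recall | 0.486 | 0.839 | 0.367 | 0.495 |
| 90 | ELM-MS-AFS | accuracy (%) | 68.33 | 88.19 | 97.62 | 79.52 |
| 90 | ELM-MS-AFS | precision | 0.345 | 0.878 | 0.445 | 0.512 |
| 90 | ELM-MS-AFS | recall | 0.321 | 0.853 | 0.490 | 0.525 |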
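To make the indicators in Table 11 concrete, here is a minimal sketch that derives accuracy, precision, recall, and f1-score from a multi-class confusion matrix. It assumes macro averaging (an unweighted mean over classes), which the tables do not state explicitly; the function name `macro_metrics` and the example matrices are hypothetical.

```python
def macro_metrics(confusion):
    """confusion[i][j] counts samples of true class i predicted as class j.

    Returns accuracy plus macro-averaged precision, recall, and F1,
    all as fractions in [0, 1].
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))

    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    balanced = [[8, 2], [1, 9]]
    # A classifier that always predicts the majority class.
    imbalanced = [[95, 0], [5, 0]]
    print(macro_metrics(balanced))
    print(macro_metrics(imbalanced))
```

If the reported values are indeed macro averages, this would be one possible explanation for rows where accuracy is high but precision and recall are much lower, as on strongly imbalanced data: the second example reaches 95% accuracy while its macro recall is only 0.5.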
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Bacanin, N.; Stoean, C.; Zivkovic, M.; Jovanovic, D.; Antonijevic, M.; Mladenovic, D. Multi-Swarm Algorithm for Extreme Learning Machine Optimization. Sensors 2022, 22, 4204.