Towards an Optimal KELM Using the PSO-BOA Optimization Strategy with Applications in Data Classification

The kernel extreme learning machine (KELM) offers efficient processing, strong performance, and little manual parameter tuning, which has allowed it to be applied effectively to batch multi-label classification tasks. However, classic classification algorithms must at present contend with accuracy and space-time constraints caused by the massive, fast-arriving, multi-label, and concept-drifting data streams found in practical applications. KELM training also suffers from the drawback that it must be repeated many times independently in order to maximize the model's generalization performance or tune the number of hidden-layer nodes. In this paper, a KELM multi-label data classification method based on the butterfly optimization algorithm optimized by particle swarm optimization is proposed. The proposed algorithm jointly optimizes the model's generalization ability and the number of hidden-layer nodes, and can train multiple KELM hidden-layer networks at once while preserving the algorithm's original time complexity and avoiding a large number of repeated calculations. Simulation results demonstrate that, compared with the PSO-KELM, BBA-KELM, and BOA-KELM algorithms, the proposed PSOBOA-KELM algorithm searches the kernel extreme learning machine parameters more effectively and better balances global and local search, yielding a KELM prediction model with higher prediction accuracy.


Introduction
Data classification is one of the most important research hotspots in the high-tech field at present. It uses certain features to discriminate or classify a group of objects. The information involved in data classification often has high dimensionality, many influencing factors, and complex relationships [1]. It is often difficult to determine its laws by human reasoning alone, so the task must be completed through mathematical methods with the help of computers. How to discover more valuable, associated information in this complex data, find its internal laws, and establish a model that better reflects the actual characteristics of the research object, integrates easily with prior knowledge, and adapts to large-scale data-processing requirements is gradually becoming the focus of current data classification research [2]. Multi-label classification (MLC), which studies an object according to many class labels, has become particularly significant as a way to address the inadequacies of conventional single-label classification. In practical applications, multi-label data streams are vast, fast, and subject to concept drift, so conventional multi-label classification algorithms cannot address such problems directly. Designing a reliable multi-label data stream classification approach has therefore emerged as a crucial and difficult challenge: these new data must be processed quickly with limited time and memory, while adjusting to concept drift in the data stream environment [3].
In engineering applications, learning-based data classification is very desirable. Traditional learning techniques, such as artificial neural networks (ANN) and support vector machines (SVM), appear helpless in the face of the twin demands of faster training and higher learning accuracy [4,5]. Huang et al.'s extreme learning machine (ELM) is a learning technique for single-hidden-layer feedforward neural networks (SLFNs) that does not need to tune the hidden-layer network parameters; its key characteristics are fast training and effective learning outcomes. Building on this, the distributed extreme learning machine (D-ELM) avoids reading all samples into memory at once by partitioning the matrix operations, which also resolves the memory shortage encountered when training vast amounts of sample data [6,7]. However, each run of these methods trains only one ELM network with a fixed number of hidden-layer nodes. Unlike cross-validation, they do not account for the generalization capacity under different training and test set divisions, so improving generalization and accuracy requires numerous independent runs. Moreover, because the hidden-layer network parameters are generated randomly, it is impossible to ensure that they are consistent when comparing the generalization capacity or network performance of different hidden-layer node numbers, making it difficult to learn their impact on the model accurately and intuitively [8].

Problem Description and Research Motivation
In recent years, many scholars have applied neural network-based algorithms to data classification research, such as the BP neural network, the discrete Hopfield network, the support vector machine, the self-organizing network, the fuzzy neural network, and the generalized neural network, and have achieved many results. Since most neural networks use the gradient descent method, they often suffer shortcomings such as slow training speed, a tendency to fall into local minima, and sensitivity to the learning rate. Therefore, exploring a training algorithm with fast training speed, an accurate optimal solution, and good generalization performance is the main route to improving neural network performance.
The KELM model relies on both feature selection and parameter optimization, and these two processes are complementary and cooperative. To avoid overfitting and lower the computational cost of training the model, feature selection chooses the most pertinent and discriminative feature subset from the original feature space and eliminates redundant and unimportant features [9]. A proper parameter setup can significantly enhance the KELM model's classification performance and yield superior classification outcomes. Both elements are taken into account during the design and are optimized simultaneously to increase the KELM model's capacity for generalization [10].
Due to its strong search capability, the meta-heuristic algorithm has received considerable attention recently. Numerous studies have established that such algorithms are superior to conventional approaches for solving optimization problems [11]. Researchers have to date suggested a few optimization strategies for parameter and feature selection. For instance, Alcin et al. introduced the genetic algorithm (GA) to the KELM model in 2014, using the GA to improve the sparse output weight vector of the KELM model. The kernel KELM model's parameters were optimized using the particle swarm optimization (PSO) technique by Bin Li et al. in the same year [12,13].
The primary issues with KELM's classification model are as follows. The selection of pertinent parameters is a significant issue for both the KELM model and the SVM model, although there are not many studies on this topic [14].
Feature selection and parameter optimization are commonly conducted independently. In essence, the two cooperate and support one another, and the best model cannot be assured if they are optimized individually [15].
A novel classification method for kernel extreme learning machines (PSOBOA-KELM), based on the butterfly optimization algorithm modified by PSO, is suggested to solve the aforementioned issues. This technique concurrently performs parameter optimization of the kernel ELM model and feature selection based on the enhanced butterfly algorithm (PSOBOA) to increase KELM's generalization performance [16,17]. The particle swarm optimization algorithm is introduced to improve the BOA's optimization performance by addressing the slow convergence speed and weak local search ability in the later iterations, and a chaos control strategy is employed to broaden the population's diversity. The PSOBOA-KELM technique suggested in this work simultaneously optimizes the number of hidden-layer nodes and the generalization ability, and it saves a vast amount of time by dividing training into segments based on samples.

Contribution
Compared with traditional methods, the butterfly optimization algorithm optimized by particle swarm optimization proposed in this paper comprehensively considers issues such as improving the classification accuracy of the algorithm and the generalization performance of KELM. The main contributions of this paper are as follows:
1. Characterize the swarm intelligence optimization algorithm/butterfly optimization algorithm and classify the current data classification methods.
2. Propose a novel kernel extreme learning machine data classification method based on the butterfly optimization algorithm optimized by particle swarm optimization (PSOBOA-KELM).
3. Provide extensive simulation results to demonstrate the use and efficiency of the proposed data classification method.
4. Evaluate the performance of the proposed algorithm by comparing it with data classification methods based on other algorithms.
The remainder of this paper is organized as follows: Section 2 discusses related work. Section 3 describes the basic principles of the kernel extreme learning machine. Section 4 describes the principles of the butterfly optimization algorithm optimized using particle swarm optimization. Section 5 describes the implementation steps and design idea of the proposed kernel extreme learning machine method based on the butterfly optimization algorithm optimized by particle swarm optimization. Section 6 provides the parameters and simulation results that validate the performance of the proposed algorithm. Section 7 concludes the paper.

Related Work
Data classification is frequently employed as the fundamental processing technique in contemporary intelligent data processing. Machine learning is a powerful tool for achieving this objective, since it uses data sets to create classification models with strong generalization capability.
Some academics are at present using support vector machines to handle multi-instance learning challenges, although the conventional SVM is particularly sensitive to noisy points and singular points in the sample. The CA-SVM-based sentiment analysis model suggested by Cyril et al. uses automatic learning to read Twitter datasets, analyze them, and extract features to produce a list of phrases [18]. Varatharajan et al. classified additional characteristics of the input electrocardiogram (ECG) signal using the SVM model and a weighted kernel function approach. At present, the existing multi-label classification methods mainly comprise batch-processing methods and online learning methods. The batch-processing methods assume that all training and testing data arrive at once and use problem transformation and algorithm adaptation to solve multi-label classification problems based on all available information. The extreme learning machine (ELM) proposed by Huang and its improved algorithms are fast and efficient, avoiding the cumbersome iterative learning process [19]. The iterative learning of traditional feed-forward neural networks, with its randomly set learning parameters, easily encounters problems such as local minima, while improved algorithms can further increase classification accuracy. Therefore, related research based on (kernel) extreme learning machines has been widely applied to multi-label classification problems, and a series of results has been achieved [20]. However, because the data streams emerging in practical applications are massive and fast, it is difficult to obtain them all at once. At the same time, when new data arrive, these batch-processing algorithms retrain on the new data and discard the old models, resulting in a large loss of effective historical information [21]. Therefore, learning models that can handle data stream environments are receiving more and more attention.
Ensemble techniques have been one of the most significant advancements in machine learning over the past ten years. In practice, the kernel function is a crucial theoretical tool in data preprocessing: since the data in a data set may be linearly inseparable, an appropriate classification procedure must be built. The goal of the kernel function is to find, by nonlinear transformation, the classification hyperplane of the low-dimensional, inseparable data in a new high-dimensional space, allowing the data to be separated. The construction and parameter selection of the kernel function are at present its key areas of emphasis. To address the multi-class unbalanced data classification problem, Zhang et al. suggested a support vector machine (SVM) technique based on a proportional kernel and proposed a scaling kernel function that employs weighting factors to compute its parameters; it generalizes well against the classifier performance reduction induced by skewed distributions [22]. To discriminate between various kinds of ground objects, Chen et al. presented a novel hybrid kernel function SVM point cloud classification technique, creating a Gaussian and polynomial hybrid kernel function to increase the classification accuracy [23]. Xie et al. calculated the similarity of samples with several unknown attributes using the characteristics of kernel functions; an effective technique for computing the kernel function and solving additive kernel singular values was also shown [24]. Zhang et al. developed a novel conformal function to scale the kernel matrix of ODM in order to increase the separability of the training data in the feature space, and presented a kernel-modified ODM (KMODM) to address unbalanced data classification [25]. When tackling small-sample, nonlinear, and high-dimensional problems, this approach demonstrates several distinct benefits and may be successfully applied across machine learning disciplines.
Although it has been demonstrated that the AdaBoost approach, which uses a neural network as the base classifier, has high generalization performance, training is still not without its challenges. Diversity has undergone extensive research as a crucial component in the generalization performance of classifier ensembles, and certain approaches to assess diversity have been presented. In order to overcome the limitations of fixed representations, Deng et al. used deep learning to perform large-scale task-driven feature learning from big data and demonstrated its utility in image classification, high-frequency financial data prediction, and brain MRI segmentation [26]. Saritas et al. assessed the classification performance of a Bayesian classifier and an artificial neural network applied to nine inputs and one output and compared the findings [27]. By fusing morphological neurons with perceptrons, Gerardo et al. suggested two novel hybrid neural architectures and evaluated them on 25 low-dimensional standard data sets and one large data set; the suggested approach achieved improved accuracy while using fewer learning parameters [28]. In contrast to conventional techniques and other state-of-the-art techniques, Wu et al. used a convolutional recurrent neural network (CRNN) to learn more discriminative features for hyperspectral data classification, using recurrent layers to further extract spectral context information from features generated by the convolutional layers. The suggested technique offers improved classification performance for hyperspectral data when compared with other deep learning methods [29].
At present, some work has applied sliding window technology with extreme learning machines to the multi-label classification of data streams, but this method considers neither the label correlation among multiple labels nor concept drift in the data stream environment. On the other hand, some researchers have pointed out that, when dealing with data streams, the model must make accurate predictions under limited time and memory and include solutions to the concept drift problem. These requirements pose further challenges to the classification of multi-label data streams. Most multi-label classification algorithms in the data stream environment use problem transformation to convert classification into a series of stable learning tasks. Although this method is applicable to a certain extent, it ignores the correlation between labels. At the same time, it does not account for the high-speed and changeable characteristics of newly arrived data, and the implicit concept drift problem is also difficult to solve by problem transformation. Data classification is one of the challenging issues in data mining and has attracted considerable attention from both domestic and international scholars studying artificial intelligence. To address the unbalanced data classification problem, the imbalanced data must be fundamentally optimized. Academic output on this topic is growing yearly: nearly 900 scholarly publications on unbalanced data classification appeared between May 2018 and 2022, a significant increase over the preceding ten years.

Kernel Extreme Learning Machine
The extreme learning machine consists of three parts: an input layer, a hidden layer, and an output layer. For a given training sample set, the input weights and bias values connecting the input layer and the hidden layer are randomly assigned and remain unchanged during training. Assume a training sample set {x_i, c_i}, i = 1, 2, . . ., N is given, where x_i is the input value of the training sample and c_i is the corresponding output value. Let the extreme learning machine have h hidden-layer nodes, let the network output be f, and let g(·) be the activation function; then, the input-output model of the extreme learning machine can be expressed by Formula (1) [30]:

f(x) = ∑_{i=1}^{h} β_i g(ω_i · x + b_i)
In the formula, β_i represents the output weight connecting the i-th hidden-layer node to the output node, ω_i represents the input weight connecting the input nodes to the i-th hidden node, and b_i represents the bias value of the i-th hidden node.
The output weight can be expressed by Formula (3):

β = H* T

In the equation, H* is the Moore-Penrose generalized inverse of the hidden-layer output matrix H, and T = [c_1, c_2, . . ., c_N]^T is the matrix of target outputs.
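As a hedged illustration of this least-squares solution, the sketch below builds a random hidden layer and computes the output weights with NumPy's pseudo-inverse; the data, layer sizes, and tanh activation are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Hypothetical sizes and data: 50 samples, 3 features, 20 hidden nodes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # training inputs x_i
C = np.sin(X).sum(axis=1, keepdims=True)     # target outputs c_i
W = rng.normal(size=(3, 20))                 # random input weights w_i (kept fixed)
b = rng.normal(size=20)                      # random biases b_i (kept fixed)
H = np.tanh(X @ W + b)                       # hidden-layer output matrix
beta = np.linalg.pinv(H) @ C                 # output weights: beta = H* T (Formula (3))
```

Because the hidden-layer parameters are random and fixed, training reduces to this single pseudo-inverse step, which is what makes ELM fast.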
The hidden layer in the extreme learning machine is replaced using the idea of kernel function mapping from the support vector machine. Then, the kernel extreme learning machine can be expressed by Formula (4) [31].
Therefore, the input and output model of the kernel extreme learning machine is as shown in Formula (5).
Define the extreme learning machine kernel matrix as Formula (6) [32]:

Ω_ELM = HH^T, with Ω_{i,j} = h(x_i) · h(x_j) = K(x_i, x_j)
The corresponding input-output model can be expressed as shown in Formula (7):

f(x) = [K(x, x_1), . . ., K(x, x_N)] (I/C + Ω_ELM)^{−1} T

where C is the regularization coefficient and T is the matrix of target outputs.
The feature mapping h(x) of the hidden layer is unknown in kernel extreme learning machines; instead, the kernel K(µ, ν) = exp(−γ‖µ − ν‖²) is usually used, which reduces the impact of poor classification results caused by an unreasonable setting of the number of hidden-layer nodes (the dimension of the feature space).
As a result, kernel ELM combines the efficiency of ELM with SVM-style kernel classification. The number of hidden-layer nodes need not be predetermined, because KELM determines the hidden-layer mapping in the form of an inner product by introducing a kernel function. As a result, the KELM-based prediction model achieves significantly better generalization capacity and stability.
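To make the construction concrete, here is a minimal, hedged sketch of a KELM with the RBF kernel described above; the class name, interface, and toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2), as in the text."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

class KELM:
    """Kernel extreme learning machine sketch: the hidden-layer mapping is
    replaced by a kernel matrix, so no hidden-node count is needed."""
    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, T):
        self.X = X
        n = X.shape[0]
        Omega = rbf_kernel(X, X, self.gamma)           # kernel matrix (Formula (6))
        # Regularized solution: alpha = (I/C + Omega)^-1 T (Formula (7))
        self.alpha = np.linalg.solve(np.eye(n) / self.C + Omega, T)
        return self

    def predict(self, Xnew):
        return rbf_kernel(Xnew, self.X, self.gamma) @ self.alpha
```

A quick usage check on a tiny XOR-style set shows the kernel trick separating data that a linear model cannot: `KELM(C=1e6, gamma=2.0).fit(X, T).predict(X)` reproduces the targets closely.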

Butterfly Optimization Algorithm
A novel meta-heuristic algorithm, the butterfly optimization algorithm (BOA), mimics the foraging and courtship behavior of butterflies. It has high resilience and global convergence ability when solving complicated functions [33,34]. The BOA algorithm has two crucial components: the switching probability and the fragrance. The switching probability determines the likelihood that a butterfly selects one of two movement modes, global or local, and the fragrance represents the quality of a particular butterfly's current position [35]. The butterfly colony is first dispersed at random in the solution space, and butterflies with a strong fragrance attract other individuals. The optimization objective is accomplished by continually updating the colony's positions [36]. Each butterfly in the butterfly optimization algorithm has a distinct scent and perception ability, and the strength of smell perception varies between individuals. Formula (8) gives the fragrance strength perceived by other butterflies:

f = c I^a

Among them, f represents the odor intensity function; c represents the sensory shape coefficient; I represents the stimulus intensity, that is, the fitness value of the function; and a represents the intensity coefficient, with a value in [0, 1].
The sensory shape coefficient c can theoretically take any value within [0, ∞); its calculation is shown in Formula (9):

c^{t+1} = c^t + 0.025 / (c^t · T_max)

Among them, the initial value of c is 0.01, and T_max is the maximum number of iterations of the algorithm. The BOA algorithm switches between global search and local search according to the switching probability p; the position update formula is shown in Formula (10):

x_i^{t+1} = x_i^t + (r² · g* − x_i^t) · f_i (global search)
x_i^{t+1} = x_i^t + (r² · x_j^t − x_k^t) · f_i (local search)

Among them, g* is the best position of all butterflies in the current iteration; x_j^t and x_k^t represent the spatial positions of the j-th butterfly and the k-th butterfly in the t-th iteration, respectively; r is a random number in [0, 1]; and f_i is the fragrance of the i-th butterfly.
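The mechanics above can be sketched as a minimal minimizer. This is standard BOA, not the paper's exact implementation; the bounds, the stimulus mapping `I = 1/(1 + fitness)`, and the default parameters are illustrative assumptions.

```python
import numpy as np

def boa_minimize(fun, dim, n=20, t_max=100, p=0.8, c0=0.01, a=0.1, seed=0):
    """Minimal butterfly optimization algorithm sketch (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-100.0, 100.0, (n, dim))   # butterfly positions
    fit = np.array([fun(x) for x in X])
    c = c0
    for t in range(t_max):
        g = X[fit.argmin()].copy()             # best butterfly g* this iteration
        I = 1.0 / (1.0 + np.abs(fit))          # assumed stimulus intensity
        frag = c * I ** a                      # fragrance f = c * I^a
        for i in range(n):
            r = rng.random()
            if rng.random() < p:               # global search toward g*
                Xn = X[i] + (r * r * g - X[i]) * frag[i]
            else:                              # local search among random peers
                j, k = rng.integers(0, n, size=2)
                Xn = X[i] + (r * r * X[j] - X[k]) * frag[i]
            fn = fun(Xn)
            if fn < fit[i]:                    # greedy acceptance
                X[i], fit[i] = Xn, fn
        c = c + 0.025 / (c * t_max)            # sensory shape coefficient update
    return X[fit.argmin()], float(fit.min())
```

Calling `boa_minimize(lambda x: float((x ** 2).sum()), 2)` drives the sphere function toward its minimum at the origin.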

Particle Swarm Optimization Algorithm
Particle swarm optimization (PSO) is a swarm intelligence method that mimics how birds search for food in a multidimensional space. Particle position and velocity are the two key quantities in PSO optimization [37,38]. Each candidate solution is referred to as a particle, and each particle's initial position and velocity in the search space are initialized at random [39]. The particles' velocities and positions are updated in accordance with Formulas (11) and (12):

v_i^{t+1} = ω v_i^t + c_1 rand_1 (p_best − x_i^t) + c_2 rand_2 (g_best − x_i^t)
x_i^{t+1} = x_i^t + v_i^{t+1}

Among them, v_i^t and v_i^{t+1} represent the velocities of the i-th particle at iterations t and t + 1, respectively; p_best and g_best represent the individual and global optimal positions of the particles, respectively; generally, the hyperparameters c_1 = c_2 = 2; rand_1 and rand_2 are random numbers in (0, 1); and ω represents the inertia weight coefficient.
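The two update rules translate directly into a few lines of code. The sketch below is a generic PSO with the classic parameters named above; the search bounds and iteration budget are illustrative assumptions.

```python
import numpy as np

def pso_minimize(fun, dim, n=30, t_max=200, w=0.7, c1=2.0, c2=2.0, seed=0):
    """Minimal PSO sketch implementing the velocity and position updates
    of Formulas (11) and (12); bounds and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-100.0, 100.0, (n, dim))
    V = np.zeros((n, dim))
    P = X.copy()                                   # personal best positions
    pfit = np.array([fun(x) for x in X])
    g = P[pfit.argmin()].copy()                    # global best position
    for _ in range(t_max):
        r1 = rng.random((n, dim))
        r2 = rng.random((n, dim))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)   # Formula (11)
        X = X + V                                            # Formula (12)
        fit = np.array([fun(x) for x in X])
        better = fit < pfit
        P[better] = X[better]                      # update personal bests
        pfit[better] = fit[better]
        g = P[pfit.argmin()].copy()                # update global best
    return g, float(pfit.min())
```

Because personal bests only ever improve, the returned fitness decreases monotonically over the run.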

Butterfly Optimization Algorithm Optimized by PSO (PSOBOA)
(1) Algorithm population initialization. Assume that, in the D-dimensional search space, the greedy strategy is used to generate a new population; the initial solutions are given by Formula (13):

X_i = L_b + O · (U_b − L_b)

Among them, X_i represents the spatial position of the i-th butterfly (i = 1, 2, 3, . . ., N) in the butterfly population, and N represents the number of initial solutions; L_b and U_b represent the lower and upper bounds of the search space, respectively; and O represents a matrix of random numbers with elements in (0, 1).
(2) Algorithm global search. The global search phase of the butterfly optimization algorithm optimized by PSO (PSOBOA) can be expressed by Formulas (14) and (15). Among them, ω represents the inertia weight coefficient, and V_i^t and V_i^{t+1} represent the velocities of the i-th particle at iterations t and t + 1, respectively. The hyperparameters C_1 = C_2 = 2, and r_1 and r_2 are random numbers in (0, 1).
(3) Algorithm local search. The local search stage of the PSOBOA algorithm can be represented by Formulas (16) and (17). Among them, X_k^{t−1} and X_j^{t−1} are the positions of the k-th and j-th butterflies randomly selected from the solution space at iteration t − 1, respectively; ω represents the inertia weight coefficient; and C_1 = C_2 = 2, while r_1 and r_2 are random numbers in (0, 1).
(4) Control strategy. Chaos theory has many applications in swarm intelligence optimization algorithms, such as chaotic population initialization and chaotic adjustment of control parameters. The logistic map is a classical chaotic map in chaos theory, and its expression is shown in Formula (18):

z_{l+1} = µ z_l (1 − z_l)

Among them, l represents the number of iterations of the chaotic map and µ is the chaotic parameter, with a value in [0, 4]. The chaotic sequence of the logistic map lies in (0, 1); when µ = 4, the map is fully chaotic.
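The logistic map iteration is short enough to show directly; the starting value below is an arbitrary illustrative choice.

```python
def logistic_map(z0=0.63, mu=4.0, n=5):
    """Logistic chaotic map z_{l+1} = mu * z_l * (1 - z_l) (Formula (18)).
    For mu = 4 the sequence is chaotic and stays inside (0, 1)."""
    seq, z = [], z0
    for _ in range(n):
        z = mu * z * (1.0 - z)
        seq.append(z)
    return seq
```

In PSOBOA-style chaos strategies, such a sequence can seed the initial population or perturb control parameters so that the search covers the space more evenly than uniform random numbers alone.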
The Lyapunov exponent is an important index for distinguishing chaotic behavior: the larger the maximum Lyapunov exponent of a chaotic map, the more obvious its chaotic characteristics and the higher its degree of chaos. The exponent is given by Formula (19):

λ = (1 / n_h) ∑_{i=0}^{n_h − 1} ln | f′(x_i) |

where λ represents the Lyapunov exponent; f′(·) represents the first derivative of the chaotic mapping function; and n_h represents the number of iterations of the chaotic map.
The expression for the sensory shape coefficient c in the PSOBOA algorithm is shown in Formula (20). The inertia weight coefficient ω has a direct impact on the particle flight speed of the PSO algorithm and can adjust the balance between the algorithm's global and local search capabilities. In this paper, an adaptive adjustment strategy was adopted, as shown in Formula (21). Among them, ω_max = 0.9, ω_min = 0.2, and T_max is the maximum number of iterations of the algorithm.
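One common adaptive strategy with these bounds is a linear decrease from ω_max to ω_min over the run; the paper's exact Formula (21) may differ, so the sketch below is an assumption, not a transcription.

```python
def inertia_weight(t, t_max, w_max=0.9, w_min=0.2):
    """Linearly decreasing inertia weight: large w early (global search),
    small w late (local refinement). The exact schedule in Formula (21)
    may differ; this is one standard choice with the stated bounds."""
    return w_max - (w_max - w_min) * t / t_max
```

Early iterations then favor exploration and later ones exploitation, which matches the global-to-local balancing the text describes.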
(5) Algorithm complexity analysis. Assuming that the population size of the algorithm is N, the dimension of the search space is D, and the maximum number of iterations is T_max, the complexity of PSOBOA includes: the population initialization complexity O(ND), the fitness evaluation complexity O(ND), the position update complexity of the global and local search O(N² log N), the fitness sorting complexity O(N²), and the control parameter update complexity O(ND). Summing these components gives the per-iteration complexity of PSOBOA, O(ND + N² log N) (Formula (22)); over T_max iterations, the total time complexity is O(T_max (ND + N² log N)) (Formula (23)).

Data Classification of KELM Based on PSOBOA Algorithm
The regularization coefficient C and the kernel function parameter S of the kernel extreme learning machine were optimized using the PSO-optimized butterfly technique, which raises the network's classification accuracy. After obtaining the optimal parameters, we built the data classification mathematical model. The precise PSOBOA-KELM steps are as follows:
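The coupling between the optimizer and the KELM can be sketched as follows: the fitness of a candidate (C, S) pair is the classification accuracy of a KELM trained with those parameters. Everything in this sketch is an illustrative assumption (the toy data, the helper names, and a random search standing in for the PSOBOA loop, which uses the same fitness interface).

```python
import numpy as np

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kelm_accuracy(params, Xtr, ytr, Xte, yte):
    """Fitness for the optimizer: train a KELM with (C, gamma) and
    return classification accuracy on held-out data (sketch)."""
    C, gamma = params
    n = Xtr.shape[0]
    T = np.eye(int(ytr.max()) + 1)[ytr]                 # one-hot targets
    alpha = np.linalg.solve(np.eye(n) / C + rbf(Xtr, Xtr, gamma), T)
    pred = rbf(Xte, Xtr, gamma) @ alpha
    return float((pred.argmax(axis=1) == yte).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (120, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)                 # toy nonlinear labels
Xtr, ytr, Xte, yte = X[:80], y[:80], X[80:], y[80:]

# Random search over (C, gamma) stands in for the PSOBOA loop here.
best = max(
    kelm_accuracy((10 ** rng.uniform(-1, 3), 10 ** rng.uniform(-2, 1)),
                  Xtr, ytr, Xte, yte)
    for _ in range(30)
)
```

In the full method, PSOBOA would propose each (C, gamma) candidate instead of the uniform draws, but the fitness function it maximizes has exactly this shape.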


Algorithm Simulation and Result Analysis

Benchmark Function Test
In order to test the performance of the PSOBOA algorithm, eight test functions from CEC2017 were used, and the improved PSOBOA algorithm was compared with the particle swarm optimization algorithm (PSO), the crow search algorithm (CSA), the binary bat algorithm (BBA), and the butterfly optimization algorithm (BOA). All eight test functions were evaluated as minimization problems and are divided into multimodal test functions, mixed functions, and composite functions; they are shown in Table 1. For a fair comparison, the solution dimension of all test functions was 30, the population size was set to 30, the search space was [−100, 100], every algorithm was run independently 30 times on each test function, and the maximum number of iterations per run was 100.
In this paper, the results of the PSO, CSA, BBA, BOA, and PSOBOA algorithms, each run independently 30 times on the eight test functions, were recorded. The iterative results of the five algorithms on the test functions are shown in Figure 2.

Table 1. Test functions (columns: Function, Equation, Dimension, Bounds, Optimum).
It can be seen from Figure 2 that, when solving the test functions, the optimization results of the BBA, BOA, and PSOBOA algorithms do not differ greatly, but all are significantly better than those of the PSO and CSA algorithms. When solving the multimodal test functions, although the CSA algorithm achieves better results on two of them, according to the average ranking of the five algorithms on the multimodal test functions, the PSOBOA algorithm outperforms the other four and converges faster. When solving the mixed functions, the PSOBOA algorithm achieved the best results. When solving the composite functions, the improvement of the PSOBOA algorithm over the BBA and BOA algorithms is not significant, but the overall mean and standard deviation show that the PSOBOA algorithm delivers highly accurate and stable results.
At the same time, the experimental data in Figure 2 show that, for F3, F4, F6, and F8, PSOBOA has the strongest optimization performance, clearly better than PSO, CSA, BOA, and BBA, and on F1, F2, F3, and F4 it can directly find the optimal value of 0. For F7, the optimization performance of PSOBOA and BOA is almost the same, their average is slightly better than that of BBA, and PSO performs worst. For F6, the optimization performance of the PSOBOA algorithm is clearly better than that of PSO, CSA, BOA, and BBA, and its stability is the best. The above analysis shows that the overall optimization ability of PSOBOA is better than that of PSO, CSA, BOA, and BBA.


Simulation Environment Construction
The proposed algorithm was tested using standard classification data sets, and a number of comparison experiments were conducted with the conventional PSO-KELM [40], BBA-KELM [41], BOA-KELM, and other algorithms in order to confirm its viability and effectiveness. The simulation environment was Windows 10 (64-bit), MATLAB 2020b, a 12th Gen Intel(R) Core(TM) i9-12900 CPU running at 3.20 GHz, and 32 GB of RAM.
We evaluated the classification performance of this method with five indicators: classification accuracy (ACC), sensitivity (SEN), specificity (SPE), precision (PRE), and F-measure, which are defined as follows. Accuracy is the proportion of correct predictions in the total number of predictions: ACC = (TP + TN)/(TP + TN + FP + FN). Sensitivity, also often expressed as the TP rate, is an index used to measure the classifier's recognition of abnormal records: SEN = TP/(TP + FN).
Specificity, also often expressed as the TN rate, is used to estimate the ability of a classification model to identify normal examples: SPE = TN/(TN + FP).
Precision is the proportion of predicted positive instances that are correct, calculated as PRE = TP/(TP + FP). Among them, TP, FP, TN, and FN represent true positives, false positives, true negatives, and false negatives, respectively.
Lewis and Gale proposed the F-measure in 1994, which is defined as follows: F = (1 + β²) × PRE × SEN/(β² × PRE + SEN) (28). In Equation (28), β takes a value from 0 to infinity and controls the weights assigned to the precision and the sensitivity. If all positive instances are classified incorrectly, any classifier evaluated using the above will have a metric of 0. In this experiment, the β value was set to 1.
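The five metrics above follow directly from the confusion-matrix counts. As a minimal illustrative sketch (in Python, not the paper's MATLAB code), with β = 1 as in the experiment:

```python
# Illustrative sketch: the five evaluation metrics computed from
# confusion-matrix counts TP, FP, TN, FN. beta = 1 matches the experiment.
def metrics(tp, fp, tn, fn, beta=1.0):
    acc = (tp + tn) / (tp + tn + fp + fn)        # accuracy
    sen = tp / (tp + fn)                         # sensitivity (TP rate)
    spe = tn / (tn + fp)                         # specificity (TN rate)
    pre = tp / (tp + fp)                         # precision
    b2 = beta ** 2
    f = (1 + b2) * pre * sen / (b2 * pre + sen)  # F-measure, Equation (28)
    return acc, sen, spe, pre, f

acc, sen, spe, pre, f = metrics(tp=90, fp=10, tn=85, fn=15)
```

With β = 1 the F-measure reduces to the usual harmonic mean of precision and sensitivity.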

Algorithm Test Comparison and Result Analysis
In order to verify the effectiveness of the proposed method, this part experiments with the PSOBOA-KELM algorithm on seven classification data sets: BreastEW, CongressEW, Hepatitis, JPNdata, Parkinson, SpectEW, and Wdbc. The data sets are from the UCI Machine Learning Repository (http://archive.ics.uci.edu/mL/datasets, accessed on 1 October 2022). These data sets cover binary classification problems, multi-classification problems, and regression fitting problems. The Breastcancer dataset has 699 records with 9 features and two categories; the Parkinson dataset has 195 records with 23 features and two categories; the BreastEW dataset has 569 records with 30 features and two categories; and the Dermatology dataset has 358 records with 35 features and six categories. The experiments selected seven real datasets widely used for multi-label classification. For the important parameters of the particle swarm optimization algorithm, the learning factors were c1 = c2 = 2, and the inertia weight factor decreased from w1 = 0.9 to w2 = 0.4. Table 2 summarizes the data size, attribute dimension, number of tags, and cardinality of the seven datasets. The data sets had to be preprocessed before the experiment, as certain features had missing values; in this experiment, these records were filled with the feature averages to guarantee the accuracy of the sample data. To reduce the gap between the eigenvalues and prevent the larger eigenvalues from overwhelming the smaller ones, we normalized each eigenvalue to the [−1, 1] interval. The normalized calculation formula is x′ = 2(x − min_a)/(max_a − min_a) − 1, where x is the original value of the data, x′ is the normalized value, max_a is the maximum value in feature a, and min_a is the minimum value in feature a.
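The normalization step above can be sketched as follows (an assumed minimal Python implementation of the stated formula, not the paper's code):

```python
# Min-max normalization of one feature column to the [-1, 1] interval:
# x' = 2 * (x - min_a) / (max_a - min_a) - 1
def normalize_feature(values):
    lo, hi = min(values), max(values)
    return [2 * (x - lo) / (hi - lo) - 1 for x in values]

normalize_feature([0, 5, 10])  # maps to [-1.0, 0.0, 1.0]
```

In practice this is applied independently to each feature column of the dataset, using the column's own minimum and maximum.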
At the same time, in order to obtain an unbiased estimate of the algorithm's generalization accuracy, k-fold cross-validation (CV) is generally used to evaluate the classification accuracy. In this method, all test sets are independent, which improves the reliability of the results. In this study, the k value was set to 10; that is, each experimental data set was divided into 10 subsets, one of which was taken as the test set each time while the rest were used as the training set, and the average value of the 10 runs was taken as the ten-fold cross-validation result. Each of the above classification experiments was run independently 20 times to ensure the stability of the algorithm.
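The ten-fold protocol described above can be sketched as follows; this is an assumed illustration in Python, and `train_and_score` is a hypothetical callback standing in for training a KELM on the training fold and scoring it on the test fold:

```python
# Sketch of k-fold cross-validation: shuffle indices, split into k folds,
# hold out each fold once as the test set, and average the k scores.
import random

def k_fold_cv(samples, train_and_score, k=10, seed=0):
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for i in range(k):
        test = [samples[j] for j in folds[i]]
        train = [samples[j] for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_score(train, test))
    return sum(scores) / k  # average score over the k folds
```

Running the whole procedure 20 times with different seeds and averaging, as in the experiment, then smooths out the variability of any single partition.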
The parameter settings of the contrast swarm intelligent optimization algorithm involved in this paper are shown in Table 3.
Table 3. Parameter settings of the swarm intelligence optimization algorithm.
From the results in Tables 4 and 5, it can be seen that the method proposed in this paper performs significantly better than the other comparative feature selection methods on the accuracy, precision, F-measure, sensitivity, specificity, and MCC indicators. For the accuracy indicator, the PSOBOA-KELM method proposed in this paper achieves accuracy rates of 96.49%, 96.56%, 87.87%, 83.96%, and 90% on the BreastEW, CongressEW, hepatitisfulldata, JPNdata, Parkinson, SpectEW, and wdbc data sets, respectively. Compared with the PSO-KELM, BBA-KELM, and BOA-KELM feature selection methods, the proposed method has the highest accuracy rate. For example, on the BreastEW dataset, the accuracy of the proposed method is 0.91% higher than the PSO-KELM method, 0.91% higher than the BBA-KELM method, and 0.03% higher than the BOA-KELM method. For the precision indicator, on the BreastEW, CongressEW, hepatitisfulldata, JPNdata, Parkinson, SpectEW, and wdbc data sets, the precision of the proposed PSOBOA-KELM method is 95.98%, 100%, 100%, 78.89%, 92.86%, 87.5%, and 100%, respectively; compared with the PSO-KELM, BBA-KELM, and BOA-KELM feature selection methods, it is the highest. For example, on the BreastEW dataset, the precision of the proposed method is 1.31% higher than the PSO-KELM method, 1.46% higher than the BBA-KELM method, and 1.31% higher than the BOA-KELM method. For the F-measure indicator, on the BreastEW, CongressEW, hepatitisfulldata, JPNdata, Parkinson, SpectEW, and wdbc data sets, the F-measure of the proposed PSOBOA-KELM method is 97.3%, 97.10%, 70.83%, 84.03%, 93.75%, 33.33%, and 96.4%, respectively; compared with the PSO-KELM, BBA-KELM, and BOA-KELM feature selection methods, the proposed method's F-measure is better. For example, on the CongressEW dataset, the F-measure value of the proposed method is 2.76% higher than the PSO-KELM method, 4.79% higher than the BBA-KELM method, and 0.95% higher than the BOA-KELM method.
For the sensitivity indicator, on the BreastEW, CongressEW, hepatitisfulldata, JPNdata, Parkinson, SpectEW, and wdbc data sets, the sensitivity values of the proposed PSOBOA-KELM method are 100%, 94.37%, 58.33%, 93.75%, 100%, 20%, and 93.07%, respectively; compared with the PSO-KELM, BBA-KELM, and BOA-KELM feature selection methods, the proposed method has a higher sensitivity value. For example, on the CongressEW dataset, the sensitivity value of the proposed method is 1.92% higher than the PSO-KELM method, 1.78% higher than the BBA-KELM method, and 1.78% higher than the BOA-KELM method. For the specificity indicator, compared with the PSO-KELM, BBA-KELM, and BOA-KELM feature selection methods, the proposed method also has a higher value. For example, on the BreastEW dataset, the specificity value of the proposed method is 1.96% higher than the PSO-KELM method, 2.38% higher than the BBA-KELM method, and 2.38% higher than the BOA-KELM method. For the MCC indicator, the proposed PSOBOA-KELM method has MCC values of 92.58%, 93.16%, 66.39%, 68.1%, and 72.81% on the BreastEW, CongressEW, hepatitisfulldata, JPNdata, Parkinson, SpectEW, and wdbc data sets, respectively; compared with the PSO-KELM, BBA-KELM, and BOA-KELM feature selection methods, it has a higher MCC value. For example, on the wdbc data set, the MCC value of the proposed method is 3.64% higher than the PSO-KELM method, 5.61% higher than the BBA-KELM method, and 1.93% higher than the BOA-KELM method.
In addition, in order to compare the performance of these four algorithms more intuitively, the performance evaluation indicators of the four methods are compared in detail in Figure 3. At the same time, the calculation and simulation time consumption of the four algorithms on the seven data sets is presented in Figure 4. According to the experimental findings, the PSOBOA-KELM technique has an acceptable classification performance, and its calculation and simulation times are not too long. It can choose an acceptable and constrained feature subset, and its classification performance is noticeably better than that of the comparable approaches. The algorithm also performs well on the challenge of classifying various data sets, taking a fair amount of time while producing a good classification accuracy. The comparison on a further eight datasets is shown in Tables 6 and 7. To compare the performance of the four algorithms on these datasets more intuitively, their performance evaluation indicators are compared in detail in Figure 5, and their calculation and simulation time consumption is presented in Figure 6. The four algorithms were tested on the Australian, Breastcancer, Dermatology, HeartEW, Diabetes, Glass, Heart, and Vote data sets regarding the six indicators of accuracy, precision, F-measure, sensitivity, specificity, and MCC, and achieved a good classification performance. The calculation and simulation time were also relatively short for the PSOBOA-KELM method. In addition to achieving an improved classification accuracy, the algorithm also performs well while classifying data from various data sets.
In addition, a simulation experiment comparison on the Sinc function was conducted, in which the four algorithms were compared by fitting the Sinc function. The expression of the Sinc function is f(x) = sin(x)/x for x ≠ 0 and f(x) = 0 for x = 0 (30). We generated 2000 uniformly distributed data points x in [−10, 10], calculated the 2000 pairs {x_i, f(x_i)}, i = 1, 2, 3, . . ., 2000, and then generated 2000 uniformly distributed noise values ε in [−0.2, 0.2]. The training set was {x_i, f(x_i) + ε_i}, i = 1, 2, 3, . . ., 2000, and another set of 2000 pairs {y_i, f(y_i)}, i = 1, 2, 3, . . ., 2000 was generated as the test set. In addition, the root-mean-square error (RMSE), mean absolute error (MAE), and relative standard deviation (RSD) were used as the evaluation indicators for the error analysis. The RMSE and MAE are calculated as RMSE = ((1/N)∑(y(i) − ŷ(i))²)^(1/2) and MAE = (1/N)∑|e_i|, where y(i) represents the measured value, ŷ(i) represents the predicted value, N is the number of samples, e_i = ŷ(i) − y(i) is the absolute error, and the numerator and denominator of the RSD are both in the form of a standard deviation. The comparison of the Sinc function fitting results is shown in Table 8. It can be seen from Table 8 that the PSO-KELM algorithm has the largest RMSE and MAE index values and the RSD index value closest to the smallest, so its test results perform poorly. For the BBA-KELM algorithm, the RMSE and MAE index values are smaller, and the performance of its test results is average. The RMSE and MAE index values of the BOA-KELM algorithm are smaller still, its RSD index value is closer to the largest, and its test results perform better. The PSOBOA-KELM algorithm has the smallest RMSE and MAE index values and the RSD index value closest to the largest, and its test results perform the best. This shows that the error of the PSOBOA-KELM model is relatively smaller and that its prediction accuracy is better than that of the PSO-KELM, BBA-KELM, and BOA-KELM algorithms. At the same time, the data change trend in Table 8 also indicates that the PSOBOA-KELM algorithm has the best performance and that optimizing the KELM regularization parameter C and kernel parameter S can improve the prediction accuracy of the KELM model.
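The Sinc experiment setup and the two standard error indicators above can be sketched as follows (an assumed Python illustration, not the paper's MATLAB code; the value of the Sinc function at x = 0 follows Equation (30) as given):

```python
# Sketch: generate the noisy Sinc training data and score predictions
# with RMSE and MAE as defined in the text.
import math
import random

def sinc(x):
    # Equation (30): sin(x)/x for x != 0; 0 at x = 0 as stated in the text
    return math.sin(x) / x if x != 0 else 0.0

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

rng = random.Random(0)
xs = [rng.uniform(-10, 10) for _ in range(2000)]       # inputs in [-10, 10]
train_y = [sinc(x) + rng.uniform(-0.2, 0.2) for x in xs]  # noisy targets
```

A fitted model's predictions on a clean, independently generated test set would then be passed to `rmse` and `mae` to produce the entries compared in Table 8.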

Conclusions
The model selection problem of the kernel extreme learning machine was investigated in this paper using an enhanced butterfly optimization algorithm based on particle swarm optimization (PSOBOA). This study compared the proposed PSOBOA-KELM model in depth with the PSO-KELM, BBA-KELM, and BOA-KELM approaches. To assess the model's performance, we used six indicators: accuracy, precision, F-measure, sensitivity, specificity, and MCC. According to the experimental findings, PSOBOA-KELM can swiftly converge to the best solution inside the search space. Thanks to the inclusion of the butterfly optimization mechanism in the particle swarm search, the model combines the benefits of the PSOBOA and KELM models and has a good optimization performance. Its features include better performance, fewer algorithm parameters, and quick search times, and both the performance and the classification accuracy have increased dramatically.

Figure 2. Comparison of the function iteration calculation.


Biomimetics 2023, 8, 23
Figure 3. Comparison of the evaluation index parameters of the four algorithms.

Figure 4. Comparison of the simulation time consumption of the four algorithms.


Figure 5. Comparison of the evaluation index parameters of the four algorithms.


Figure 6. Comparison of the simulation time consumption of the four algorithms.


Table 2. Detailed description of the dataset.

Table 6. Experimental results on the Australian, Breastcancer, Dermatology, and HeartEW data sets.

Table 7. Experimental results tested on the Diabetes, Glass, Heart, and Vote data sets.

Table 8. Comparison of the Sinc function fitting results.