A Novel Feature Selection Strategy Based on the Harris Hawks Optimization Algorithm for the Diagnosis of Cervical Cancer

Dong, Minhui; Wang, Yu; Todo, Yuki; Hua, Yuxiao

doi:10.3390/electronics13132554


¹ Division of Electrical Engineering and Computer Science, Graduate School of Natural Science & Technology, Kanazawa University, Kakuma-Machi, Kanazawa 920-1192, Japan

² Faculty of Electrical, Information and Communication Engineering, Kanazawa University, Kakuma-Machi, Kanazawa 920-1192, Japan

* Author to whom correspondence should be addressed.

Electronics 2024, 13(13), 2554; https://doi.org/10.3390/electronics13132554

Submission received: 24 April 2024 / Revised: 11 June 2024 / Accepted: 24 June 2024 / Published: 28 June 2024

(This article belongs to the Special Issue Advancements in Cross-Disciplinary AI: Theory and Application—2nd Edition)


Abstract

Cervical cancer is the fourth most commonly diagnosed cancer and one of the leading causes of cancer-related deaths among females worldwide. Early diagnosis can greatly increase the cure rate for cervical cancer. However, because it requires substantial medical resources, early diagnosis is difficult to implement in some areas. With the development of machine learning, using it to diagnose cervical cancer automatically has become one of the main research directions in the field. Such an approach typically involves a large number of features, a portion of which are redundant or irrelevant. The task of eliminating redundant or irrelevant features from the entire feature set is known as feature selection (FS). FS methods can roughly be divided into three types: filter-based, wrapper-based, and embedded methods. Among them, wrapper-based methods are currently the most commonly used, and many researchers have demonstrated that they can reduce the number of features while improving diagnostic accuracy. However, wrapper-based methods still have two issues. First, they typically use heuristic algorithms for FS, which can result in significant computational time. Second, heuristic algorithms are often sensitive to their parameters, leading to unstable performance. To address these issues, a novel wrapper-based method named the Binary Harris Hawks Optimization (BHHO) algorithm is proposed in this paper. Compared to other wrapper-based methods, BHHO has fewer hyper-parameters, which contributes to better stability. Furthermore, we introduce a rank-based selection mechanism into the algorithm, which endows BHHO with enhanced optimization capability and greater generalizability. To comprehensively evaluate the performance of the proposed BHHO, we conducted a series of experiments. The experimental results show that the proposed BHHO achieves better accuracy and stability than other common wrapper-based FS methods on the cervical cancer dataset. Additionally, the proposed algorithm still provides competitive results on other disease datasets, indicating its generalizability.

Keywords:

feature selection; cervical cancer; heuristic algorithm; machine learning; early diagnosis

1. Introduction

Cervical cancer is the fourth most common cancer among women globally, as well as one of the leading causes of cancer-related death in women. In 2020 alone, approximately 604,127 new cases of cervical cancer were diagnosed, and about 341,831 lives were lost [1]. Many factors contribute to cervical cancer, the most significant of which is the Human Papillomavirus (HPV) [2]. However, this does not mean that individuals with HPV will definitely develop cervical cancer, as the development of HPV-induced cervical cancer typically takes a long time [3]. Other risk factors, such as smoking, HIV infection, and organ transplantation, may increase the probability of cervical cancer to varying degrees [4]. Although cervical cancer can lead to a high mortality rate if left untreated, it is also one of the most treatable forms of cancer when detected early [5]. In fact, nearly 95% of cervical cancer-related fatalities occur in low-income countries with insufficient medical resources [6]. Currently, cervical cancer diagnosis predominantly relies on biopsy results obtained through colposcopy. This approach meets resistance among women because of the discomfort involved, and it imposes a significant burden on medical resources; consequently, it is less accessible to individuals with lower incomes. Therefore, there is an urgent need for a cost-effective method for early cervical cancer diagnosis.

In recent years, with the rapid advancement of machine learning (ML) technologies, automated cancer diagnosis using ML methods has become a prominent research direction. For example, in the field of lung cancer, Fermino et al. [7] utilized a cascade support vector machine (CSVM) for cancer prediction, while Setio et al. [8] introduced a novel computer-aided diagnosis (CAD) system based on multi-view convolutional networks. In the domain of breast cancer, Abdel-Zaher et al. [9] developed a CAD system based on Deep Belief Networks (DBN), while Mughal et al. [10] proposed a classification model based on Backpropagation Neural Networks (BPNN) for automated breast cancer diagnosis. Additionally, Ben-Cohen et al. [11] presented a liver segmentation and liver metastasis detection model using a fully convolutional network (FCN), and Rau et al. [12] developed a predictive model for liver cancer in individuals with Type II diabetes by employing both an Artificial Neural Network (ANN) and Logistic Regression (LR). Numerous other examples exist [13,14,15,16,17,18].

The above examples demonstrate that ML can provide significant assistance in automated cancer diagnosis. However, some issues still need to be addressed. ML methods typically require a sufficiently large number of samples to avoid overfitting and to ensure generalization, and they should not be given an excessive number of features, to avoid the curse of dimensionality [19]. However, tasks related to disease diagnosis often involve a large number of features. Redundant or irrelevant features among them increase computational costs and can even decrease prediction accuracy [20]. Furthermore, since some features are derived from medical examinations at hospitals, retaining useless features can place an unnecessary financial burden on patients. The process of reducing the number of features used by ML methods is known as feature selection (FS). In this study, we extend research in this direction by proposing a new FS method designed to eliminate useless features in ML-based automatic diagnosis of cervical cancer.

The remainder of this paper is organized as follows: Section 2 introduces related work and the contributions of this paper. Section 3 presents the proposed method in detail. The experiments and discussion are presented in Section 4. Finally, Section 5 concludes this paper.

2. Related Work

Currently, the most commonly used ML-based approach for cervical cancer diagnosis is based on Pap smear images [21]. For example, Liu et al. [22] proposed a novel CVM-Cervix model for cervical cancer diagnosis by combining the vision transformer (ViT) [23] and the DeiT model [24]. Pramanik et al. [25] employed an ensemble approach for cervical cancer detection, achieving higher accuracy than InceptionV2 [26], MobileNetV2 [27], and ResNetV2 [28] individually. In addition, Yaman et al. [29] utilized an exemplar pyramid deep feature extraction-based method for predicting cervical cancer. Furthermore, Shi et al. [30] employed graph neural networks to predict cervical cancer from Pap smear images, and Tripathi et al. [31] utilized ResNet-152 for cervical cancer classification.

On the other hand, aided by the advancement of computer-assisted technologies, alternative approaches such as molecular dynamics simulation techniques have provided new insights into cervical cancer identification. Through comprehensive molecular and integrative analyses, researchers have uncovered numerous novel genomic and proteomic characteristics specific to different subtypes of cervical cancer [32]. Previous researchers discovered that a substantial number of microRNAs (miRNAs) are abnormally expressed in cervical cancer tissues, contributing to tumorigenesis, progression, and metastasis. This subsequently led to the identification of multiple miRNA sequences as potential diagnostic biomarkers for cervical cancer [33]. Additionally, long non-coding RNAs (lncRNAs) have emerged as biomarkers for cervical cancer [34]. DNA methylation, a pivotal epigenetic mechanism, plays a significant role in biological processes [35], and methylation markers have been shown to exhibit higher sensitivity than protein markers in cancer diagnosis [36]. Subsequently, numerous studies have identified methylation biomarkers specific to cervical cancer [37]. All of the methods mentioned above have brought considerable advances to computer-aided diagnosis of cervical cancer. However, these approaches still face challenges, including limited flexibility and substantial costs, which complicate their deployment in low-income regions.

A viable alternative is to use the risk factors identified in early screenings to predict the presence of cervical cancer. In 2017, a relevant dataset was released on the UCI Machine Learning Repository of the University of California, Irvine, which greatly propelled research in this direction [38]. Subsequently, researchers have conducted a series of studies on this dataset. First, the dataset exhibits significant class imbalance, which can severely impair the accuracy of model predictions. To deal with this problem, Newaz et al. [39] proposed a novel data balancing method combining SMOTE [40], the Condensed Nearest Neighbor Rule (CNN) [41], and the Edited Nearest Neighbor Rule (ENN) [42]. Experimental results demonstrated that their approach achieved higher accuracy than using SMOTE, CNN, or ENN individually. Second, as different classifiers extract information from the data in different ways, Lu et al. [43] used an ensemble method that incorporates five different classifiers in order to surpass the performance of any single classifier. These two works attempted to improve predictive accuracy on this dataset from different perspectives.

As mentioned before, FS is also a crucial step when using ML-based methods. Removing irrelevant or redundant features can significantly improve both the predictive accuracy and the speed of a model. FS methods can be categorized into three main types: filter-based, wrapper-based, and embedded methods. First, filter-based methods employ statistical measures to assess the relevance of features with respect to the class label. These methods fall into two primary categories: ranking-based (univariate) and search-space-based (multivariate). In the ranking-based category, each feature is ranked by its association with the class label, and features ranked above a predefined threshold are selected, so that the least pertinent features are eliminated. Conversely, search-space-based approaches consider inter-feature relationships; thus, they are capable of eliminating both irrelevant and redundant features [44]. Second, wrapper-based methods evaluate candidate feature subsets by the performance of a classifier trained on them. Consequently, they can select a feature subset that yields strong results for that classifier. Wrapper-based methods comprise three main components: a search algorithm, a classifier, and a fitness function [45]. Finally, embedded methods select features as an integral part of the learning process, which automatically enhances classification performance [20]. Each type has its own pros and cons. Filter-based methods have lower computational complexity and a reduced risk of overfitting compared with the other two types, whereas wrapper-based methods achieve higher accuracy than filter methods at the cost of increased computational time [46]. In addition, embedded methods are notably affected by the choice of classifier and hyper-parameters, which can make them difficult to apply effectively. Nithya et al. [47] explored the use of these three FS techniques to determine the significance of various risk factors in cervical cancer diagnosis and demonstrated that wrapper-based methods outperformed the other approaches, albeit requiring more time. Moreover, the two previously introduced works by Newaz et al. and Lu et al. also employed wrapper-based methods for FS by default. In fact, wrapper-based methods are the most commonly used FS methods for such datasets, and variants of heuristic algorithms such as the Genetic Algorithm (GA) [48], the Differential Evolution Algorithm (DE) [49], and the Particle Swarm Optimization Algorithm (PSO) [50] are frequently employed as their primary search tools. However, these methods all suffer from performance instability, because most current heuristic algorithms have hyper-parameters that require manual adjustment, and their performance on a specific task is easily influenced by these hyper-parameters. To address this issue, this paper proposes a novel wrapper-based FS method named the Binary Harris Hawks Optimization (BHHO) algorithm. Compared with the other algorithms considered, BHHO has the fewest hyper-parameters, which significantly reduces its sensitivity to hyper-parameter settings and thereby provides more stable performance. BHHO is developed from the Harris Hawks Optimization (HHO) algorithm [51]. HHO is a heuristic algorithm designed for solving continuous numerical problems and is not inherently suitable for binary problems such as FS. In this study, we extend it into BHHO, which is capable of handling binary problems. Additionally, we introduce a novel rank-based selection mechanism that can use different feature-ranking approaches, making BHHO more suitable for specific tasks such as cervical cancer prediction. The main contributions of this paper are as follows:

(1) Based on the HHO algorithm, a novel BHHO algorithm is proposed for FS of cervical cancer data. The BHHO algorithm has fewer hyper-parameters and better stability compared to other wrapper-based FS algorithms.

(2) In the BHHO algorithm, we introduce a rank-based selection mechanism. This mechanism guides the generation of new solutions based on feature rankings, thereby further enhancing the algorithm’s performance.

(3) We compared the proposed BHHO algorithm with commonly used wrapper-based and filter-based methods on the cervical cancer dataset, verifying the superiority of the proposed BHHO algorithm.

(4) To assess the generality of the proposed BHHO algorithm, we conducted experiments on three disease datasets in addition to the cervical cancer dataset. The results indicate that BHHO also performs well on these datasets.

(5) On the cervical cancer dataset, the proposed BHHO algorithm was further integrated with filter-based feature selection methods to reduce computational costs while maintaining or enhancing performance.

In the next section, we will provide a detailed explanation of the proposed BHHO algorithm.

3. Materials and Methods

In this section, we first provide a brief overview of the original HHO algorithm and then explain the proposed BHHO algorithm in detail.

3.1. Overview of the HHO Algorithm

The original HHO algorithm emulates the cooperative behavior and hunting tactics of Harris hawks. During a hunt, Harris hawks first encircle their prey. They then employ various actions to gradually wear down the prey’s stamina while closing the distance to it. Finally, when the timing is right, they make a sudden dash to seize the prey. The HHO algorithm uses exploration and exploitation phases to simulate these stages of the hunt, and the hawks exhibit distinct behavior patterns in each phase. Each Harris hawk determines its current phase according to the prey’s stamina and then performs a specific action with a certain probability. Let E denote the current stamina (energy) of the prey, and let q and r be the random numbers that determine which action is performed. Figure 1 illustrates how a Harris hawk chooses among the different actions based on E, q, and r.

In the original HHO algorithm, these actions are defined as continuous operations; thus, the algorithm cannot directly address FS problems. Zhang et al. [52] transformed the output of the HHO algorithm into binary values (0 or 1) to make it applicable to FS problems. However, this transformation is simplistic and coarse, which prevents the HHO algorithm from achieving its full potential. Dokeroglu et al. [53], on the other hand, introduced a new set of discrete operations for the HHO algorithm, proposing a novel robust multi-objective HHO algorithm, which they applied to FS problems with significant results. Building upon Dokeroglu et al.’s work, in this paper we introduce a novel set of discrete operations within the framework of the HHO algorithm. Additionally, a feature ranking-based selection mechanism is proposed to further enhance its performance. In the remainder of this section, we introduce the details of the proposed BHHO algorithm.

3.2. Proposed BHHO Algorithm

3.2.1. Exploration Phase

As shown in Figure 1, the algorithm is divided into two main parts according to the current energy level E of the prey. The energy level E decreases over the iterations following Equation (1):

E = 2 E_{0} \left(1 - \frac{t}{T}\right)

(1)

where $E_{0}$ represents the initial energy, which is randomly assigned a value within the range $[-1, 1]$ at the beginning of each iteration; t denotes the current iteration; and T represents the total number of iterations. A value of $E_{0}$ moving towards 1 indicates that the prey is gaining strength, whereas a value moving towards $-1$ signifies that the prey is losing stamina. The magnitude of E diminishes as t approaches T, reflecting a gradual expenditure of energy.

When the prey has sufficient energy ($|E| \geq 1$), the algorithm is in the exploration phase. In the original HHO, the hawk then performs one of two actions depending on the random number q: “perching based on random locations” or “perching based on the position of other hawks”. When the prey’s energy is insufficient ($|E| < 1$), the algorithm enters the exploitation phase. In this phase, depending on the random number r and the value of E, the hawk adopts one of four behavioral modes: “soft besiege”, “soft besiege with progressive rapid dives”, “hard besiege”, and “hard besiege with progressive rapid dives”.
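The sketch below, offered as an illustration rather than the authors’ code, computes the energy of Equation (1) and maps $|E|$ and r to the operation(s) that Algorithm 1 executes. It assumes that $E_{0}$ is drawn uniformly from $[-1, 1]$ (the text only says it is random in that range); the returned action labels are hypothetical names.

```python
import random


def prey_energy(t, T, rng=random):
    """Equation (1): E = 2 * E0 * (1 - t / T).

    Assumption: E0 is drawn uniformly from [-1, 1]; the text only says it
    is a random value in that range.
    """
    e0 = rng.uniform(-1.0, 1.0)
    return 2.0 * e0 * (1.0 - t / T)


def choose_action(E, r):
    """Return the BHHO operation(s) selected by |E| and r, as in Algorithm 1.

    The returned strings are illustrative labels, not names from the paper.
    """
    if abs(E) >= 1.0:
        # Exploration phase: BHHO executes both perching operations.
        return ["perch_random_locations", "perch_other_hawks"]
    soft = abs(E) >= 0.5  # soft actions while the prey still has energy
    if r >= 0.5:
        return ["soft_besiege"] if soft else ["hard_besiege"]
    return ["soft_besiege_dives"] if soft else ["hard_besiege_dives"]


# Usage: pick the action(s) for one hawk at iteration t = 10 of T = 100.
rng = random.Random(42)
E = prey_energy(t=10, T=100, rng=rng)
print(round(E, 3), choose_action(E, rng.random()))
```

Unlike the original HHO, the exploration branch returns both perching operations, reflecting that BHHO executes them together.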

When $|E| \geq 1$, the proposed BHHO algorithm performs two operations, namely “perching based on random locations” and “perching based on the position of other hawks”. In the first operation, we randomly select a hawk $hawk_{r}$ from the population and choose a random number N that is less than the total number of features. We then swap the values of the current hawk and $hawk_{r}$ at N randomly chosen positions. In the second operation, we randomly select two hawks from the population, $hawk_{1}$ and $hawk_{2}$, and copy the one with higher fitness onto the one with lower fitness. Figure 2 shows an example of the two operations. Note that in the original HHO, a hawk executes only one of the two operations, as determined by the random number q, whereas in the proposed BHHO both operations are executed during the exploration phase.
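A minimal sketch of the two exploration operators on binary feature masks follows. It interprets “swap” as exchanging the two hawks’ bits at the chosen positions and draws N from 1 to K−1; both are assumptions, and the function names are hypothetical. How the outputs are written back into the population is likewise left to the caller.

```python
import random


def perch_random_locations(hawk, hawk_r, rng):
    """Swap the bits of `hawk` and `hawk_r` at N random positions.

    Assumptions: "swap" means exchanging the two hawks' values, and N is
    drawn from 1..K-1 (the text says N is less than the number of features K).
    Returns new lists; the inputs are left unchanged.
    """
    k = len(hawk)
    n = rng.randint(1, k - 1)
    a, b = list(hawk), list(hawk_r)
    for p in rng.sample(range(k), n):
        a[p], b[p] = b[p], a[p]
    return a, b


def perch_other_hawks(population, fitness, rng):
    """Pick two random hawks and overwrite the worse one with a copy of the better."""
    i, j = rng.sample(range(len(population)), 2)
    if fitness(population[i]) >= fitness(population[j]):
        population[j] = list(population[i])
    else:
        population[i] = list(population[j])
    return population
```

The first operator makes large jumps in the search space, while the second only ever replaces a worse hawk with a better one.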

3.2.2. Exploitation Phase

When $|E| < 1$, the algorithm is in the exploitation phase. In this phase, the Harris hawk takes either a soft or a hard action depending on the prey’s energy level E, while the random number r determines whether the besiege is performed with progressive rapid dives. When the prey still has sufficient energy left ($|E| \geq 0.5$) and $r \geq 0.5$, the Harris hawk opts for a soft besiege. In the BHHO algorithm, the prey is the best individual in the current population, and the soft besiege action randomly selects J positions and copies the prey’s values at these positions to the current hawk, where J is a random number less than the total number of features. When $r < 0.5$, the prey has exposed itself, and the Harris hawk attempts to capture it directly; this action is called soft besiege with progressive rapid dives in the BHHO algorithm. Specifically, this action selects M positions in the current hawk, where

M = \left\lfloor |E| \cdot K \right\rfloor

(2)

and K is the total number of features, so that M corresponds to a fraction $|E|$ of all positions. The values at the selected positions are then changed, i.e., flipped between 0 and 1. Figure 3 shows examples of soft besiege and soft besiege with progressive rapid dives.
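The two soft actions can be sketched as follows. This is an illustration under stated assumptions, not the authors’ implementation: J is drawn from 1 to K−1, and the M positions flipped by the rapid-dive variant are chosen uniformly at random; the function names are hypothetical.

```python
import math
import random


def soft_besiege(hawk, prey, rng):
    """Copy the prey's bits into `hawk` at J random positions.

    Assumption: J is drawn from 1..K-1 (the text says J is less than K).
    """
    k = len(hawk)
    j = rng.randint(1, k - 1)
    new = list(hawk)
    for p in rng.sample(range(k), j):
        new[p] = prey[p]
    return new


def soft_besiege_dives(hawk, energy, rng):
    """Flip M = floor(|E| * K) bits of `hawk` (Equation (2)).

    Assumption: the M positions are chosen uniformly at random.
    """
    k = len(hawk)
    m = min(math.floor(abs(energy) * k), k)  # guard: |E| < 1 in this phase anyway
    new = list(hawk)
    for p in rng.sample(range(k), m):
        new[p] = 1 - new[p]
    return new
```

For example, with K = 10 features and $|E| = 0.75$, Equation (2) gives M = 7, so the rapid-dive variant flips seven bits.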

Conversely, when the prey’s energy is nearly depleted ($|E| < 0.5$), the Harris hawk takes a hard action. As with the soft action, the hard action includes two behavioral patterns depending on the value of r: hard besiege ($r \geq 0.5$) and hard besiege with progressive rapid dives ($r < 0.5$). Unlike the soft action, however, the hard action incorporates a new rank-based selection mechanism to further increase the probability of finding superior solutions.

The rank-based selection mechanism uses feature rankings to guide the individuals in the BHHO algorithm. Specifically, both “hard besiege” and “hard besiege with progressive rapid dives” consist of two discrete operations, one of which uses the rank-based selection mechanism while the other does not. The two operations generate two new individuals, and the superior one is retained. The feature ranking is derived from a filter-based FS method. Since different filter-based methods perform differently on different tasks, this paper uses three filter-based methods as tools for feature ranking, in order to identify the one most suitable for cervical cancer diagnosis: the ReliefF (Rf) method [54], the variance thresholding (VT) method [55], and the mutual information (MI) method [56]. The BHHO variants using these three ranking strategies are referred to as RfBHHO, VTBHHO, and MIBHHO, respectively.
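As an illustration of how a filter method can supply the ranking, the sketch below orders features by their variance. Variance thresholding ordinarily just discards low-variance features; using the variance itself as the ranking score is our assumption about how a VT ranking could be derived. Rf or MI rankings would follow the same pattern with a different per-feature score (for example, scikit-learn’s `mutual_info_classif` returns one MI estimate per feature that can be sorted in the same way).

```python
from statistics import pvariance


def variance_ranking(rows):
    """Rank feature indices from highest to lowest variance.

    `rows` is a list of samples, each a list of numeric feature values.
    Ties keep their original order because Python's sort is stable.
    """
    columns = list(zip(*rows))
    variances = [pvariance(col) for col in columns]
    return sorted(range(len(columns)), key=lambda j: variances[j], reverse=True)


rows = [[1, 0, 5], [1, 1, -5], [1, 0, 5], [1, 1, -5]]
print(variance_ranking(rows))  # [2, 1, 0]
```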

Specifically, “hard besiege” is defined as follows. First, a position at which the prey and the current hawk have different values is selected, and the prey’s value at that position is copied to the current hawk to generate $hawk_{1}$. Next, the highest-ranked feature that has not yet been selected by the current hawk is added to the current hawk to generate $hawk_{2}$. Finally, the better of $hawk_{1}$ and $hawk_{2}$ is copied to the current hawk. Similarly, “hard besiege with progressive rapid dives” is defined as follows. First, M positions (obtained by Equation (2) from the prey’s current energy) at which the prey has the value 1 are selected and copied to the current hawk to generate $hawk_{1}$. Second, the top M ranked features that have not yet been selected by the current hawk are added to it to generate $hawk_{2}$. Finally, the better of $hawk_{1}$ and $hawk_{2}$ is copied to the current hawk. Examples of hard besiege and hard besiege with progressive rapid dives are illustrated in Figure 4 and Figure 5, respectively.
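A minimal sketch of the two hard actions is given below, assuming binary masks, a `ranking` list of feature indices ordered best first, and a caller-supplied `fitness` function (higher is better). Choosing the differing position, and the M prey positions with value 1, uniformly at random are our assumptions; the text does not say how they are picked. Function names are hypothetical.

```python
import math
import random


def _better(a, b, fitness):
    """Return whichever candidate has the higher fitness (ties keep `a`)."""
    return a if fitness(a) >= fitness(b) else b


def hard_besiege(hawk, prey, ranking, fitness, rng):
    # hawk1: copy the prey's value at one (randomly chosen) differing position.
    hawk1 = list(hawk)
    diff = [p for p in range(len(hawk)) if hawk[p] != prey[p]]
    if diff:
        p = rng.choice(diff)
        hawk1[p] = prey[p]
    # hawk2: add the best-ranked feature the hawk has not selected yet.
    hawk2 = list(hawk)
    for f in ranking:
        if hawk2[f] == 0:
            hawk2[f] = 1
            break
    return _better(hawk1, hawk2, fitness)


def hard_besiege_dives(hawk, prey, ranking, energy, fitness, rng):
    m = math.floor(abs(energy) * len(hawk))  # Equation (2)
    # hawk1: copy M of the prey's selected features (value 1) into the hawk.
    hawk1 = list(hawk)
    ones = [p for p in range(len(prey)) if prey[p] == 1]
    for p in rng.sample(ones, min(m, len(ones))):
        hawk1[p] = 1
    # hawk2: add the top-M ranked features that the hawk has not selected.
    hawk2 = list(hawk)
    for f in [f for f in ranking if hawk[f] == 0][:m]:
        hawk2[f] = 1
    return _better(hawk1, hawk2, fitness)
```

Each call evaluates the fitness of two candidates, so the hard actions cost one extra fitness evaluation per hawk compared with a single-candidate move.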

3.3. The Theoretical Analysis of the Proposed BHHO Algorithm

As introduced above, we have defined a new set of discrete operators in the BHHO algorithm. Since $E_{0}$ in Equation (1) is a random value, the operation executed at a given iteration is not fixed in advance. However, because of the $1 - \frac{t}{T}$ term, the prey’s energy level $|E|$ tends to decrease as the algorithm iterates; in particular, since $|E_{0}| \leq 1$, we have $|E| \leq 2\left(1 - \frac{t}{T}\right)$, so $|E| \geq 1$ is only possible while $t \leq T/2$. Therefore, the two operations of the exploration phase are executed only in the early stages of the algorithm, while the four operations of the exploitation phase dominate the later stages.

During the exploration phase, the variable N can take a wide range of values, up to the total number of features, which offers a larger search space for the individuals; here, the algorithm tries to find better solutions through large step changes. In the exploitation phase, the algorithm introduces the variable M to limit the search space of the individuals. Since M depends on E, M is never smaller in soft actions ($|E| \geq 0.5$) than in hard actions ($|E| < 0.5$). During the middle stages of the algorithm, $|E|$ is more likely to fall into the range $[0.5, 1)$, so soft actions are more likely to be executed; through these actions, individuals move towards the optimal individual, improving the quality of the entire population. In the later stages, $|E|$ is more likely to fall into the range $[0, 0.5)$, so hard actions are more likely to be executed. At this stage, since the algorithm has converged or is close to convergence, it not only uses the prey to guide other individuals but also applies the rank-based selection mechanism in an attempt to find a better global optimum.
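The stage-wise behavior described above can be made quantitative. Assuming $E_{0}$ is uniform on $[-1, 1]$ (an assumption; the text only says it is random in that range), $|E|$ is uniform on $[0, 2(1 - t/T)]$, so the probability of each phase at progress $s = t/T$ has a closed form. The sketch below computes it and checks it against a Monte Carlo simulation of Equation (1).

```python
import random


def phase_probabilities(s):
    """Exact phase probabilities at run progress s = t / T.

    Assumption: E0 ~ Uniform[-1, 1], so |E| = 2 * |E0| * (1 - s) is uniform
    on [0, 2 * (1 - s)]. Returns (P(|E| >= 1), P(0.5 <= |E| < 1), P(|E| < 0.5)).
    """
    if s >= 1.0:
        return 0.0, 0.0, 1.0
    scale = 2.0 * (1.0 - s)                  # upper bound of |E|
    p_explore = max(0.0, 1.0 - 1.0 / scale)  # exploration phase
    p_hard = min(1.0, 0.5 / scale)           # hard actions
    p_soft = 1.0 - p_explore - p_hard        # soft actions
    return p_explore, p_soft, p_hard


def simulate(s, trials=50_000, seed=0):
    """Monte Carlo estimate of the same three probabilities."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(trials):
        e = abs(2.0 * rng.uniform(-1.0, 1.0) * (1.0 - s))
        counts[0 if e >= 1.0 else 1 if e >= 0.5 else 2] += 1
    return [c / trials for c in counts]


for s in (0.0, 0.25, 0.5, 0.6, 0.75):
    print(s, [round(p, 3) for p in phase_probabilities(s)])
```

Under this assumption, half of the draws explore at t = 0 and a quarter each go to soft and hard actions; exploration disappears from t = T/2 onward, and from t = 3T/4 onward only hard actions remain.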

In summary, the BHHO algorithm promotes convergence by gradually reducing the search space of the individuals. The operation executed at each step is determined by the random energy level from Equation (1) together with the random number r, which helps maintain the diversity of the population. In addition, the rank-based selection mechanism enhances the algorithm’s ability to search for a global optimum. Moreover, the algorithm has no hyper-parameters other than the population size and the total number of iterations, which greatly alleviates the sensitivity of heuristic algorithms to hyper-parameters.

The complete procedure of the proposed algorithm is given in Algorithm 1.

Algorithm 1 Pseudo-code of the proposed BHHO algorithm

Input: The population size P and the maximum number of iterations T
Output: Best individual
Generate the feature ranking using a filter-based method
Initialize a population of P hawks (binary feature masks)
Calculate the fitness values (F1-score) of the hawks
Designate the optimal hawk as the prey
$t \leftarrow 1$
while $t \leq T$ do
    $i \leftarrow 1$
    while $i \leq P$ do
        Assign a random number from [−1, 1] to $E_{0}$
        Calculate the prey’s energy level E using Equation (1)
        if $|E| \geq 1$ then    ▹ Exploration phase
            Execute perching based on random locations
            Execute perching based on the position of other hawks
        else    ▹ Exploitation phase
            Assign a random number from [0, 1] to r
            if $r \geq 0.5$ and $|E| \geq 0.5$ then Execute soft besiege
            else if $r \geq 0.5$ and $|E| < 0.5$ then Execute hard besiege
            else if $r < 0.5$ and $|E| \geq 0.5$ then Execute soft besiege with progressive rapid dives
            else Execute hard besiege with progressive rapid dives
            end if
        end if
        $i \leftarrow i + 1$
    end while
    Calculate the fitness values (F1-score) of the hawks
    Designate the optimal hawk as the prey
    $t \leftarrow t + 1$
end while
Output: prey (best individual)
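Algorithm 1 scores every hawk by its F1-score. The sketch below shows one way such a wrapper fitness could be computed for a binary feature mask. The paper’s experiments use classifiers such as SVM, RF, and KNN; to stay dependency-free, a hypothetical nearest-centroid classifier stands in for them here, and scoring on a held-out validation split is our assumption, since the text does not state how fitness is evaluated inside a training fold.

```python
def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), equivalent to Equation (7); 0 when TP = 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def nearest_centroid_predict(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: assign each sample to the closest class mean."""
    centroids = {}
    for label in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in X_test]


def wrapper_fitness(mask, X_train, y_train, X_val, y_val):
    """Fitness of one hawk (a 0/1 feature mask): F1 of a classifier on the selected features."""
    selected = [j for j, bit in enumerate(mask) if bit == 1]
    if not selected:
        return 0.0  # assumption: an empty subset receives the worst fitness

    def project(X):
        return [[row[j] for j in selected] for row in X]

    y_pred = nearest_centroid_predict(project(X_train), y_train, project(X_val))
    return f1_score(y_val, y_pred)
```

Swapping in a different classifier only changes `nearest_centroid_predict`; the mask-to-fitness interface used by the search stays the same.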

3.4. Combination of Proposed BHHO and Filter-Based Methods

Wrapper-based FS methods are better at finding feature subsets that improve prediction performance, but they also require more computational time. Moreover, completely useless features may negatively affect the quality of the final solution found by wrapper-based methods. A common hybrid FS strategy to address these issues is to first use a filter-based method to eliminate a small portion of irrelevant or redundant features and then apply a wrapper-based method to the remaining features. In this paper, we combine the best-performing BHHO variant from our experiments with the Rf, VT, and MI methods. Specifically, the top 90% of features ranked by the Rf, VT, and MI methods are used separately as the input to the best BHHO variant.
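The filtering step above can be sketched in a few lines. The text does not say how the 90% cut-off is rounded, so rounding up is an assumption; integer arithmetic avoids floating-point surprises. The function names and the example ranking are hypothetical.

```python
def top_percent(ranking, percent=90):
    """Keep the best `percent`% of a feature ranking, preserving rank order.

    `ranking` lists feature indices from best to worst. Assumption: the
    count is rounded up, computed with integer arithmetic.
    """
    keep = (len(ranking) * percent + 99) // 100
    return ranking[:keep]


def project(rows, features):
    """Restrict each sample to the given feature indices."""
    return [[row[j] for j in features] for row in rows]


ranking = [3, 0, 4, 1, 2, 5, 7, 6, 9, 8]  # hypothetical ranking of 10 features
kept = top_percent(ranking)  # 9 features survive
X_reduced = project([[float(j) for j in range(10)]], kept)
```

With the 30 features of the preprocessed cervical cancer dataset, this rule would pass 27 features to BHHO.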

4. Results and Discussion

In this study, seven classifiers are used: the Support Vector Machine (SVM) [57], Random Forest (RF) [58], Naive Bayes (NB) [59], Adaptive Boosting (Adaboost) [60], Discriminant Analysis (DA) [61], Logistic Regression (LR) [62], and k-Nearest Neighbors (KNN) [63]. In addition, the commonly used wrapper-based methods GA, PSO, and DE were used for comparison. The flowchart of the experiments is illustrated in Figure 6. We begin by preprocessing the datasets to eliminate missing values. Subsequently, we employ a 5-fold cross-validation scheme to partition the data. In each fold, FS is performed on the training set. Since the dataset is severely imbalanced, the SMOTE [40] algorithm is used to balance the training set for better performance. Then, both the training set and the testing set are normalized. Finally, the training set is used for model training and the testing set for performance evaluation. All experiments were conducted in Python on a personal computer with an Intel(R) Core i9 CPU at 2.20 GHz and 16 GB of memory.
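The per-fold ordering described above (split, FS on the training set, SMOTE on the training set, then normalization) can be sketched as follows. This is a minimal stdlib illustration, not the experimental code: min-max scaling is assumed because the normalization type is not stated, the scaler is assumed to be fitted on the training set and applied to the test set, the FS step is left as a placeholder, and the SMOTE shown is a bare-bones version of the technique. In practice a library implementation such as imbalanced-learn’s `SMOTE` would normally be used.

```python
import random


def kfold_indices(n, k, rng):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def _sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def smote(X, y, rng, minority=1, k=5):
    """Minimal SMOTE sketch: add interpolated minority samples until classes balance.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority neighbours. Needs >= 2 minority samples.
    """
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = len(y) - 2 * len(minority_rows)  # majority count minus minority count
    synthetic = []
    for _ in range(max(0, n_needed)):
        base = rng.choice(minority_rows)
        others = [r for r in minority_rows if r is not base]
        neighbours = sorted(others, key=lambda r: _sq_dist(base, r))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (v - b) for b, v in zip(base, nb)])
    return X + synthetic, y + [minority] * len(synthetic)


def minmax_fit(X):
    """Per-feature minimum and maximum of the training data."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]


def minmax_apply(X, lo, hi):
    """Scale features with training statistics; constant features map to 0."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]


def prepare_fold(X, y, train, test, rng):
    """One fold: (FS would go here) -> SMOTE on train -> normalise both sets."""
    X_tr, y_tr = [X[j] for j in train], [y[j] for j in train]
    X_te, y_te = [X[j] for j in test], [y[j] for j in test]
    X_tr, y_tr = smote(X_tr, y_tr, rng)
    lo, hi = minmax_fit(X_tr)
    return minmax_apply(X_tr, lo, hi), y_tr, minmax_apply(X_te, lo, hi), y_te
```

Only the training portion of each fold is oversampled, so the test set keeps the original class imbalance.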

4.1. Dataset

4.1.1. Cervical Cancer Dataset

The cervical cancer dataset used in this study is available at the UCI repository and was originally collected at the Hospital Universitario de Caracas in Venezuela. It contains 858 records, each characterized by 32 features covering habits, sexual history, demographic information, and more. The original dataset includes 55 positive cases and 803 negative cases. Due to privacy concerns, some patients chose not to answer some questions; hence, the data contain missing values. The two features “STDs: Time since first diagnosis” and “STDs: Time since the last diagnosis” each have 787 missing values. As the proportion of missing values in these two features exceeds 90%, we removed them. Additionally, 105 patients have missing values across 18 features. These patients were excluded from the dataset, since more than half of their feature information was incomplete. The remaining missing values were filled using the Multivariate Imputation by Chained Equations (MICE) method [64]. MICE is a highly flexible imputation method that accounts for the uncertainty of missing values more accurately than other imputation techniques. It replaces missing data by running a series of regression models, in which each missing value is imputed based on the other variables in the dataset. To prevent data leakage, the MICE model is trained solely on the training set and is then used to impute the missing values in the entire dataset. As a result, we obtained a dataset of 753 patients with 30 features, including 53 positive cases and 700 negative cases.
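The two removal rules above (drop features with more than 90% missing values, then drop patients missing more than half of the remaining features) can be expressed as a small function. This is a sketch that assumes missing entries are encoded as `None`; the function name and default thresholds are illustrative.

```python
def drop_sparse(rows, col_max_missing=0.9, row_max_missing=0.5):
    """Drop features, then patients, whose share of missing values (None) is too high.

    Mirrors the two rules in the text: remove features with more than 90%
    missing values, then remove patients missing more than half of the
    remaining features. Returns the kept rows and kept column indices.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    keep_cols = [j for j in range(n_cols)
                 if sum(r[j] is None for r in rows) / n_rows <= col_max_missing]
    reduced = [[r[j] for j in keep_cols] for r in rows]
    kept = [r for r in reduced
            if sum(v is None for v in r) / len(keep_cols) <= row_max_missing]
    return kept, keep_cols
```

The values that remain missing would then be imputed; scikit-learn’s `IterativeImputer` (an imputer inspired by MICE, enabled with `from sklearn.experimental import enable_iterative_imputer`) is one option, fitted on the training fold only.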

4.1.2. Other Datasets

To further illustrate the performance of the proposed BHHO algorithm, we conducted additional experiments on three other disease datasets: the Cleveland dataset [65], the Z-Alizadeh Sani dataset [66], and the Parkinson dataset [67]. The Cleveland dataset contains 303 cases, with 139 positive and 164 negative cases. Although this dataset originally comprises 75 features, most researchers use only 13 of them, and we likewise use these 13 features as the input to our model. The Z-Alizadeh Sani dataset contains 303 cases (216 positive and 87 negative) with a total of 55 features. The Parkinson dataset contains 240 cases (120 positive and 120 negative) with a total of 46 features. The Cleveland and Z-Alizadeh Sani datasets are used to determine whether individuals have heart disease, while the Parkinson dataset is used to identify the presence of Parkinson’s disease. All three datasets pose binary classification problems.

4.2. Performance Metrics

In this paper, six metrics, namely accuracy, recall (reported as sensitivity in the result tables), specificity, precision, F1-score, and geometric mean (G-mean), are employed to evaluate the performance of the proposed method. They are defined as follows:

\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}

(3)

\mathrm{Recall} = \frac{TP}{TP + FN}

(4)

\mathrm{Specificity} = \frac{TN}{FP + TN}

(5)

\mathrm{Precision} = \frac{TP}{TP + FP}

(6)

\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}

(7)

\mathrm{G\text{-}mean} = \sqrt{\mathrm{Recall} \times \mathrm{Specificity}}

(8)

where TP (true positive) denotes a diseased person correctly identified as sick; TN (true negative) denotes a healthy person correctly classified as healthy; FP (false positive) denotes a healthy person incorrectly classified as diseased; and FN (false negative) denotes a diseased person incorrectly classified as healthy. In Equations (3)–(8), these symbols stand for the number of samples in each category. Note that the F1-score is used as the objective function of the wrapper-based methods, with macro averaging applied because of the imbalanced nature of the data.
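Equations (3)–(8) and the macro-averaged F1 objective can be computed directly from the four confusion-matrix counts. The sketch below is illustrative: the helper names are hypothetical, returning 0.0 when a denominator is zero is an assumption (scikit-learn's `f1_score` exposes this choice through its `zero_division` parameter), and macro averaging is taken to mean the unweighted mean of the F1-scores of the positive and negative classes.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Equations (3)-(8) from confusion-matrix counts (0.0 on empty denominators)."""
    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * recall * precision, recall + precision),
        "g_mean": math.sqrt(recall * specificity),
    }


def macro_f1(tp, tn, fp, fn):
    """Unweighted mean of the positive-class and negative-class F1-scores."""
    pos = binary_metrics(tp, tn, fp, fn)["f1"]
    # Treat "healthy" as the positive class: the roles of TP/TN and FP/FN swap.
    neg = binary_metrics(tn, tp, fn, fp)["f1"]
    return (pos + neg) / 2


m = binary_metrics(tp=8, tn=90, fp=10, fn=2)
print({name: round(value, 4) for name, value in m.items()})
print(round(macro_f1(8, 90, 10, 2), 4))
```

Because the negative class is large and easy, its F1-score is high, so the macro average rewards subsets that improve the minority class without letting the majority class dominate.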

4.3. Validation on the Cervical Cancer Dataset

In this part, we validate the performance of the proposed BHHO algorithm on the cervical cancer dataset. Table 1 presents the experimental results without an FS method, Table 2 presents the results of the proposed BHHO variants, and Table 3 presents the results of the GA, PSO, and DE methods. Since a 5-fold strategy is used to partition the dataset, all tables report the experimental results in the form "mean ± variance" over the five folds.

Clearly, all the experimental results exhibit high accuracy and precision but low sensitivity and specificity. This is because the dataset is highly imbalanced; that is, the majority class vastly outnumbers the minority class. In the real world, cervical cancer diagnosis is precisely such a situation, in which negative results overwhelmingly outnumber positive ones. In this situation, simple metrics such as accuracy or precision may not assess performance accurately, since they can easily yield high values when the majority class greatly outnumbers the minority class. For cervical cancer, the consequences of misclassifying the minority class are much more severe than those of misclassifying the majority class. Therefore, we pay more attention to the results of recall and specificity. However, this does not mean that accuracy and precision should be ignored. To evaluate performance comprehensively, the F1-score and G-mean are used as the main reference indicators in our experiments, and the best F1-score and G-mean in each table are highlighted in bold.
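A worked example makes the imbalance argument concrete. Using the class counts of the preprocessed dataset (53 positive and 700 negative cases, Section 4.1.1), a degenerate classifier that labels every patient as healthy already reaches an accuracy of about 93% while detecting no patients at all, so both its recall and its G-mean are zero (its F1-score is likewise zero under the usual convention for an undefined precision).

```python
# Class counts of the preprocessed cervical cancer dataset (Section 4.1.1).
positives, negatives = 53, 700

# A degenerate classifier that predicts "healthy" for every patient.
tp, fn = 0, positives
tn, fp = negatives, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
g_mean = (recall * specificity) ** 0.5

print(f"accuracy={accuracy:.2%} recall={recall:.0%} G-mean={g_mean:.0%}")
# accuracy=92.96% recall=0% G-mean=0%
```

This is why a high accuracy alone says little here, and why the F1-score and G-mean, both of which collapse to zero for this classifier, are used as the main indicators.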

From the three tables, it can be observed that most results obtained with the wrapper-based methods are better than those obtained with all features, indicating that wrapper-based FS algorithms can capture implicit relationships between features, reducing the number of features while improving performance. Table 2 shows that among the three BHHO variants, VTBHHO achieved the highest F1-score of 28.60% with Adaboost as the classifier and the highest G-mean of 64.18% with LR as the classifier. Furthermore, the best results obtained by the proposed BHHO algorithm surpass all outcomes of the GA, PSO, and DE methods. Figure 7 visualizes this comparison: the best F1-score and G-mean of each wrapper-based method are selected as representatives, and the specific values and the classifiers that achieved them are annotated above each bar. The best results of all wrapper-based methods surpass the best results obtained with all features, and the best results of the proposed BHHO variants surpass those of the other commonly used wrapper-based methods.

Table 4, Table 5, and Table 6 show the experimental results after applying the Rf, VT, and MI methods, respectively, with the top 50%, 60%, 70%, and 80% of features selected as the feature subset. When the Rf method is used, most classifiers do not surpass the performance of the original feature set; when the VT or MI method is used, most classifiers exceed the performance of the original feature set, although some classifiers still show a decline in performance. Notably, only with the MI method (using the top 80% of features) did the performance of DA surpass the results of all classifiers using the full feature set, and even this result is lower than most results of the wrapper-based methods. Evidently, not all FS methods yield positive results. Although filter-based FS methods can remove redundant features and improve accuracy to some extent, their effectiveness is quite limited, and in some cases they may even degrade performance. This is because the importance of features in a dataset cannot be judged solely from a statistical perspective; the relationships between features must also be considered. Some features that seem statistically insignificant on their own can significantly improve performance in combination with other features, and filter-based methods often overlook this aspect. Therefore, in practical applications, most researchers do not rely solely on filter-based methods.
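The limitation of univariate filter scores can be illustrated with a small constructed example. The sketch below uses a simple plug-in estimate of mutual information for discrete features and a top-fraction selection rule; it is only an assumption about how an MI-based filter might work, since the exact MI estimator used here is not specified (for continuous features, scikit-learn's `mutual_info_classif` uses a nearest-neighbour estimator instead). In the toy data, the label is the XOR of `x1` and `x2`: each has zero mutual information with the label on its own, yet together they determine it completely, so a filter that keeps the top third of features retains only the weakly informative `x3`.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed (x, y) of p(x, y) * log2(p(x, y) / (p(x) p(y))).
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def top_fraction(columns, ys, fraction):
    """Rank features by MI with the label and keep the top `fraction` of them."""
    scores = [mutual_information(col, ys) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    k = max(1, round(fraction * len(columns)))
    return sorted(order[:k]), scores


x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
x3 = [0, 0, 0, 1]                        # weakly informative on its own
y = [a ^ b for a, b in zip(x1, x2)]      # label = x1 XOR x2

kept, scores = top_fraction([x1, x2, x3], y, 1 / 3)
print([round(s, 3) for s in scores], kept)
print(mutual_information(list(zip(x1, x2)), y))  # joint information: 1.0 bit
```

A wrapper method, which evaluates feature subsets jointly through a classifier, can in principle recover the `x1`/`x2` pair that the univariate ranking discards.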

Based on the above validation, we further tried a hybrid method that first uses a filter-based method to eliminate a small portion of redundant features and then applies the proposed BHHO to the remaining features. Specifically, the top 90% of features ranked by the Rf, VT, and MI methods were used separately as input to the VTBHHO method. The results are shown in Table 7, where the best F1-score and G-mean among all results are in bold. The combination of Rf and VTBHHO obtained the highest F1-score (30.21%) with SVM as the classifier, and the combination of MI and VTBHHO achieved the highest G-mean (61.69%) with Adaboost as the classifier. Compared with VTBHHO alone, the hybrid approaches achieved a higher F1-score but a lower G-mean. This is because the F1-score is the objective function of the wrapper-based methods, so the search favors feature subsets that improve the F1-score. When a wrapper-based method is used alone, some features with negative effects degrade the quality of the final solution. By first using a filter-based method to eliminate the features most likely to be useless, the quality of the candidate feature subsets is improved, increasing the likelihood of obtaining better solutions.
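The hybrid scheme can be outlined as a filter stage followed by a wrapper search restricted to the retained features. The sketch below is not BHHO: the wrapper search is a plain random bit-flip hill climber used only as a placeholder, `toy_objective` is a hypothetical stand-in for training a classifier and returning its macro F1-score, and rounding 90% of 30 features to 27 is an assumption about how the cut-off is applied.

```python
import random


def filter_then_wrap(filter_scores, objective, keep_fraction=0.9,
                     iters=200, seed=0):
    """Keep the top `keep_fraction` of features by filter score, then search
    binary masks over the kept features only.

    The search is a random bit-flip hill climber used as a placeholder for
    BHHO; `objective(subset)` should return a score to maximise."""
    rng = random.Random(seed)
    n = len(filter_scores)
    k = max(1, round(keep_fraction * n))
    ranked = sorted(range(n), key=lambda j: filter_scores[j], reverse=True)
    kept = sorted(ranked[:k])
    mask = [rng.random() < 0.5 for _ in kept]

    def subset(m):
        return [f for f, on in zip(kept, m) if on]

    best = objective(subset(mask))
    for _ in range(iters):
        cand = mask[:]
        j = rng.randrange(len(cand))
        cand[j] = not cand[j]          # flip one feature in or out
        value = objective(subset(cand))
        if value >= best:              # accept ties to keep moving
            mask, best = cand, value
    return subset(mask), best


# Hypothetical objective: reward "useful" features, penalise the rest.
useful = {0, 3, 5}


def toy_objective(subset):
    chosen = set(subset)
    return len(chosen & useful) - 0.1 * len(chosen - useful)


scores = [30 - j for j in range(30)]   # feature 0 ranks highest, 29 lowest
chosen, best = filter_then_wrap(scores, toy_objective)
print(chosen, best)
```

With 30 features, keeping the top 27 shrinks the wrapper's search space from 2^30 to 2^27 candidate subsets, an eight-fold reduction, which is one way to see why pre-filtering can make good subsets easier to find.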

Finally, we analyze the feature sets selected by the combination of Rf and VTBHHO with SVM as the classifier, i.e., the setting that achieves the highest F1-score. The selected features are shown in Table 8. When the distribution of the data varies, the selected features can differ considerably even with the same FS method and classifier: across the five folds, the number of selected features ranged from 8 to 18 of the 30 features. The table also summarizes how often each feature was used. The last feature, "DX", was used in all folds, suggesting that past diagnostic results play a crucial role in the current decision. "Hormonal Contraceptives (year)", "STDs", "STDs:syphilis", and "STDs:molluscum contagiosum" were each used in four folds, suggesting that these risk factors are strongly associated with the detection of cervical cancer. On the other hand, "STDs:condylomatosis" and "STDs:cervical condylomatosis" showed no association with the detection of cervical cancer in this analysis. These statistical findings may provide useful guidance to the general population: when highly relevant risk factors appear in daily life, one should be vigilant and seek further medical diagnosis.
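The usage summary in Table 8 amounts to counting, for each feature, the number of folds whose selected subset contains it. A minimal sketch follows; the helper name is hypothetical, and the fold subsets and feature names are made up for illustration and are not the values in Table 8.

```python
from collections import Counter


def selection_frequency(fold_subsets, feature_names):
    """Number of folds in which each feature was selected."""
    counts = Counter(f for subset in fold_subsets for f in set(subset))
    return {name: counts.get(name, 0) for name in feature_names}


# Made-up selections for five folds (not the values in Table 8).
fold_subsets = [["a", "b"], ["a", "c"], ["a", "b"], ["a"], ["a", "b", "c"]]
freq = selection_frequency(fold_subsets, ["a", "b", "c", "d"])
always = [f for f, c in freq.items() if c == len(fold_subsets)]
never = [f for f, c in freq.items() if c == 0]
print(freq, always, never)
```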

4.4. Validation on Other Datasets

In this part, we validate the proposed BHHO algorithm on the other three disease datasets. The experimental results are shown in Table 9, Table 10, Table 11, Table 12, Table 13 and Table 14 in the form of mean ± variance, and the best F1-score and G-mean of each method are highlighted in bold. On the Cleveland dataset, VTBHHO with DA achieved the best F1-score (83.29%) and G-mean (84.2%). On the Z-Alizadeh Sani dataset, GA with SVM achieved the highest F1-score (91.11%), while DE with LR obtained the highest G-mean (87.77%). On the Parkinson dataset, MIBHHO with KNN achieved the best F1-score (82.35%) and the highest G-mean (83.22%). Figure 8, Figure 9 and Figure 10 present the results on these three datasets as bar charts. The proposed BHHO achieved the best results on two of the three datasets; although it did not achieve the best result on the Z-Alizadeh Sani dataset, its performance was close to the best outcome. This indicates that the proposed BHHO method is not only effective on the cervical cancer dataset but also generalizes to some degree to other disease datasets. Furthermore, the results of the three BHHO variants across datasets show that different ranking strategies play different roles depending on the dataset: the VTBHHO variant, which performed best on the cervical cancer dataset, is not necessarily the best choice for other datasets. The ranking strategy can therefore be adjusted to the task at hand to achieve higher performance; in other words, the ranking strategy incorporated into BHHO allows it to adapt flexibly to different settings. Although the combinations of the proposed BHHO with Rf, VT, and MI did not achieve the best results on the Z-Alizadeh Sani dataset, combining it with other ranking strategies might yield higher accuracy. In summary, the experimental results show that the proposed BHHO method offers competitive performance compared with other commonly used methods.

5. Conclusions

This study introduces a novel feature selection (FS) strategy for the automated diagnosis of cervical cancer using machine learning (ML). Disease datasets typically contain a large number of features, many of which are redundant or irrelevant. Using wrapper-based FS methods to eliminate these useless features is currently one of the effective approaches to this problem. Following this perspective, we improved the Harris Hawks Optimization (HHO) algorithm and proposed the Binary Harris Hawks Optimization (BHHO) algorithm. Specifically, we defined a new set of discrete operations within the HHO framework to better address the FS problem. Additionally, we introduced a new rank-based selection mechanism into the algorithm to enhance its optimization ability.

To comprehensively evaluate the performance of the proposed algorithm, we compared BHHO with commonly used wrapper-based and filter-based FS methods. The results show that the proposed BHHO algorithm achieves better results on the cervical cancer problem than the other commonly used methods. In addition, we tried a hybrid approach combining a filter-based method with the BHHO algorithm, which achieved the highest F1-score among all experiments. Moreover, the proposed BHHO algorithm was validated on other disease datasets, where it also provided competitive performance, suggesting that it generalizes beyond the cervical cancer task. In future work, we plan to explore integrating other feature ranking methods with the proposed BHHO algorithm. We will also investigate alternative objective functions, such as the G-mean or an aggregate function.

Author Contributions

Conceptualization, M.D.; Software, M.D.; Validation, Y.W. and Y.H.; Formal analysis, Y.W. and Y.H.; Data curation, Y.H.; Writing—original draft, M.D.; Supervision, Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by JST SPRING grant number JPMJSP2135. The APC was funded by JST SPRING.

Data Availability Statement

Cervical Cancer Dataset: https://archive.ics.uci.edu/dataset/383/cervical+cancer+risk+factors; Cleveland dataset: https://archive.ics.uci.edu/dataset/45/heart+disease; Z-Alizadeh Sani dataset: https://archive.ics.uci.edu/dataset/411/extention+of+z+alizadeh+sani+dataset; Parkinson dataset: https://archive.ics.uci.edu/dataset/489/parkinson+dataset+with+replicated+acoustic+features (all accessed on 23 April 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021, 71, 209–249.
Gadducci, A.; Barsotti, C.; Cosio, S.; Domenici, L.; Riccardo Genazzani, A. Smoking habit, immune suppression, oral contraceptive use, and hormone replacement therapy use and cervical carcinogenesis: A review of the literature. Gynecol. Endocrinol. 2011, 27, 597–604.
Rodríguez, A.C.; Schiffman, M.; Herrero, R.; Hildesheim, A.; Bratti, C.; Sherman, M.E.; Solomon, D.; Guillén, D.; Alfaro, M.; Morales, J.; et al. Longitudinal study of human papillomavirus persistence and cervical intraepithelial neoplasia grade 2/3: Critical role of duration of infection. J. Natl. Cancer Inst. 2010, 102, 315–324.
Hillemanns, P.; Soergel, P.; Hertel, H.; Jentschke, M. Epidemiology and early detection of cervical cancer. Oncol. Res. Treat. 2016, 39, 501–506.
World Health Organization. Comprehensive Cervical Cancer Control: A Guide to Essential Practice; World Health Organization: Geneva, Switzerland, 2006.
Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
Firmino, M.; Angelo, G.; Morais, H.; Dantas, M.R.; Valentim, R. Computer-aided detection (CADe) and diagnosis (CADx) system for lung cancer with likelihood of malignancy. Biomed. Eng. Online 2016, 15, 2.
Setio, A.A.A.; Ciompi, F.; Litjens, G.; Gerke, P.; Jacobs, C.; Van Riel, S.J.; Wille, M.M.W.; Naqibullah, M.; Sánchez, C.I.; Van Ginneken, B. Pulmonary nodule detection in CT images: False positive reduction using multi-view convolutional networks. IEEE Trans. Med. Imaging 2016, 35, 1160–1169.
Abdel-Zaher, A.M.; Eldeib, A.M. Breast cancer classification using deep belief networks. Expert Syst. Appl. 2016, 46, 139–144.
Mughal, B.; Sharif, M.; Muhammad, N.; Saba, T. A novel classification scheme to decline the mortality rate among women due to breast tumor. Microsc. Res. Tech. 2018, 81, 171–180.
Ben-Cohen, A.; Diamant, I.; Klang, E.; Amitai, M.; Greenspan, H. Fully convolutional network for liver segmentation and lesions detection. In Proceedings of the Deep Learning and Data Labeling for Medical Applications: First International Workshop, LABELS 2016, and Second International Workshop, DLMIA 2016, Held in Conjunction with MICCAI 2016, Athens, Greece, 21 October 2016; Proceedings 1. Springer: Cham, Switzerland, 2016; pp. 77–85.
Rau, H.H.; Hsu, C.Y.; Lin, Y.A.; Atique, S.; Fuad, A.; Wei, L.M.; Hsu, M.H. Development of a web-based liver cancer prediction model for type II diabetes patients by using an artificial neural network. Comput. Methods Programs Biomed. 2016, 125, 58–65.
Asuntha, A.; Srinivasan, A. Deep learning for lung Cancer detection and classification. Multimed. Tools Appl. 2020, 79, 7731–7762.
Shanthi, S.; Rajkumar, N. Lung cancer prediction using stochastic diffusion search (SDS) based feature selection and machine learning methods. Neural Process. Lett. 2021, 53, 2617–2630.
Acharya, S.; Alsadoon, A.; Prasad, P.; Abdullah, S.; Deva, A. Deep convolutional network for breast cancer classification: Enhanced loss function (ELF). J. Supercomput. 2020, 76, 8548–8565.
Ak, M.F. A comparative analysis of breast cancer detection and diagnosis using data visualization and machine learning applications. Healthcare 2020, 8, 111.
Saba, L.; Dey, N.; Ashour, A.S.; Samanta, S.; Nath, S.S.; Chakraborty, S.; Sanches, J.; Kumar, D.; Marinho, R.; Suri, J.S. Automated stratification of liver disease in ultrasound: An online accurate feature classification paradigm. Comput. Methods Programs Biomed. 2016, 130, 118–134.
Gatos, I.; Tsantis, S.; Spiliopoulos, S.; Karnabatidis, D.; Theotokas, I.; Zoumpoulis, P.; Loupas, T.; Hazle, J.D.; Kagadis, G.C. A machine-learning algorithm toward color analysis for chronic liver disease classification, employing ultrasound shear wave elastography. Ultrasound Med. Biol. 2017, 43, 1797–1810.
Bellman, R. Dynamic programming. Science 1966, 153, 34–37.
Manikandan, G.; Abirami, S. A survey on feature selection and extraction techniques for high-dimensional microarray datasets. In Knowledge Computing and its Applications: Knowledge Computing in Specific Domains: Volume II; Springer: Singapore, 2018; pp. 311–333.
William, W.; Ware, A.; Basaza-Ejiri, A.H.; Obungoloch, J. A review of image analysis and machine learning techniques for automated cervical cancer screening from pap-smear images. Comput. Methods Programs Biomed. 2018, 164, 15–22.
Liu, W.; Li, C.; Xu, N.; Jiang, T.; Rahaman, M.M.; Sun, H.; Wu, X.; Hu, W.; Chen, H.; Sun, C.; et al. CVM-Cervix: A hybrid cervical Pap-smear image classification framework using CNN, visual transformer and multilayer perceptron. Pattern Recognit. 2022, 130, 108829.
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning (PMLR), Virtual, 18–24 July 2021; pp. 10347–10357.
Pramanik, R.; Biswas, M.; Sen, S.; de Souza Júnior, L.A.; Papa, J.P.; Sarkar, R. A fuzzy distance-based ensemble of deep models for cervical cancer detection. Comput. Methods Programs Biomed. 2022, 219, 106776.
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Yaman, O.; Tuncer, T. Exemplar pyramid deep feature extraction based cervical cancer image classification model using pap-smear images. Biomed. Signal Process. Control 2022, 73, 103428.
Shi, J.; Wang, R.; Zheng, Y.; Jiang, Z.; Zhang, H.; Yu, L. Cervical cell classification with graph convolutional network. Comput. Methods Programs Biomed. 2021, 198, 105807.
Tripathi, A.; Arora, A.; Bhan, A. Classification of cervical cancer using Deep Learning Algorithm. In Proceedings of the 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 6–8 May 2021; pp. 1210–1218.
Cancer Genome Atlas Research Network. Integrated genomic and molecular characterization of cervical cancer. Nature 2017, 543, 378.
Nahand, J.S.; Taghizadeh-boroujeni, S.; Karimzadeh, M.; Borran, S.; Pourhanifeh, M.H.; Moghoofei, M.; Bokharaei-Salim, F.; Karampoor, S.; Jafari, A.; Asemi, Z.; et al. microRNAs: New prognostic, diagnostic, and therapeutic biomarkers in cervical cancer. J. Cell. Physiol. 2019, 234, 17064–17099.
Luo, W.; Wang, M.; Liu, J.; Cui, X.; Wang, H. Identification of a six lncRNAs signature as novel diagnostic biomarkers for cervical cancer. J. Cell. Physiol. 2020, 235, 993–1000.
Bock, C. Analysing and interpreting DNA methylation data. Nat. Rev. Genet. 2012, 13, 705–719.
Qureshi, S.A.; Bashir, M.U.; Yaqinuddin, A. Utility of DNA methylation markers for diagnosing cancer. Int. J. Surg. 2010, 8, 194–198.
Xu, W.; Xu, M.; Wang, L.; Zhou, W.; Xiang, R.; Shi, Y.; Zhang, Y.; Piao, Y. Integrative analysis of DNA methylation and gene expression identified cervical cancer-specific diagnostic biomarkers. Signal Transduct. Target. Ther. 2019, 4, 55.
Dua, D.; Graff, C. UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 1 February 2024).
Newaz, A.; Muhtadi, S.; Haq, F.S. An intelligent decision support system for the accurate diagnosis of cervical cancer. Knowl.-Based Syst. 2022, 245, 108634.
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
Hart, P. The condensed nearest neighbor rule (corresp.). IEEE Trans. Inf. Theory 1968, 14, 515–516.
Wilson, D.L. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 1972, SMC-2, 408–421.
Lu, J.; Song, E.; Ghoneim, A.; Alrashoud, M. Machine learning for assisting cervical cancer diagnosis: An ensemble approach. Future Gener. Comput. Syst. 2020, 106, 199–205.
Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Distributed feature selection: An application to microarray data classification. Appl. Soft Comput. 2015, 30, 136–150.
Saw, T.; Myint, P.H. Swarm intelligence based feature selection for high dimensional classification: A literature survey. Int. J. Comput 2019, 33, 69–83.
Alhenawi, E.; Al-Sayyed, R.; Hudaib, A.; Mirjalili, S. Feature selection methods on gene expression microarray data for cancer classification: A systematic review. Comput. Biol. Med. 2022, 140, 105051.
Nithya, B.; Ilango, V. Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction. SN Appl. Sci. 2019, 1, 641.
Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992.
Storn, R.; Price, K. Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359.
Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
Zhang, Y.; Liu, R.; Wang, X.; Chen, H.; Li, C. Boosted binary Harris hawks optimizer and feature selection. Eng. Comput. 2021, 37, 3741–3770.
Dokeroglu, T.; Deniz, A.; Kiziloz, H.E. A robust multiobjective Harris’ Hawks Optimization algorithm for the binary classification problem. Knowl.-Based Syst. 2021, 227, 107219.
Kira, K.; Rendell, L.A. The feature selection problem: Traditional methods and a new algorithm. In Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA, USA, 12–16 July 1992; pp. 129–134.
Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46, 389–422.
Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Sahami, M.; Dumais, S.; Heckerman, D.; Horvitz, E. A Bayesian approach to filtering junk e-mail. In Proceedings of the Learning for Text Categorization: Papers from the 1998 Workshop, Madison, WI, USA, 26–27 July 1998; Volume 62, pp. 98–105.
Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188.
Cox, D.R.; Snell, E.J. Analysis of Binary Data; CRC Press: Boca Raton, FL, USA, 1989; Volume 32.
Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
Van Buuren, S.; Oudshoorn, C.G.M. Multivariate imputation by chained equations. J. Stat. Softw. 2011, 45, 1–67.
Janosi, A.; Steinbrunn, W.; Pfisterer, M.; Detrano, R. Heart Disease; UCI Machine Learning Repository: Espoo, Finland, 1988.
Alizadehsani, R.; Roshanzamir, M.; Sani, Z. Z-Alizadeh Sani; UCI Machine Learning Repository: Espoo, Finland, 2017.
Prez, C. Parkinson Dataset with Replicated Acoustic Features; UCI Machine Learning Repository: Espoo, Finland, 2019.

Figure 1. Schematic diagram of the HHO algorithm.

Figure 2. Examples of operations in the exploration phase.

Figure 3. Examples of the soft-action operations.

Figure 4. Examples of hard besiege.

Figure 5. Examples of hard besiege with rapid dives.

Figure 6. Flowchart of the experimental framework.

Figure 7. Bar chart of the results on the cervical cancer dataset. Each bar denotes the best result obtained with the corresponding feature selection method; the value and the name of the classifier are shown at the top of each bar.

Figure 8. Bar chart of the results on the Cleveland dataset. Each bar denotes the best result obtained with the corresponding feature selection method; the value and the name of the classifier are shown at the top of each bar.

Figure 9. Bar chart of the results on the Z-Alizadeh Sani dataset. Each bar denotes the best result obtained with the corresponding feature selection method; the value and the name of the classifier are shown at the top of each bar.

Figure 10. Bar chart of the results on the Parkinson dataset. Each bar denotes the best result obtained with the corresponding feature selection method; the value and the name of the classifier are shown at the top of each bar.

Table 1. Results for cervical cancer dataset with all features. The best F1-score and G-mean across all methods are highlighted in bold.

Classifier	Accuracy (%)	Sensitivity (%)	Specificity (%)	Precision (%)	F1-Score (%)	G-Mean (%)
SVM	87.25 ± 1.40	20.42 ± 5.30	19.05 ± 10.16	92.33 ± 2.61	18.59 ± 6.05	43.06 ± 5.33
RF	90.97 ± 1.55	9.14 ± 10.48	15.67 ± 13.48	97.29 ± 0.91	10.52 ± 10.06	22.01 ± 19.97
NB	78.21 ± 7.51	29.40 ± 20.01	12.00 ± 5.83	81.85 ± 9.33	14.74 ± 6.59	45.17 ± 14.28
Adaboost	80.08 ± 1.48	31.77 ± 7.74	13.06 ± 4.70	83.73 ± 2.02	18.25 ± 5.76	51.12 ± 6.70
DA	83.66 ± 1.09	41.87 ± 13.45	19.28 ± 7.09	86.88 ± 1.66	25.90 ± 8.61	59.34 ± 10.55
LR	80.61 ± 0.55	34.62 ± 8.62	13.92 ± 3.94	84.16 ± 0.99	19.56 ± 4.75	53.44 ± 7.28
KNN	78.22 ± 1.00	43.90 ± 7.93	15.02 ± 4.79	80.72 ± 0.89	22.26 ± 6.37	59.26 ± 5.96

Table 2. Results for cervical cancer dataset using proposed BHHO variants. The best F1-score and G-mean across all methods are highlighted in bold.

VTBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Specificity (%)	Precision (%)	F1-score (%)	G-mean (%)
SVM	87.78 ± 1.46	31.77 ± 10.03	22.88 ± 5.57	92.15 ± 1.58	25.82 ± 5.84	53.33 ± 8.77
RF	89.11 ± 1.73	11.35 ± 2.42	18.87 ± 12.18	95.03 ± 2.59	12.97 ± 4.39	32.65 ± 3.42
NB	76.89 ± 8.15	35.58 ± 17.45	11.82 ± 3.70	80.19 ± 10.28	16.35 ± 4.15	50.35 ± 12.35
Adaboost	83.67 ± 2.26	46.96 ± 8.51	20.78 ± 4.69	86.42 ± 2.09	28.60 ± 5.92	63.46 ± 5.48
DA	84.99 ± 3.32	32.23 ± 12.52	17.75 ± 6.45	88.96 ± 2.73	22.85 ± 8.50	52.08 ± 12.43
LR	81.68 ± 1.95	49.53 ± 10.59	19.04 ± 4.56	84.15 ± 1.91	27.21 ± 5.96	64.18 ± 6.81
KNN	83.13 ± 2.60	21.66 ± 13.89	11.39 ± 6.84	87.53 ± 2.76	14.89 ± 9.13	38.10 ± 20.26
RfBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Specificity (%)	Precision (%)	F1-score (%)	G-mean (%)
SVM	86.85 ± 1.98	28.13 ± 12.38	18.90 ± 6.47	91.44 ± 2.00	21.92 ± 7.64	49.18 ± 12.12
RF	87.64 ± 2.77	13.74 ± 4.99	16.21 ± 12.07	93.29 ± 2.79	13.85 ± 6.77	35.17 ± 6.83
NB	80.47 ± 7.46	30.73 ± 15.57	12.78 ± 3.74	84.27 ± 9.08	17.00 ± 4.70	48.30 ± 11.31
Adaboost	83.54 ± 3.17	38.26 ± 2.75	19.06 ± 4.94	87.03 ± 3.62	24.85 ± 4.60	57.63 ± 1.69
DA	85.13 ± 1.81	32.23 ± 12.52	17.62 ± 6.64	89.13 ± 1.24	22.60 ± 8.37	52.13 ± 12.35
LR	82.61 ± 3.07	34.23 ± 12.80	15.67 ± 6.21	86.33 ± 4.28	20.76 ± 7.74	52.64 ± 11.90
KNN	82.46 ± 2.29	23.48 ± 7.96	12.80 ± 6.07	86.87 ± 2.81	16.21 ± 6.59	44.53 ± 7.48
MIBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	87.12 ± 1.23	23.74 ± 11.39	16.91 ± 4.73	92.00 ± 1.19	19.23 ± 6.35	45.29 ± 11.05
RF	90.44 ± 1.72	9.92 ± 2.37	24.32 ± 14.37	96.59 ± 2.26	12.55 ± 2.27	30.72 ± 3.30
NB	85.65 ± 3.28	27.58 ± 18.16	17.72 ± 7.52	90.16 ± 4.21	19.27 ± 9.20	46.70 ± 15.96
Adaboost	83.54 ± 2.67	37.66 ± 15.16	17.70 ± 7.60	87.02 ± 3.45	23.55 ± 9.81	55.33 ± 13.79
DA	85.79 ± 1.96	32.81 ± 12.14	19.11 ± 7.21	89.86 ± 1.24	23.85 ± 8.47	52.90 ± 12.36
LR	82.08 ± 2.43	46.57 ± 5.17	19.39 ± 5.93	84.74 ± 2.65	27.04 ± 6.92	62.74 ± 3.98
KNN	81.79 ± 4.53	29.09 ± 12.83	14.78 ± 9.41	85.71 ± 4.79	18.91 ± 9.81	48.70 ± 11.08
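
Each entry is reported as mean ± standard deviation. Presumably these statistics are taken over the five cross-validation folds that Table 8 enumerates, with each metric computed per fold and then averaged. The harmonic mean is concave, so the average of per-fold F1-scores can never exceed the F1 computed from the averaged precision and sensitivity. The same holds for G-mean, which is a geometric mean. Recomputing F1 from the averaged columns will therefore usually give a slightly larger number than the reported one. In the minimal sketch below, the fold values are hypothetical and the use of the sample standard deviation is an assumption.

```python
import statistics


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def summarize(values: list[float]) -> str:
    """Format per-fold scores as 'mean ± std' (sample std is an assumption)."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"


# Hypothetical precision/recall of five folds (not taken from the paper).
precision = [0.10, 0.30, 0.20, 0.25, 0.15]
recall = [0.50, 0.20, 0.40, 0.10, 0.30]

fold_f1 = [f1(p, r) for p, r in zip(precision, recall)]
mean_of_f1 = statistics.mean(fold_f1)
f1_of_means = f1(statistics.mean(precision), statistics.mean(recall))
```

Here the mean of the per-fold F1-scores is about 0.203. The F1 of the mean precision (0.20) and mean recall (0.30) is 0.24.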

Table 3. Results for the cervical cancer dataset using the GA, DE, and PSO methods. The best F1-score and G-mean of each wrapper-based method are highlighted in bold.

GA
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	87.78 ± 1.55	23.45 ± 15.17	17.24 ± 7.62	92.87 ± 0.99	19.22 ± 10.05	43.94 ± 15.46
RF	89.24 ± 2.19	14.49 ± 13.56	16.89 ± 14.08	95.02 ± 1.80	14.77 ± 13.40	31.46 ± 19.73
NB	76.49 ± 7.77	35.58 ± 17.45	11.53 ± 4.05	79.75 ± 9.83	16.17 ± 4.51	50.28 ± 12.47
Adaboost	83.93 ± 2.84	45.53 ± 9.61	21.38 ± 6.01	86.88 ± 3.07	28.59 ± 7.23	62.56 ± 6.43
DA	85.79 ± 2.12	32.62 ± 12.96	19.11 ± 8.44	89.86 ± 1.38	23.83 ± 9.82	52.65 ± 12.89
LR	82.48 ± 1.90	43.71 ± 6.31	18.71 ± 5.07	85.45 ± 1.79	25.89 ± 5.88	60.98 ± 4.75
KNN	82.07 ± 2.93	19.19 ± 11.57	11.42 ± 6.57	86.74 ± 3.57	14.01 ± 7.77	36.29 ± 18.92
DE
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	80.87 ± 3.65	26.99 ± 10.31	12.44 ± 6.01	85.02 ± 4.02	16.45 ± 6.55	47.05 ± 8.88
RF	89.11 ± 2.60	9.06 ± 6.41	17.56 ± 17.09	95.14 ± 2.07	11.73 ± 9.34	25.82 ± 14.41
NB	82.87 ± 2.16	20.31 ± 12.22	10.91 ± 5.20	87.77 ± 3.34	13.36 ± 5.50	40.39 ± 11.37
Adaboost	83.93 ± 2.02	41.90 ± 3.69	19.68 ± 3.23	87.13 ± 1.52	26.60 ± 3.47	60.37 ± 2.96
DA	85.13 ± 1.54	36.81 ± 16.40	19.37 ± 8.10	88.88 ± 2.05	24.69 ± 9.75	55.20 ± 14.43
LR	84.07 ± 2.86	34.62 ± 13.17	16.97 ± 5.97	87.85 ± 3.02	22.42 ± 8.00	53.50 ± 12.66
KNN	81.94 ± 4.65	29.58 ± 10.12	15.92 ± 9.61	85.94 ± 5.52	19.83 ± 9.98	49.71 ± 9.08
PSO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	85.12 ± 3.39	22.99 ± 13.62	15.51 ± 6.67	89.88 ± 4.52	16.94 ± 6.11	43.70 ± 10.78
RF	88.45 ± 3.16	7.64 ± 7.40	14.43 ± 15.72	94.58 ± 2.90	9.60 ± 9.92	20.55 ± 17.66
NB	77.68 ± 8.67	35.58 ± 17.45	12.43 ± 3.90	81.01 ± 10.55	17.14 ± 4.77	50.67 ± 12.66
Adaboost	82.21 ± 2.38	40.26 ± 18.11	16.44 ± 7.18	85.46 ± 3.24	22.88 ± 10.24	56.36 ± 15.27
DA	84.86 ± 1.06	40.05 ± 16.74	19.75 ± 8.47	88.30 ± 1.37	26.00 ± 10.63	57.45 ± 15.01
LR	82.21 ± 2.09	41.90 ± 3.69	17.96 ± 4.54	85.30 ± 2.11	24.80 ± 4.67	59.73 ± 3.00
KNN	81.28 ± 2.25	21.01 ± 14.48	11.42 ± 7.59	85.76 ± 2.90	14.50 ± 9.51	37.44 ± 20.48

Table 4. Results for the cervical cancer dataset with the Rf method. The best F1-score and G-mean of each classifier are highlighted in bold.

Top 50% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	80.62 ± 3.25	15.92 ± 13.83	7.27 ± 4.59	85.42 ± 3.65	9.80 ± 6.89	31.16 ± 18.86
RF	91.10 ± 1.37	8.29 ± 7.90	14.71 ± 12.32	97.43 ± 0.71	10.26 ± 9.26	21.52 ± 18.44
NB	66.90 ± 12.20	24.16 ± 20.15	5.51 ± 3.56	70.14 ± 14.38	8.51 ± 5.95	33.09 ± 20.11
Adaboost	70.52 ± 5.41	35.38 ± 16.74	8.95 ± 4.72	73.16 ± 6.63	13.99 ± 6.71	48.60 ± 12.10
DA	76.36 ± 2.13	40.83 ± 15.64	12.67 ± 4.65	79.02 ± 3.05	19.13 ± 7.09	55.48 ± 11.02
LR	71.04 ± 5.74	35.38 ± 16.74	9.29 ± 5.30	73.74 ± 6.98	14.34 ± 7.28	48.82 ± 12.36
KNN	76.23 ± 2.39	37.19 ± 12.83	11.94 ± 4.23	79.15 ± 2.61	17.93 ± 6.31	53.50 ± 8.82
Top 60% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	81.69 ± 5.16	19.64 ± 9.09	9.78 ± 1.72	86.44 ± 5.98	12.09 ± 2.42	39.62 ± 9.02
RF	91.10 ± 1.91	9.14 ± 10.48	15.71 ± 13.93	97.43 ± 0.97	10.79 ± 10.68	22.03 ± 20.03
NB	74.09 ± 6.15	29.51 ± 18.94	8.43 ± 3.94	77.59 ± 7.70	12.67 ± 6.55	44.85 ± 14.04
Adaboost	74.64 ± 4.60	35.19 ± 11.07	10.70 ± 3.91	77.59 ± 5.56	16.14 ± 5.31	51.27 ± 7.34
DA	79.15 ± 2.41	32.99 ± 17.28	12.08 ± 7.27	82.74 ± 2.91	17.35 ± 9.91	46.47 ± 23.58
LR	74.91 ± 5.36	37.01 ± 11.83	11.43 ± 4.62	77.74 ± 6.27	17.15 ± 6.23	52.58 ± 8.21
KNN	77.03 ± 2.35	43.69 ± 8.66	13.84 ± 3.00	79.57 ± 2.45	20.84 ± 4.23	58.67 ± 5.49
Top 70% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	83.14 ± 4.65	16.60 ± 4.45	10.10 ± 2.91	88.12 ± 4.78	12.21 ± 3.22	37.74 ± 4.96
RF	91.37 ± 1.51	5.43 ± 7.79	18.33 ± 26.03	97.87 ± 1.44	8.38 ± 12.00	14.20 ± 18.31
NB	79.00 ± 7.40	23.69 ± 21.42	10.05 ± 6.08	83.31 ± 9.49	11.70 ± 6.37	39.49 ± 15.51
Adaboost	77.83 ± 1.87	27.77 ± 7.61	10.15 ± 3.22	81.56 ± 2.10	14.75 ± 4.46	47.05 ± 6.23
DA	80.88 ± 2.11	37.58 ± 14.32	15.37 ± 6.51	84.16 ± 1.63	21.61 ± 8.91	55.24 ± 11.26
LR	79.15 ± 1.15	36.44 ± 9.69	13.28 ± 3.75	82.44 ± 1.16	19.25 ± 5.08	54.19 ± 7.97
KNN	76.63 ± 2.82	42.47 ± 7.32	13.73 ± 4.06	79.14 ± 2.64	20.66 ± 5.65	57.77 ± 5.90
Top 80% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	83.41 ± 4.66	20.42 ± 5.30	12.22 ± 4.15	88.11 ± 4.40	15.07 ± 4.52	42.05 ± 5.37
RF	90.84 ± 1.14	8.29 ± 7.90	12.57 ± 11.23	97.15 ± 0.97	9.67 ± 8.96	21.45 ± 18.39
NB	79.00 ± 7.40	23.69 ± 21.42	10.05 ± 6.08	83.31 ± 9.49	11.70 ± 6.37	39.49 ± 15.51
Adaboost	78.75 ± 2.33	34.62 ± 8.62	12.50 ± 2.97	82.14 ± 2.30	18.18 ± 4.09	52.78 ± 7.17
DA	81.54 ± 2.17	37.58 ± 14.32	15.88 ± 6.41	84.88 ± 2.10	22.09 ± 8.85	55.43 ± 11.16
LR	79.42 ± 1.55	36.44 ± 9.69	13.43 ± 3.52	82.72 ± 1.74	19.40 ± 4.87	54.25 ± 7.86
KNN	76.76 ± 2.75	42.47 ± 7.32	13.77 ± 3.99	79.28 ± 2.54	20.70 ± 5.56	57.82 ± 5.83
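
Tables 4 to 6 apply each filter method on its own and keep the top 50%, 60%, 70%, or 80% of features by filter score. The sketch below covers only the "keep the top p%" step and takes the filter's relevance scores as given; the scoring used by Rf, VT, or MI is not shown. The function name, the ceil rounding, and the tie-break rule (lower index first) are assumptions, because this section does not specify them.

```python
import math


def top_percent(scores: list[float], percent: float) -> list[int]:
    """Indices of the best-scoring features, keeping ceil(percent% of them).

    Higher score = more relevant. Ties keep the lower index first because
    Python's sort is stable; the result is returned in column order.
    """
    k = max(1, math.ceil(len(scores) * percent / 100))
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(ranked[:k])


# Hypothetical relevance scores for four features.
print(top_percent([0.1, 0.9, 0.5, 0.7], 50))  # prints [1, 3]
```

With the 30 candidate features listed in Table 8, this rounding rule keeps 15, 18, 21, and 24 features for the four settings.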

Table 5. Results for the cervical cancer dataset with the VT method. The best F1-score and G-mean of each classifier are highlighted in bold.

Top 50% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	79.82 ± 3.23	10.68 ± 11.11	4.69 ± 4.36	84.98 ± 3.13	6.48 ± 6.24	22.39 ± 19.66
RF	91.10 ± 1.37	3.43 ± 4.30	10.67 ± 13.73	97.72 ± 0.81	5.18 ± 6.53	11.55 ± 14.23
NB	69.02 ± 13.45	20.16 ± 16.23	5.68 ± 3.85	72.69 ± 15.51	8.30 ± 5.62	31.18 ± 17.80
Adaboost	68.66 ± 5.64	31.77 ± 12.61	7.59 ± 3.34	71.43 ± 6.82	12.05 ± 4.90	45.86 ± 9.81
DA	77.03 ± 1.83	32.13 ± 19.25	10.52 ± 6.80	80.47 ± 2.77	15.63 ± 9.85	44.80 ± 23.60
LR	69.32 ± 5.64	31.77 ± 12.61	7.78 ± 3.42	72.15 ± 6.82	12.27 ± 4.98	46.09 ± 9.87
KNN	75.30 ± 1.90	31.77 ± 5.18	10.23 ± 2.83	78.58 ± 2.11	15.34 ± 3.82	49.80 ± 4.06
Top 60% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	83.94 ± 3.44	13.92 ± 8.38	9.24 ± 4.69	89.31 ± 3.67	10.57 ± 5.66	33.91 ± 9.48
RF	90.44 ± 1.08	8.29 ± 7.90	11.86 ± 10.26	96.72 ± 0.81	9.41 ± 8.56	21.42 ± 18.35
NB	77.28 ± 6.27	29.40 ± 20.01	9.91 ± 4.34	80.85 ± 8.04	13.99 ± 6.94	45.01 ± 14.45
Adaboost	78.75 ± 0.73	36.44 ± 9.69	13.11 ± 4.03	82.03 ± 1.70	18.96 ± 5.15	54.02 ± 7.78
DA	79.95 ± 1.51	36.81 ± 16.40	13.69 ± 5.80	83.31 ± 1.99	19.62 ± 8.22	53.43 ± 14.03
LR	78.75 ± 1.06	38.26 ± 11.82	13.55 ± 4.68	81.88 ± 1.55	19.72 ± 6.34	55.15 ± 9.17
KNN	77.56 ± 1.13	37.22 ± 5.54	12.89 ± 3.56	80.58 ± 1.31	18.99 ± 4.64	54.63 ± 4.31
Top 70% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	86.46 ± 1.34	17.56 ± 7.41	14.02 ± 4.87	91.76 ± 2.63	14.85 ± 5.37	39.12 ± 8.40
RF	90.84 ± 1.94	10.10 ± 6.74	22.50 ± 22.91	97.00 ± 1.43	13.17 ± 9.95	27.45 ± 15.11
NB	74.11 ± 3.96	35.40 ± 17.65	9.90 ± 4.24	77.02 ± 5.69	15.15 ± 6.55	49.75 ± 12.63
Adaboost	79.41 ± 1.00	38.44 ± 11.14	14.06 ± 4.43	82.59 ± 1.26	20.30 ± 5.83	55.60 ± 8.94
DA	82.60 ± 1.28	38.05 ± 14.71	16.76 ± 7.63	86.03 ± 1.79	22.85 ± 9.61	55.46 ± 13.81
LR	79.94 ± 1.58	38.44 ± 11.14	14.48 ± 4.57	83.15 ± 0.88	20.82 ± 6.14	55.83 ± 9.22
KNN	77.82 ± 0.91	43.90 ± 7.93	14.75 ± 4.67	80.29 ± 0.79	21.96 ± 6.28	59.11 ± 5.96
Top 80% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	86.85 ± 0.74	17.56 ± 7.41	14.25 ± 4.47	92.18 ± 2.04	15.12 ± 5.34	39.22 ± 8.44
RF	90.84 ± 1.13	10.49 ± 6.62	18.88 ± 11.84	97.02 ± 1.17	12.73 ± 7.50	28.09 ± 14.98
NB	78.21 ± 7.51	29.40 ± 20.01	12.00 ± 5.83	81.85 ± 9.33	14.74 ± 6.59	45.17 ± 14.28
Adaboost	80.08 ± 1.29	34.62 ± 8.62	13.63 ± 4.15	83.59 ± 1.95	19.21 ± 4.95	53.25 ± 7.17
DA	83.53 ± 0.98	41.87 ± 13.45	19.02 ± 6.76	86.74 ± 1.49	25.71 ± 8.43	59.29 ± 10.54
LR	80.34 ± 0.96	34.62 ± 8.62	13.72 ± 3.91	83.87 ± 1.31	19.35 ± 4.70	53.35 ± 7.26
KNN	78.22 ± 1.00	43.90 ± 7.93	15.02 ± 4.79	80.72 ± 0.89	22.26 ± 6.37	59.26 ± 5.96
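
If VT is read as a variance-based filter, its ranking step can be sketched as follows. This section does not spell out the abbreviation, so the interpretation, the use of population variance, and the function name are all assumptions. Each feature (column) is scored by its variance, and the top p% is kept.

```python
import math
import statistics


def variance_top_percent(rows: list[list[float]], percent: float) -> tuple[list[int], list[float]]:
    """Rank features (columns) by population variance and keep the top percent.

    Zero-variance (constant) columns carry no information and rank last.
    """
    columns = list(zip(*rows))
    variances = [statistics.pvariance(col) for col in columns]
    k = max(1, math.ceil(len(columns) * percent / 100))
    ranked = sorted(range(len(columns)), key=lambda j: -variances[j])
    return sorted(ranked[:k]), variances


# Hypothetical data: 4 samples x 3 features (constant, spread-out, near-constant).
data = [[0, 1, 5], [0, 3, 5], [0, 5, 6], [0, 7, 5]]
kept, variances = variance_top_percent(data, 50)
```

Variance depends on a feature's scale. Wide-range columns such as Age would therefore dominate 0/1 indicators such as the STD flags unless the data are rescaled (for example, with min-max scaling) first. This section does not state whether such rescaling is applied.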

Table 6. Results for the cervical cancer dataset with the MI method. The best F1-score and G-mean of each classifier are highlighted in bold.

Top 50% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	86.19 ± 1.56	21.27 ± 11.72	13.90 ± 2.83	91.30 ± 1.72	16.22 ± 5.63	42.40 ± 11.66
RF	89.50 ± 3.03	9.74 ± 6.23	13.84 ± 9.21	95.55 ± 2.40	11.21 ± 7.25	26.78 ± 14.51
NB	68.41 ± 15.63	41.48 ± 18.37	10.78 ± 6.32	70.71 ± 18.03	15.74 ± 7.12	50.39 ± 12.31
Adaboost	80.75 ± 2.55	34.62 ± 8.62	14.55 ± 4.98	84.33 ± 3.29	19.97 ± 5.61	53.45 ± 7.08
DA	83.40 ± 3.27	30.81 ± 8.27	16.67 ± 7.18	87.49 ± 4.36	20.70 ± 6.65	51.31 ± 6.53
LR	80.88 ± 2.67	34.62 ± 8.62	14.65 ± 5.06	84.46 ± 3.08	20.11 ± 5.66	53.52 ± 7.26
KNN	78.74 ± 3.17	39.51 ± 13.39	14.57 ± 6.91	81.57 ± 3.17	21.09 ± 9.20	56.00 ± 9.67
Top 60% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	84.73 ± 3.37	23.27 ± 5.07	15.02 ± 4.74	89.44 ± 3.66	17.59 ± 3.28	45.32 ± 4.74
RF	89.51 ± 2.13	7.25 ± 7.36	10.39 ± 10.61	95.71 ± 1.53	8.43 ± 8.62	19.82 ± 17.27
NB	81.28 ± 6.39	24.16 ± 18.83	9.36 ± 7.99	85.58 ± 7.53	13.24 ± 11.16	38.22 ± 22.96
Adaboost	80.60 ± 2.00	38.05 ± 10.83	14.97 ± 5.20	83.85 ± 1.69	21.25 ± 6.60	55.79 ± 9.02
DA	84.59 ± 1.43	41.87 ± 13.45	20.83 ± 8.36	87.89 ± 2.10	27.13 ± 9.30	59.68 ± 10.62
LR	79.95 ± 1.14	36.62 ± 10.57	13.84 ± 3.77	83.30 ± 1.64	19.79 ± 4.84	54.50 ± 8.28
KNN	76.09 ± 2.51	36.65 ± 10.55	12.35 ± 5.88	79.04 ± 3.19	18.12 ± 7.38	53.30 ± 7.57
Top 70% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	87.12 ± 1.26	18.99 ± 5.78	15.73 ± 4.46	92.29 ± 1.51	16.72 ± 3.83	41.40 ± 5.70
RF	90.83 ± 2.64	6.68 ± 5.73	18.33 ± 18.56	97.27 ± 1.26	9.32 ± 7.90	19.70 ± 16.29
NB	78.74 ± 6.38	32.83 ± 18.95	12.13 ± 4.50	82.24 ± 7.69	16.67 ± 5.73	49.02 ± 13.46
Adaboost	79.02 ± 0.82	34.44 ± 9.78	12.71 ± 4.08	82.45 ± 1.65	18.26 ± 5.09	52.62 ± 7.76
DA	83.26 ± 2.64	39.87 ± 11.15	18.21 ± 5.83	86.55 ± 1.68	24.90 ± 7.51	58.06 ± 9.69
LR	80.75 ± 2.01	37.87 ± 9.99	15.30 ± 5.41	84.04 ± 3.00	21.32 ± 6.65	55.69 ± 8.06
KNN	78.63 ± 3.05	33.97 ± 10.14	13.65 ± 6.05	81.91 ± 3.38	19.26 ± 7.83	52.10 ± 9.66
Top 80% features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	86.32 ± 0.66	17.38 ± 6.63	13.43 ± 4.40	91.60 ± 2.06	14.61 ± 5.24	39.02 ± 7.90
RF	90.84 ± 1.29	10.10 ± 6.74	22.67 ± 17.56	97.02 ± 1.76	12.92 ± 9.11	27.45 ± 15.04
NB	77.55 ± 6.24	29.40 ± 20.01	9.87 ± 4.19	81.14 ± 8.09	14.02 ± 6.87	45.05 ± 14.39
Adaboost	79.28 ± 1.15	32.62 ± 8.29	12.37 ± 3.39	82.87 ± 1.82	17.65 ± 4.18	51.46 ± 6.66
DA	82.74 ± 2.87	39.87 ± 11.15	18.31 ± 7.25	86.05 ± 3.53	24.50 ± 8.60	57.79 ± 9.31
LR	80.74 ± 0.38	38.44 ± 11.14	14.91 ± 3.99	84.01 ± 1.35	21.18 ± 5.40	56.02 ± 8.76
KNN	77.55 ± 2.25	32.34 ± 8.85	11.69 ± 4.19	81.03 ± 2.85	16.80 ± 5.11	50.77 ± 6.51
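
Assuming MI denotes mutual information between each feature and the diagnosis label, a plug-in estimate for discrete features such as the 0/1 STD indicators is short. The sketch below is not the authors' implementation, and the function name is hypothetical. Continuous columns such as Age or Smokes (years) would first need binning or a nearest-neighbour estimator.

```python
import math
from collections import Counter


def mutual_information(feature: list, label: list) -> float:
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    if len(feature) != len(label):
        raise ValueError("feature and label must have the same length")
    n = len(feature)
    joint = Counter(zip(feature, label))
    px = Counter(feature)
    py = Counter(label)
    # Sum of p(x,y) * log2(p(x,y) / (p(x) p(y))), with each p = count / n.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )
```

A binary feature that matches a balanced label exactly scores 1 bit, and a feature independent of the label scores 0.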

Table 7. Results for the cervical cancer dataset using the combination of the VTBHHO method and filter-based methods. The best F1-score and G-mean across all methods are highlighted in bold.

Rf+VTBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	87.92 ± 2.01	39.01 ± 11.91	26.70 ± 5.78	91.74 ± 3.19	30.21 ± 3.60	59.02 ± 7.72
RF	89.24 ± 2.36	5.82 ± 7.92	13.00 ± 16.61	95.59 ± 2.26	8.00 ± 10.67	14.81 ± 18.71
NB	79.83 ± 6.62	35.97 ± 13.91	14.18 ± 4.60	83.14 ± 7.39	19.82 ± 6.43	53.29 ± 9.56
Adaboost	82.47 ± 2.60	41.53 ± 14.16	17.55 ± 5.44	85.57 ± 2.84	24.32 ± 7.76	58.57 ± 10.25
DA	84.47 ± 1.68	34.62 ± 8.62	18.50 ± 5.95	88.33 ± 2.65	23.43 ± 6.12	54.71 ± 7.23
LR	80.48 ± 1.18	45.53 ± 9.61	16.96 ± 4.70	83.16 ± 1.04	24.46 ± 6.20	61.23 ± 6.37
KNN	80.34 ± 3.19	16.99 ± 10.95	9.26 ± 6.52	85.02 ± 3.65	11.83 ± 7.91	33.45 ± 18.36
VT+VTBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	86.71 ± 3.47	31.97 ± 11.31	21.43 ± 6.96	21.43 ± 6.96	24.79 ± 8.03	52.95 ± 10.24
RF	87.64 ± 2.63	12.31 ± 7.20	13.67 ± 13.84	93.45 ± 3.09	11.97 ± 8.83	29.94 ± 15.82
NB	76.89 ± 8.15	35.58 ± 17.45	11.82 ± 3.70	80.19 ± 10.28	16.35 ± 4.15	50.35 ± 12.35
Adaboost	82.88 ± 2.91	34.81 ± 13.86	16.12 ± 6.58	86.61 ± 3.84	21.32 ± 8.10	53.12 ± 12.66
DA	84.33 ± 1.05	34.23 ± 12.80	17.20 ± 6.60	88.15 ± 1.56	22.53 ± 8.30	53.37 ± 12.46
LR	82.74 ± 2.25	41.90 ± 3.69	18.57 ± 4.61	85.87 ± 2.18	25.41 ± 4.87	59.94 ± 3.21
KNN	83.92 ± 3.03	26.91 ± 9.08	15.09 ± 6.31	88.11 ± 2.88	19.17 ± 7.22	47.88 ± 8.43
MI+VTBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	86.05 ± 3.84	22.88 ± 12.13	17.08 ± 9.54	90.99 ± 3.28	18.78 ± 9.19	44.13 ± 12.35
RF	88.31 ± 1.88	12.60 ± 9.87	12.91 ± 6.89	94.16 ± 2.66	11.42 ± 6.58	29.55 ± 17.19
NB	82.19 ± 8.28	31.87 ± 18.72	14.59 ± 6.63	85.95 ± 9.72	18.96 ± 9.07	48.66 ± 15.49
Adaboost	82.34 ± 1.66	45.14 ± 9.55	18.78 ± 5.35	85.16 ± 1.63	26.26 ± 6.84	61.69 ± 6.34
DA	85.52 ± 0.98	30.81 ± 11.60	18.18 ± 7.85	89.74 ± 1.54	22.29 ± 8.42	51.21 ± 11.67
LR	83.66 ± 2.51	37.51 ± 12.32	18.03 ± 4.63	87.15 ± 3.63	23.62 ± 5.96	56.13 ± 8.74
KNN	84.06 ± 2.86	22.05 ± 7.55	13.03 ± 4.32	88.69 ± 2.84	16.12 ± 4.83	43.58 ± 6.69
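
In the combined schemes (Rf+VTBHHO, VT+VTBHHO, and MI+VTBHHO), the filter presumably runs first and keeps a subset of the columns. The wrapper then searches binary masks only over the columns that remain. The sketch below covers only the index bookkeeping under that assumption; the BHHO search itself is not reproduced, and both function names are hypothetical. If a filter keeps k of n features, the wrapper's search space shrinks from 2^n to 2^k candidate subsets.

```python
def to_original_indices(kept: list[int], mask: list[int]) -> list[int]:
    """Map a wrapper's 0/1 mask over the filter-kept features back to original column indices."""
    if len(kept) != len(mask):
        raise ValueError("mask must have one bit per kept feature")
    return [idx for idx, bit in zip(kept, mask) if bit]


def search_space_size(n_features: int) -> int:
    """Number of subsets a binary mask of this length can encode (including the empty one)."""
    return 2 ** n_features


# Hypothetical: the filter kept columns 0, 2, 5 and 7; the wrapper set bits 0, 2 and 3.
print(to_original_indices([0, 2, 5, 7], [1, 0, 1, 1]))  # prints [0, 5, 7]
```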

Table 8. The feature selection results when using Rf+VTBHHO as the feature selection algorithm and SVM as the classifier.

Number	Feature	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5	Times selected
1	Age	✓					1
2	Number of sexual partners					✓	1
3	First sexual intercourse			✓			1
4	Num of pregnancies			✓	✓	✓	3
5	Smokes			✓			1
6	Smokes (years)			✓	✓		2
7	Smokes (packs/year)			✓			1
8	Hormonal Contraceptives				✓		1
9	Hormonal Contraceptives (years)	✓	✓	✓		✓	4
10	IUD				✓		1
11	IUD (years)	✓					1
12	STDs	✓		✓	✓	✓	4
13	STDs (number)		✓	✓	✓		3
14	STDs:condylomatosis						0
15	STDs:cervical condylomatosis						0
16	STDs:vaginal condylomatosis	✓				✓	2
17	STDs:vulvo-perineal condylomatosis	✓				✓	2
18	STDs:syphilis	✓		✓	✓	✓	4
19	STDs:pelvic inflammatory disease	✓				✓	2
20	STDs:genital herpes		✓	✓			2
21	STDs:molluscum contagiosum	✓	✓	✓		✓	4
22	STDs:AIDS		✓	✓			2
23	STDs:HIV	✓			✓	✓	3
24	STDs:Hepatitis B			✓	✓	✓	3
25	STDs:HPV		✓	✓			2
26	STDs:Number of diagnosis			✓	✓		2
27	Dx:Cancer	✓					1
28	Dx:CIN			✓			1
29	Dx:HPV			✓			1
30	Dx	✓	✓	✓	✓	✓	5
	Total per fold	11	8	18	12	11
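
The last column counts how often each feature was selected over the five folds, and the bottom row counts how many features each fold kept. Both tallies, and the set of features kept in every fold (only Dx in Table 8), follow directly from a boolean selection matrix. The rows below are an excerpt modeled on Table 8 (Age, STDs, and Dx, with row totals 1, 4, and 5 as in the table). The function names are hypothetical.

```python
def selection_totals(selected: list[list[bool]]) -> tuple[list[int], list[int]]:
    """Row totals (folds per feature) and column totals (features per fold)."""
    per_feature = [sum(row) for row in selected]
    per_fold = [sum(col) for col in zip(*selected)]
    return per_feature, per_fold


def always_selected(names: list[str], selected: list[list[bool]]) -> list[str]:
    """Features chosen in every fold, a rough indicator of selection stability."""
    return [name for name, row in zip(names, selected) if all(row)]


names = ["Age", "STDs", "Dx"]
selected = [
    [True, False, False, False, False],
    [True, False, True, True, True],
    [True, True, True, True, True],
]
```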

Table 9. Results for the Cleveland dataset using the GA, DE, and PSO methods, as well as all features. The best F1-score and G-mean of each wrapper-based method are highlighted in bold.

All features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	84.49 ± 3.36	81.27 ± 3.36	84.42 ± 4.05	86.91 ± 5.54	82.75 ± 2.85	83.98 ± 3.30
RF	81.85 ± 3.60	77.19 ± 3.51	82.70 ± 4.82	85.58 ± 6.45	79.69 ± 2.00	81.18 ± 3.14
NB	80.86 ± 4.48	79.64 ± 6.90	78.66 ± 7.16	81.50 ± 7.16	78.95 ± 5.76	80.40 ± 5.00
Adaboost	83.51 ± 3.69	81.73 ± 3.41	82.08 ± 6.01	84.50 ± 5.94	81.85 ± 4.43	83.04 ± 3.62
DA	84.15 ± 2.96	80.86 ± 5.56	84.11 ± 4.02	86.06 ± 6.05	82.25 ± 2.98	83.26 ± 2.83
LR	84.50 ± 3.95	81.27 ± 4.11	84.76 ± 4.38	86.66 ± 6.49	82.91 ± 3.54	83.83 ± 3.80
KNN	83.15 ± 3.78	82.69 ± 2.76	81.20 ± 4.91	83.08 ± 6.38	81.86 ± 3.07	82.81 ± 3.51
GA
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	81.85 ± 2.92	79.44 ± 3.83	81.13 ± 4.67	83.93 ± 6.12	80.07 ± 1.33	81.55 ± 2.91
RF	79.21 ± 3.69	73.06 ± 5.26	80.29 ± 2.61	84.44 ± 4.41	76.34 ± 2.42	78.46 ± 3.37
NB	81.51 ± 4.98	79.73 ± 3.51	80.08 ± 9.05	82.86 ± 8.13	79.72 ± 5.44	81.19 ± 4.83
Adaboost	83.50 ± 2.73	81.93 ± 4.12	81.94 ± 3.96	84.50 ± 4.85	81.86 ± 3.13	83.13 ± 2.99
DA	84.15 ± 3.09	81.73 ± 4.01	83.18 ± 3.35	85.60 ± 4.88	82.42 ± 3.36	83.58 ± 3.11
LR	82.86 ± 5.32	80.64 ± 3.82	82.53 ± 5.78	84.05 ± 9.71	81.45 ± 3.82	82.16 ± 5.39
KNN	82.51 ± 3.01	78.94 ± 4.67	82.24 ± 8.01	85.37 ± 6.95	80.28 ± 4.29	81.95 ± 3.23
DE
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	82.19 ± 2.78	80.77 ± 2.35	80.69 ± 3.99	83.29 ± 5.56	80.62 ± 1.29	81.95 ± 2.80
RF	80.85 ± 3.87	73.82 ± 6.52	83.62 ± 6.12	86.92 ± 6.78	77.99 ± 2.70	79.90 ± 3.53
NB	82.49 ± 4.46	79.56 ± 4.67	82.13 ± 7.63	84.35 ± 9.10	80.59 ± 4.37	81.73 ± 4.46
Adaboost	82.51 ± 3.01	81.93 ± 4.12	80.08 ± 5.79	82.88 ± 4.67	80.90 ± 4.12	82.35 ± 3.27
DA	84.49 ± 2.24	83.27 ± 4.35	82.83 ± 5.92	85.29 ± 5.30	82.86 ± 3.43	84.17 ± 2.41
LR	81.86 ± 4.11	80.64 ± 3.82	80.54 ± 6.47	82.56 ± 9.10	80.34 ± 2.84	81.40 ± 4.35
KNN	81.20 ± 2.97	82.31 ± 3.19	78.25 ± 7.89	80.89 ± 6.12	79.88 ± 3.51	81.49 ± 2.59
PSO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	82.19 ± 2.57	80.11 ± 2.89	81.26 ± 5.88	84.06 ± 6.49	80.45 ± 1.44	81.95 ± 2.63
RF	80.86 ± 4.69	77.24 ± 7.38	80.73 ± 4.18	83.73 ± 6.46	78.70 ± 4.13	80.23 ± 4.41
NB	79.21 ± 7.53	74.94 ± 6.23	79.62 ± 9.12	82.26 ± 11.28	77.00 ± 6.47	78.34 ± 7.35
Adaboost	82.86 ± 6.50	80.93 ± 6.08	82.11 ± 6.06	84.20 ± 8.18	81.47 ± 5.77	82.49 ± 6.59
DA	84.16 ± 1.92	82.40 ± 3.94	82.48 ± 3.94	85.19 ± 3.68	82.40 ± 3.44	83.72 ± 2.25
LR	84.84 ± 4.30	81.31 ± 3.94	85.30 ± 4.52	87.33 ± 6.28	83.22 ± 3.90	84.20 ± 4.19
KNN	82.50 ± 3.62	79.74 ± 3.53	81.32 ± 6.76	84.82 ± 3.83	80.46 ± 4.99	82.24 ± 3.63
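
The captions mark the best F1-score and G-mean in bold, which does not survive in a plain-text table. The best-per-method rule is easy to recompute from the cells. The sketch below parses "mean ± std" strings, tolerating a missing space or a "+" typed in place of "±", and picks the classifier with the highest mean. Comparing means only and ignoring the standard deviations is an assumption about how the bold entries were chosen. The helper names are hypothetical. The data are the F1-score and G-mean cells of the "All features" block of Table 9.

```python
import re

# "mean ± std", also accepting a missing space or '+' in place of '±'.
CELL = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[±+]\s*(\d+(?:\.\d+)?)\s*$")


def parse_cell(cell: str) -> tuple[float, float]:
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def best_classifier(rows: dict[str, dict[str, str]], metric: str) -> str:
    """Classifier with the highest mean for `metric`; ties go to the first listed."""
    return max(rows, key=lambda name: parse_cell(rows[name][metric])[0])


all_features = {
    "SVM": {"f1": "82.75 ± 2.85", "g_mean": "83.98 ± 3.30"},
    "RF": {"f1": "79.69 ± 2.00", "g_mean": "81.18 ± 3.14"},
    "NB": {"f1": "78.95 ± 5.76", "g_mean": "80.40 ± 5.00"},
    "Adaboost": {"f1": "81.85 ± 4.43", "g_mean": "83.04 ± 3.62"},
    "DA": {"f1": "82.25 ± 2.98", "g_mean": "83.26 ± 2.83"},
    "LR": {"f1": "82.91 ± 3.54", "g_mean": "83.83 ± 3.80"},
    "KNN": {"f1": "81.86 ± 3.07", "g_mean": "82.81 ± 3.51"},
}
print(best_classifier(all_features, "f1"), best_classifier(all_features, "g_mean"))
# prints: LR SVM
```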

Table 10. Results for the Cleveland dataset using the proposed BHHO variants. The best F1-score and G-mean across all methods are highlighted in bold.

VTBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	81.85 ± 2.92	79.44 ± 3.83	81.13 ± 4.67	83.93 ± 6.12	80.07 ± 1.33	81.55 ± 2.91
RF	81.18 ± 3.46	77.61 ± 4.65	80.59 ± 3.20	83.75 ± 4.86	79.00 ± 3.29	80.55 ± 3.36
NB	82.82 ± 4.93	78.73 ± 6.60	83.11 ± 7.32	85.56 ± 8.43	80.61 ± 5.42	81.88 ± 5.11
Adaboost	84.50 ± 4.22	82.80 ± 5.36	83.44 ± 4.34	85.58 ± 5.70	83.07 ± 4.38	84.11 ± 4.44
DA	85.15 ± 2.29	81.73 ± 4.01	85.22 ± 5.44	87.56 ± 5.94	83.29 ± 3.02	84.48 ± 2.40
LR	83.19 ± 5.28	80.64 ± 3.82	83.12 ± 5.92	84.72 ± 9.75	81.73 ± 3.85	82.48 ± 5.40
KNN	80.86 ± 4.25	81.16 ± 5.06	78.56 ± 11.29	80.90 ± 9.76	79.27 ± 5.59	80.76 ± 4.01
RfBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	79.89 ± 4.13	78.61 ± 3.31	78.41 ± 8.62	81.23 ± 8.22	78.14 ± 4.11	79.75 ± 3.76
RF	79.86 ± 3.26	75.73 ± 4.69	79.55 ± 4.52	83.36 ± 4.43	77.42 ± 2.85	79.38 ± 3.07
NB	82.49 ± 4.46	78.06 ± 6.26	83.08 ± 7.27	85.56 ± 8.43	80.21 ± 4.80	81.51 ± 4.57
Adaboost	83.50 ± 2.73	81.93 ± 4.12	81.94 ± 3.96	84.50 ± 4.85	81.86 ± 3.13	83.13 ± 2.99
DA	84.15 ± 3.09	81.73 ± 4.01	83.18 ± 3.35	85.60 ± 4.88	82.42 ± 3.36	83.58 ± 3.11
LR	83.19 ± 5.28	80.64 ± 3.82	83.12 ± 5.92	84.72 ± 9.75	81.73 ± 3.85	82.48 ± 5.40
KNN	81.19 ± 4.13	79.91 ± 4.88	79.80 ± 11.41	82.97 ± 8.79	79.28 ± 5.59	81.21 ± 3.82
MIBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	81.85 ± 2.92	79.44 ± 3.83	81.13 ± 4.67	83.93 ± 6.12	80.07 ± 1.33	81.55 ± 2.91
RF	79.54 ± 3.98	73.06 ± 5.26	81.10 ± 3.24	84.98 ± 5.07	76.69 ± 2.41	78.70 ± 3.48
NB	82.82 ± 4.93	78.73 ± 6.60	83.11 ± 7.32	85.56 ± 8.43	80.61 ± 5.42	81.88 ± 5.11
Adaboost	83.83 ± 2.41	81.93 ± 4.12	82.42 ± 3.76	85.19 ± 3.68	82.11 ± 3.17	83.49 ± 2.69
DA	84.15 ± 3.09	80.40 ± 3.17	84.45 ± 5.84	86.89 ± 6.53	82.24 ± 3.07	83.48 ± 2.96
LR	83.19 ± 5.28	81.27 ± 4.11	82.89 ± 6.35	84.03 ± 11.11	81.87 ± 3.67	82.36 ± 5.61
KNN	80.53 ± 3.36	81.20 ± 4.44	77.36 ± 8.81	80.34 ± 6.20	78.92 ± 5.19	80.65 ± 3.24

Table 11. Results for the Z-Alizadeh Sani dataset using the GA, DE, and PSO methods, as well as all features. The best F1-score and G-mean of each FS method are highlighted in bold.

All features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	84.83 ± 4.82	92.70 ± 5.35	86.98 ± 2.95	65.68 ± 6.65	89.67 ± 3.34	77.95 ± 5.40
RF	86.80 ± 4.90	92.69 ± 5.34	89.67 ± 4.65	72.10 ± 16.20	90.98 ± 3.16	81.02 ± 9.86
NB	83.48 ± 4.40	86.20 ± 7.19	90.66 ± 3.63	76.79 ± 12.48	88.10 ± 3.33	80.82 ± 6.13
Adaboost	82.51 ± 2.85	85.22 ± 2.92	89.81 ± 2.70	75.88 ± 6.35	87.41 ± 2.05	80.34 ± 3.72
DA	85.14 ± 3.83	85.78 ± 6.45	93.31 ± 3.79	84.10 ± 10.61	89.13 ± 2.93	84.56 ± 4.59
LR	83.49 ± 4.46	82.48 ± 3.96	93.76 ± 3.06	85.85 ± 7.26	87.72 ± 3.17	84.10 ± 4.97
KNN	66.97 ± 6.53	59.83 ± 8.49	91.07 ± 3.86	85.07 ± 7.50	71.80 ± 6.55	71.02 ± 5.46
GA
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	87.14 ± 3.36	92.60 ± 3.87	89.81 ± 3.16	73.52 ± 9.34	91.11 ± 2.37	82.30 ± 5.24
RF	85.82 ± 3.01	92.65 ± 4.15	88.50 ± 4.61	68.99 ± 15.33	90.35 ± 1.78	79.23 ± 8.58
NB	82.52 ± 4.89	87.08 ± 5.00	88.82 ± 5.77	71.18 ± 18.73	87.70 ± 3.00	77.71 ± 10.93
Adaboost	84.82 ± 1.91	86.15 ± 2.21	92.24 ± 3.00	81.59 ± 8.60	89.03 ± 1.02	83.67 ± 4.10
DA	82.17 ± 3.24	82.98 ± 4.28	91.53 ± 3.17	80.48 ± 8.66	86.92 ± 2.17	81.53 ± 4.35
LR	85.13 ± 4.37	85.69 ± 6.85	93.31 ± 3.62	83.43 ± 11.05	89.11 ± 3.24	84.17 ± 5.01
KNN	83.80 ± 4.19	83.77 ± 6.34	93.10 ± 3.30	83.34 ± 9.28	87.99 ± 3.11	83.29 ± 4.22
DE
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	85.15 ± 4.19	92.67 ± 4.70	87.38 ± 2.64	66.68 ± 6.85	89.88 ± 2.86	78.51 ± 5.03
RF	86.81 ± 4.03	93.11 ± 3.94	89.30 ± 4.64	71.10 ± 15.84	91.02 ± 2.48	80.67 ± 9.44
NB	83.19 ± 3.74	88.02 ± 2.36	88.52 ± 3.90	71.39 ± 10.68	88.22 ± 2.39	79.05 ± 6.49
Adaboost	83.85 ± 4.26	87.50 ± 4.22	89.83 ± 4.75	74.90 ± 13.36	88.53 ± 3.06	80.54 ± 7.33
DA	85.80 ± 4.02	85.73 ± 4.08	94.16 ± 3.94	85.99 ± 11.02	89.64 ± 2.71	85.62 ± 5.87
LR	87.11 ± 2.92	86.18 ± 4.08	95.46 ± 1.66	89.52 ± 4.59	90.51 ± 2.06	87.77 ± 2.59
KNN	76.21 ± 6.29	75.63 ± 7.92	89.85 ± 4.28	78.01 ± 11.31	81.85 ± 4.95	76.45 ± 6.69
PSO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	84.17 ± 3.33	90.33 ± 1.98	88.05 ± 4.34	69.19 ± 11.61	89.10 ± 2.09	78.77 ± 6.49
RF	85.15 ± 2.96	89.87 ± 3.82	89.76 ± 4.10	73.28 ± 13.98	89.66 ± 1.68	80.61 ± 7.52
NB	81.85 ± 2.92	84.76 ± 2.40	89.48 ± 3.61	74.61 ± 10.93	86.98 ± 1.69	79.25 ± 5.97
Adaboost	84.48 ± 4.33	86.69 ± 5.66	91.33 ± 2.44	79.21 ± 6.91	88.82 ± 3.16	82.73 ± 4.35
DA	84.81 ± 4.63	85.34 ± 6.13	93.21 ± 4.01	84.10 ± 10.61	88.89 ± 3.40	84.42 ± 5.55
LR	85.49 ± 2.39	85.18 ± 4.91	94.20 ± 3.71	86.21 ± 10.93	89.28 ± 1.93	85.34 ± 4.38
KNN	82.18 ± 3.93	84.69 ± 2.46	90.12 ± 4.91	74.70 ± 15.22	87.23 ± 2.43	79.06 ± 7.83

Table 12. Results for the Z-Alizadeh Sani dataset using the proposed BHHO variants. The best F1-score and G-mean across all methods are highlighted in bold.

VTBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	85.49 ± 5.00	90.82 ± 4.65	89.26 ± 4.48	72.70 ± 11.47	89.93 ± 3.44	81.00 ± 7.10
RF	86.47 ± 4.10	92.27 ± 6.47	89.78 ± 5.11	72.66 ± 16.94	90.69 ± 2.72	80.96 ± 9.24
NB	82.85 ± 5.11	87.54 ± 4.99	88.86 ± 5.80	71.18 ± 18.73	87.97 ± 3.17	77.94 ± 11.08
Adaboost	84.84 ± 3.75	87.07 ± 4.67	91.36 ± 3.21	79.48 ± 8.36	89.07 ± 2.90	83.02 ± 4.75
DA	85.46 ± 6.19	87.68 ± 7.79	91.90 ± 3.37	80.32 ± 9.62	89.53 ± 4.62	83.67 ± 6.40
LR	83.80 ± 4.71	82.05 ± 7.86	94.88 ± 2.18	88.56 ± 6.36	87.70 ± 3.91	84.97 ± 2.72
KNN	81.52 ± 4.35	81.02 ± 4.39	92.36 ± 4.23	82.39 ± 10.44	86.21 ± 3.14	81.50 ± 5.56
RfBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	84.83 ± 6.10	91.76 ± 6.41	87.67 ± 3.82	67.90 ± 9.84	89.56 ± 4.24	78.73 ± 7.30
RF	84.16 ± 3.88	90.83 ± 5.41	87.74 ± 3.92	67.90 ± 11.39	89.09 ± 2.68	78.16 ± 6.09
NB	82.20 ± 4.75	87.10 ± 3.68	88.35 ± 5.81	70.31 ± 18.24	87.53 ± 2.83	77.33 ± 10.93
Adaboost	84.15 ± 3.93	86.68 ± 6.03	90.94 ± 2.89	78.25 ± 8.03	88.58 ± 2.96	82.13 ± 4.10
DA	83.16 ± 5.64	83.42 ± 6.49	92.46 ± 3.62	82.54 ± 10.05	87.55 ± 4.26	82.75 ± 6.29
LR	84.80 ± 4.91	84.77 ± 4.05	93.51 ± 3.56	84.29 ± 10.26	88.88 ± 3.32	84.40 ± 6.40
KNN	85.78 ± 4.10	86.19 ± 5.72	93.68 ± 3.20	84.83 ± 9.00	89.62 ± 2.95	85.29 ± 4.60
MIBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	84.49 ± 2.25	89.84 ± 3.57	88.63 ± 1.70	71.24 ± 4.68	89.17 ± 1.64	79.92 ± 2.45
RF	86.14 ± 3.56	91.27 ± 5.08	89.91 ± 4.22	73.50 ± 14.16	90.40 ± 2.35	81.31 ± 7.64
NB	80.20 ± 3.41	85.20 ± 3.01	87.24 ± 4.89	67.64 ± 16.55	86.05 ± 1.74	75.07 ± 9.66
Adaboost	85.80 ± 3.61	87.51 ± 5.01	92.21 ± 1.73	81.61 ± 3.96	89.72 ± 2.76	84.44 ± 2.86
DA	85.47 ± 3.72	85.77 ± 5.28	93.74 ± 3.95	85.21 ± 10.76	89.38 ± 2.64	85.18 ± 5.13
LR	82.15 ± 6.15	81.58 ± 6.66	92.66 ± 3.01	83.45 ± 7.85	86.65 ± 4.65	82.42 ± 6.10
KNN	86.13 ± 4.01	86.53 ± 4.59	94.14 ± 5.74	84.41 ± 17.08	89.95 ± 2.62	84.79 ± 8.38

Table 13. Results for the Parkinson dataset using the GA, DE, and PSO methods, as well as all features. The best F1-score and G-mean of each FS method are highlighted in bold.

All features
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	81.67 ± 4.25	78.55 ± 4.13	84.23 ± 10.64	85.89 ± 8.69	80.86 ± 5.09	81.98 ± 4.30
RF	81.25 ± 6.59	77.26 ± 9.92	83.46 ± 11.03	85.79 ± 8.16	79.84 ± 8.90	81.19 ± 7.17
NB	75.00 ± 5.59	73.06 ± 8.08	76.95 ± 11.82	78.63 ± 10.09	74.24 ± 6.35	75.50 ± 5.84
Adaboost	77.50 ± 3.33	75.23 ± 2.74	79.47 ± 10.55	80.96 ± 9.53	76.78 ± 3.99	77.82 ± 3.52
DA	79.17 ± 6.32	76.31 ± 8.79	79.96 ± 9.01	82.13 ± 5.25	78.00 ± 8.42	79.11 ± 6.81
LR	72.50 ± 4.04	74.78 ± 5.39	71.76 ± 9.82	70.67 ± 10.40	72.76 ± 5.42	72.32 ± 4.32
KNN	82.08 ± 5.20	80.94 ± 8.12	82.69 ± 7.28	83.62 ± 5.32	81.52 ± 6.27	82.14 ± 5.30
GA
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	81.25 ± 4.56	78.55 ± 4.13	83.67 ± 11.38	85.18 ± 9.65	80.52 ± 5.37	81.60 ± 4.55
RF	79.17 ± 5.27	74.59 ± 5.86	81.48 ± 10.76	84.28 ± 7.80	77.64 ± 7.36	79.20 ± 5.82
NB	70.42 ± 4.04	68.80 ± 13.46	71.47 ± 6.71	72.85 ± 6.99	69.17 ± 6.90	70.12 ± 4.90
Adaboost	74.17 ± 6.80	73.79 ± 5.32	74.41 ± 12.51	75.39 ± 10.97	73.68 ± 8.11	74.38 ± 6.89
DA	77.50 ± 4.25	77.30 ± 5.38	77.17 ± 8.04	77.88 ± 4.86	77.06 ± 5.88	77.53 ± 4.25
LR	75.83 ± 5.53	77.10 ± 6.51	75.11 ± 10.86	75.10 ± 9.76	75.66 ± 7.10	75.83 ± 5.47
KNN	82.08 ± 2.83	81.29 ± 7.45	82.98 ± 6.59	83.84 ± 5.01	81.67 ± 3.65	82.35 ± 2.70
DE
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	82.92 ± 4.45	81.18 ± 6.17	85.01 ± 10.82	86.00 ± 10.05	82.40 ± 5.10	83.24 ± 4.33
RF	80.42 ± 4.49	77.39 ± 7.25	81.95 ± 8.06	83.88 ± 4.91	79.34 ± 6.22	80.45 ± 4.64
NB	75.00 ± 4.93	71.45 ± 10.73	79.07 ± 13.91	80.52 ± 13.34	73.70 ± 6.13	75.09 ± 5.11
Adaboost	78.33 ± 5.98	77.65 ± 4.45	79.45 ± 12.31	80.43 ± 11.02	77.97 ± 6.67	78.79 ± 5.85
DA	79.58 ± 5.80	78.06 ± 7.44	79.74 ± 8.59	81.23 ± 4.89	78.80 ± 7.56	79.60 ± 6.05
LR	75.83 ± 3.39	75.72 ± 6.63	76.26 ± 10.67	76.72 ± 10.53	75.39 ± 5.24	75.83 ± 3.81
KNN	82.92 ± 4.25	79.59 ± 8.04	85.83 ± 9.19	87.43 ± 7.57	82.03 ± 5.35	83.15 ± 4.40
PSO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	81.25 ± 5.43	77.79 ± 6.46	83.98 ± 10.93	85.89 ± 8.69	80.32 ± 6.44	81.57 ± 5.52
RF	78.75 ± 4.04	72.47 ± 7.68	83.00 ± 10.22	85.88 ± 8.04	76.83 ± 6.29	78.62 ± 4.82
NB	73.75 ± 3.86	74.07 ± 8.25	75.57 ± 11.18	74.68 ± 11.61	73.73 ± 2.69	73.81 ± 3.25
Adaboost	74.58 ± 6.37	74.09 ± 5.43	75.94 ± 14.70	76.62 ± 14.38	74.16 ± 7.31	74.88 ± 6.60
DA	78.75 ± 5.34	80.04 ± 8.08	77.71 ± 8.35	78.10 ± 5.07	78.57 ± 6.69	78.96 ± 5.29
LR	76.25 ± 4.08	75.92 ± 5.57	77.28 ± 10.08	77.32 ± 10.39	75.95 ± 4.64	76.24 ± 3.72
KNN	79.17 ± 3.73	78.00 ± 6.35	79.19 ± 7.82	80.53 ± 4.85	78.41 ± 5.95	79.16 ± 4.10

Table 14. Results for the Parkinson dataset using the proposed BHHO variants. The best F1-score and G-mean across all methods are highlighted in bold.

VTBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	82.08 ± 3.39	78.86 ± 6.40	85.12 ± 9.59	86.66 ± 8.00	81.29 ± 4.00	82.43 ± 3.36
RF	79.58 ± 3.58	75.23 ± 9.48	83.20 ± 10.58	85.08 ± 9.16	78.17 ± 5.50	79.57 ± 4.30
NB	72.50 ± 4.25	69.59 ± 8.76	74.91 ± 11.54	77.03 ± 10.87	71.23 ± 5.87	72.71 ± 4.96
Adaboost	79.17 ± 7.10	79.33 ± 9.46	80.49 ± 14.74	80.33 ± 14.29	78.92 ± 8.18	79.25 ± 7.19
DA	77.08 ± 8.84	74.44 ± 13.13	77.05 ± 11.09	79.59 ± 5.53	75.58 ± 11.80	76.82 ± 9.45
LR	78.75 ± 7.84	78.48 ± 7.49	78.36 ± 13.16	79.69 ± 11.87	78.06 ± 9.32	78.88 ± 8.11
KNN	81.25 ± 2.64	77.41 ± 2.99	83.85 ± 7.90	85.44 ± 6.88	80.25 ± 3.56	81.22 ± 2.94
RfBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	82.08 ± 3.63	78.55 ± 4.13	85.12 ± 10.31	86.71 ± 8.78	81.21 ± 4.47	82.33 ± 3.59
RF	78.33 ± 4.29	70.49 ± 11.19	85.36 ± 12.06	87.82 ± 11.07	75.93 ± 6.43	78.00 ± 5.02
NB	73.75 ± 2.12	74.40 ± 9.65	75.51 ± 11.55	75.65 ± 11.93	73.59 ± 2.51	74.29 ± 2.32
Adaboost	78.33 ± 4.49	78.44 ± 7.20	77.99 ± 9.00	77.61 ± 8.96	77.83 ± 6.07	77.71 ± 4.11
DA	79.17 ± 5.10	77.44 ± 5.31	80.00 ± 8.24	80.97 ± 6.15	78.56 ± 5.99	79.15 ± 5.10
LR	75.83 ± 3.63	76.17 ± 9.08	75.04 ± 7.13	75.26 ± 5.41	75.23 ± 6.64	75.42 ± 3.90
KNN	82.08 ± 4.49	79.62 ± 11.04	84.46 ± 8.65	85.82 ± 7.72	81.11 ± 6.07	82.20 ± 4.76
MIBHHO
Classifier	Accuracy (%)	Sensitivity (%)	Precision (%)	Specificity (%)	F1-score (%)	G-mean (%)
SVM	81.67 ± 5.34	80.29 ± 6.48	83.38 ± 11.52	84.36 ± 10.27	81.17 ± 6.09	82.01 ± 5.12
RF	80.00 ± 4.86	74.10 ± 8.51	83.92 ± 9.44	86.38 ± 6.81	78.26 ± 6.94	79.78 ± 5.39
NB	70.00 ± 2.12	68.88 ± 11.44	71.14 ± 6.97	72.07 ± 7.29	69.10 ± 4.51	69.88 ± 2.72
Adaboost	77.50 ± 5.80	77.62 ± 6.71	77.74 ± 10.79	78.42 ± 9.24	77.21 ± 6.84	77.80 ± 5.80
DA	79.58 ± 5.65	78.24 ± 6.42	80.11 ± 10.57	81.79 ± 8.04	78.88 ± 7.08	79.89 ± 5.91
LR	77.92 ± 7.05	80.36 ± 5.66	76.60 ± 12.14	75.97 ± 11.53	78.06 ± 8.14	77.89 ± 7.01
KNN	82.92 ± 2.76	81.18 ± 6.17	84.51 ± 8.27	85.73 ± 6.96	82.35 ± 3.81	83.22 ± 2.96

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Dong, M.; Wang, Y.; Todo, Y.; Hua, Y. A Novel Feature Selection Strategy Based on the Harris Hawks Optimization Algorithm for the Diagnosis of Cervical Cancer. Electronics 2024, 13, 2554. https://doi.org/10.3390/electronics13132554
