Article

Dengue Fever Detection Using Swarm Intelligence and XGBoost Classifier: An Interpretable Approach with SHAP and DiCE

by Proshenjit Sarker 1, Jun-Jiat Tiang 2,* and Abdullah-Al Nahid 1,*

1 Electronics and Communication Engineering Discipline, Khulna University, Khulna 9208, Bangladesh
2 Centre for Wireless Technology, CoE for Intelligent Network, Faculty of Artificial Intelligence & Engineering, Multimedia University, Persiaran Multimedia, Cyberjaya 63100, Selangor, Malaysia
* Authors to whom correspondence should be addressed.
Information 2025, 16(9), 789; https://doi.org/10.3390/info16090789
Submission received: 28 July 2025 / Revised: 6 September 2025 / Accepted: 7 September 2025 / Published: 10 September 2025

Abstract

Dengue fever is a mosquito-borne viral disease that annually affects 100–400 million people worldwide. Early detection of dengue enables easy treatment planning and helps reduce mortality rates. This study proposes three Swarm-based Metaheuristic Algorithms, Golden Jackal Optimization, Fox Optimizer, and Sea Lion Optimization, for feature selection and hyperparameter tuning, and an Extreme Gradient Boost classifier to forecast dengue fever using the Predictive Clinical Dengue dataset. Several existing models have been proposed for dengue fever classification, with some achieving high predictive performance. However, most of these studies have overlooked the importance of feature reduction, which is crucial to building efficient and interpretable models. Furthermore, prior research has lacked in-depth analysis of model behavior, particularly regarding the underlying causes of misclassification. Addressing these limitations, the proposed GJO-XGBoost framework achieved a 10-fold cross-validation mean accuracy of 99.89%, an F-score of 99.92%, a precision of 99.84%, and a perfect recall of 100% while using only two features: WBC Count and Platelet Count. Notably, FOX-XGBoost and SLO-XGBoost achieved the same performance while utilizing four and three features, respectively, demonstrating the effectiveness of feature reduction without compromising accuracy. Among these, GJO-XGBoost demonstrated the most efficient feature utilization while maintaining superior performance, emphasizing its potential for practical deployment in dengue fever diagnosis. SHAP analysis identified WBC Count as the most influential feature driving model predictions. Furthermore, DiCE explanations support this finding by showing that lower WBC Counts are associated with dengue-positive cases, whereas higher WBC Counts are indicative of dengue-negative individuals. SHAP interpreted the reasons behind misclassifications, while DiCE provided a correction mechanism by suggesting the minimal changes needed to convert incorrect predictions into correct ones.

1. Introduction

Dengue fever, a mosquito-borne viral disease, spreads mainly through the bites of female Aedes mosquitoes, especially the Aedes aegypti mosquito [1]. Annually, the World Health Organization (WHO) estimates 100–400 million infections globally, with 3.9 billion people in 129+ countries at risk. In 2023, a historic surge caused 6.5 million cases and 7300 deaths across 80+ countries. Factors like urbanization and climate change fuel outbreaks. Dengue risk rises with temperature: each 1 °C increase elevates risk by 13%, most pronounced in tropical monsoon and humid subtropical zones [2]. From 1980 up to 11 May 2025, over 37.7 million dengue cases were reported in the Americas, with more than 20,000 deaths recorded [3]. Figure 1 shows the dengue outbreak in the Americas from 2014 to 2024 [3]. In 2024, 8384 deaths were reported among the 13.1 million severe dengue cases. Dengue infection has become a burden not only in the Americas but also worldwide. According to the WHO Global Dengue Dashboard, Brazil has recorded the highest number of confirmed dengue cases [4]. Figure 2 shows the top 10 countries most affected by dengue from January 2010 to March 2025. As of 1 March 2025, Brazil reported 13,751,963 confirmed dengue cases. Peru and Colombia reported the second- and third-highest confirmed cases, with 722,116 and 633,078 cases, respectively. These top 10 countries belong to two main regions: South America and Southeast Asia. India, Thailand, and Bangladesh are the top dengue-infected countries in Southeast Asia, ranked 6th, 9th, and 10th globally, respectively. Dengue can sometimes be life-threatening, and no licensed vaccines or specific treatments are available [5]. Despite extensive vector control efforts, the global spread of dengue and its public health impact remain poorly understood.
Early detection of dengue enables prompt treatment planning and helps reduce mortality rates [6]. In this context, machine learning (ML) offers a faster and more efficient approach to detecting the presence of dengue. Recently, researchers have focused on developing more accurate models using various dengue-related datasets. Several datasets, such as the Universitas Indonesia Dengue Dataset [7], the Karuna Medical Hospital Kerala Dataset [8], the Delhi Multiple Hospital Dataset [9], and so on, have been used in previous studies. Table 1 provides a detailed summary of the previously used datasets and the findings of the associated ML models.
The Universitas Indonesia Dengue Dataset, which contains 130 samples and 8 feature columns, was utilized by Silitonga et al. to classify dengue events using the Random Forest (RF) classifier [7]. However, their study achieved an accuracy of only 57.69%. Another study reported an accuracy of 83.30% on the Karuna Medical Hospital Kerala Dataset, which has 100 records and 11 attributes [8]. The Delhi Multiple Hospital Dataset consists of 100 samples and 11 features, including the target class [9]. This dataset was classified using Particle Swarm Optimization with an Artificial Neural Network (PSO-ANN), achieving 87.27% accuracy. Another dataset, the Taiwan Dengue Fever Dataset, has 805 samples and 12 features and was used by Kuo et al., who achieved 89.94% accuracy [10]. Ref. [11] reported 92% accuracy using the Logit Boost classifier on the Discharged Patient Report Dataset. Riya et al. used the CBC Dengue Dataset Bangladesh, which has 320 records and 15 columns, and obtained an accuracy of 96.88% with a Stacking Classifier (SC) [12]. Another dataset, the Vietnam Dengue Clinical Dataset, consists of 2301 rows and 23 columns and was investigated with the XGBoost classifier, achieving 98.60% accuracy [13]. The Dirgahayu Hospital Dengue Dataset, comprising 110 samples and 21 features, was used by Hamdani et al. with a Support Vector Machine (SVM) classifier, achieving an accuracy of 99.10% [14]. Abdualgalil et al. demonstrated the effectiveness of machine learning in identifying dengue cases by employing the Extra Trees (ET) classifier; they achieved a prediction accuracy of 99.03% using 6694 samples and 22 features, including the target class [6].
In this study, we used the Predictive Clinical Dengue (PCD) dataset, collected from the Upazila Health Complex, Kalai, Joypurhat, in Bangladesh, to classify dengue-positive and dengue-negative individuals based on the signs and blood parameters of the subjects. This dataset has 1003 records and 9 features: Age, Sex, Hemoglobin, WBC Count, Differential Count, RBC Panel, Platelet Count, PDW, and Final Output (target class). A further description of the dataset is given in the Dataset Subsection. The dataset highlights two critical markers: lower White Blood Cell (WBC) and Platelet Counts effectively distinguish dengue fever from other febrile illnesses, and WBC and Platelet Counts on days 2–5 of illness are highly associated with dengue [15].
According to Table 1, the datasets exhibit diversity in their dimensions, and the performance of the models ranges from 57.69% to 99.10%. Silitonga et al. employed a minimal set of seven features to forecast the target class [7]. The highest number of features (22) was used by Chowdhury et al. to develop a classification model, which achieved a good accuracy of 98.60% [13]. The highest accuracy was achieved by Hamdani et al. using 20 features on the Dirgahayu Hospital Dengue Dataset. An increase in the number of features leads to higher clinical testing costs, places a greater burden on patients, and may also delay treatment due to late dengue identification. Our literature review found hardly any ML models that maintain extremely high accuracy while also achieving substantial feature reduction. Considering this, our contributions are organized as follows:
  • Feature Minimization: Three different Swarm-based Metaheuristic Algorithms (SBMHAs), Golden Jackal Optimization (GJO), Fox Optimizer (FOX), and Sea Lion Optimization (SLO), are applied as feature selectors.
  • Classification with Fold Validation: Extreme Gradient Boost (XGBoost) is used to develop a more accurate ML model using ten-fold cross-validation.
  • Hyperparameter Tuning: We have developed a custom-defined framework along with SBMHAs that finds the best parameter values for the XGBoost classifier.
  • Model Evaluation: Accuracy, F-score, precision, recall, AUC, area under the precision–recall curve (PR AUC), Brier Score Loss (BSL), and calibration curve are measured for evaluating our models. Model complexity is analyzed in terms of different execution times.
  • Feature Ranking: We have applied Shapley Additive Explanations (SHAP) as Explainable Artificial Intelligence (XAI) to demonstrate the importance of features in the model that have influenced the model’s predictions.
  • Misclassification Analysis: We identify the cause of misclassification through a detailed analysis of a specific incorrect prediction.
  • DiCE Analysis: We implement Diverse Counterfactual Explanations (DiCE), especially for wrong classifications, to convert them into correct classifications.
The rest of the study is primarily divided into three key sections: Methodology, Results, and Conclusion. In Section 2 (Methodology), the dataset, work strategy, and proposed frameworks, including the associated ML algorithm, are explained. Section 3 (Results) presents the findings of our models and reports the XAI explanations, a comparative analysis between the previous works and our outcomes, including our future scope of study. Finally, Section 4 (Conclusions) concludes this research work, highlighting the key findings and potential limitations.

2. Methodology

In this work, the PCD dataset, introduced by Mim et al. [16], was used. The dataset contains some missing values, so we started with a data cleaning process and prepared 10-fold cross-validation (CV) datasets. Each fold was then passed through the proposed framework, which integrates an XGBoost classifier with SBMHAs. Consequently, each fold was individually optimized and hyperparameter-tuned, resulting in distinct feature subsets and hyperparameter settings. Finally, we developed a globally optimized model by applying a voting-based ensemble technique that considers the selected features and hyperparameters from all folds. The models' behavior was then interpreted through SHAP. Figure 3 presents the overall methodology of this research work.

2.1. Dataset

The PCD dataset has a total of 1003 samples and 9 columns: 8 usable features and 1 target class. Among them, Hemoglobin, WBC Count, Platelet Count, and Platelet Distribution Width (PDW) are continuous values. Age is an integer type, whereas Final Output, Differential Count, and Red Blood Cell (RBC) Panel are binary integer features [16], and Sex is a ternary categorical feature. Dataset construction followed ethical guidelines, and the dataset has been available on Mendeley since 18 September 2024 (URL: https://data.mendeley.com/datasets/xrsbyjs24t/1, accessed on 15 July 2025). WBC Count, PDW, Platelet Count, and Final Output have 24, 19, 17, and 14 missing values, which are 2.39%, 1.89%, 1.69%, and 1.40% of the total records, respectively. We decided not to impute missing values, since synthetic data can aid training and method development but are unsuitable for predicting real outcomes, as they miss complex associations and temporal patterns, especially in the medical sector [17]. In preprocessing, we therefore removed all records containing missing values (n = 72), resulting in a cleaned dataset with 931 records: 631 records of class 1 (dengue positive) and 300 records of class 0 (dengue negative). Table 2 presents the PCD dataset description and the feature notations.
Table 3 presents both classes’ minimum, maximum, and average values. The minimum age is 3 years in both categories, whereas the maximum ages are 120 years in class 0 and 99 years in class 1. The average ages in both categories are 47.533 and 40.036. So, age has no distinct separation between dengue-positive and -negative groups. The dengue-positive (class 0) group has a range of WBC Count from 2000 (index 814) to 3700 (index 793), with an average value of 2849.921. In dengue-negative groups, the WBC Count varies from 3600 (index 795) to 10,900 (index 855), with an average of 7462.667. But the second-lowest value in class 0 is 4300 (index 178), which is higher than the highest WBC Count (3700) in class 1. A matter of consideration is that the lowest WBC (3600) in class 0 is lower than the highest WBC Count (3700) in class 1. Except for index 795, there are no overlapping values in WBC Count between classes 0 and 1. So, beyond this individual, there is proper separation between the two categories, indicating the importance of WBC Count on the classification models.
Figure 4a presents the correlation of the dataset's attributes, where a larger circle and a tendency towards yellow represent a higher correlation. Final Output is highly correlated with WBC Count, Platelet Count, and PDW. In contrast, Sex, Differential Count, and RBC Panel exhibit weaker correlations. Age and Hemoglobin demonstrate a moderate correlation with Final Output. So, there is a strong possibility of the SBMHAs identifying WBC Count and Platelet Count as the most impactful features during optimization. Figure 4b shows a t-SNE plot; t-SNE is a dimensionality reduction technique, an improved variant of Stochastic Neighbor Embedding, that maps high-dimensional data to lower dimensions (typically two or three) for visualization [18]. It was applied with n_components = 2, random_state = 42, and default values for perplexity, learning rate, and number of iterations. The t-SNE plot reveals that the dataset forms two distinct clusters for classes 0 and 1 in the lower-dimensional space, suggesting a high potential for accurate classification of the target variable. However, there is some overlap for indices 795, 785, and 762. Index 795 is placed within the cluster of class 1, though it belongs to class 0. Similarly, indices 785 and 762 are positioned closer to the cluster representing class 0, despite belonging to class 1. This indicates that these instances may be misclassified by the models.
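A minimal sketch of producing such a projection, assuming a pandas DataFrame df holding the cleaned dataset with Final Output as the target (all names are illustrative):

```python
# Minimal t-SNE projection sketch; assumes a cleaned DataFrame `df`
# whose "Final Output" column is the binary target.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

X = df.drop(columns=["Final Output"])
y = df["Final Output"]

# Same settings as reported above: 2 components, random_state=42,
# defaults for perplexity, learning rate, and iteration count.
embedding = TSNE(n_components=2, random_state=42).fit_transform(X)

plt.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap="coolwarm", s=10)
plt.xlabel("t-SNE dimension 1")
plt.ylabel("t-SNE dimension 2")
plt.title("t-SNE projection of the PCD dataset")
plt.show()
```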
Figure 5 shows the intersection points calculated using the Brentq method [19] for the features WBC Count and Platelet Count, as they exhibit the highest correlation with Final Output. The intersection for WBC Count occurs at approximately 3999.74. Below this threshold, most instances belong to the dengue-positive class, whereas values above it are associated with the dengue-negative class. WBC Count has almost no overlapping region around the intersection, showing its discriminative power and indicating the likely misclassification of index 795 (WBC Count: 3600), which falls below the threshold despite belonging to the dengue-negative class. In contrast, the intersection point for Platelet Count is 133,465.86, and Platelet Count shows a broader overlapping region around the intersection, suggesting lower discriminative power and reduced effectiveness for classification models.
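One way such an intersection can be located, sketched under the assumption that the per-class curves are kernel density estimates (wbc_pos and wbc_neg are assumed arrays of per-class WBC Counts; the bracket is an illustrative choice around the visible crossing):

```python
# Sketch of locating the class-conditional density intersection with
# Brent's method (scipy.optimize.brentq).
from scipy.stats import gaussian_kde
from scipy.optimize import brentq

kde_pos = gaussian_kde(wbc_pos)   # density of the dengue-positive class
kde_neg = gaussian_kde(wbc_neg)   # density of the dengue-negative class

# Root of the density difference inside a bracket where the curves cross.
diff = lambda x: kde_pos(x)[0] - kde_neg(x)[0]
threshold = brentq(diff, 3000, 6000)   # bracket chosen around the crossing
print(f"Estimated WBC Count intersection: {threshold:.2f}")
```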

2.2. Extreme Gradient Boost Classifier

The Extreme Gradient Boost (XGBoost) algorithm was used to classify the target variable. The XGBoost classifier has several hyperparameters, such as Learning Rate (learning_rate), Maximum Depth (max_depth), and Minimum Child Weight (min_child_weight). The XGBoost classifier is defined in Equation (1), where $\hat{y}_i$ is the output for the input $x_i$ [20].
$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$    (1)
where $f_k$ is an independent decision tree, $\mathcal{F}$ is the space of regression trees, $x_i$ represents the input variables, and $K$ is the number of additive functions. The loss function $L(\phi)$ in XGBoost is given by Equation (2) [20].
$L(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} P(f_k)$    (2)
The term $l(\hat{y}_i, y_i)$ measures the error between the predicted value $\hat{y}_i$ and the actual value $y_i$, while $P$ serves as a regularization term that captures the complexity of the model by accounting for both the number of leaf nodes and the scores assigned to them. Figure 6 presents the basic structure of an XGBoost classifier algorithm.
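As a minimal sketch of fitting this classifier in Python (the hyperparameter values shown are placeholders within the search ranges of Table 4, not the tuned settings; X_train, y_train, and X_test are assumed to hold one CV fold):

```python
# Illustrative XGBoost setup; hyperparameter values are placeholders
# inside the search ranges of Table 4, not the final tuned ones.
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=200,        # Number of Estimators
    learning_rate=0.03,      # Learning Rate
    max_depth=5,             # Maximum Depth
    min_child_weight=1,      # Minimum Child Weight
)
model.fit(X_train, y_train)   # X_train/y_train: one CV fold (assumed)
y_pred = model.predict(X_test)
```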

2.3. Swarm-Based Metaheuristic Algorithms

Feature selection is the process of algorithmic selection of the most important features from a raw dataset for ML models [21]. There are mainly four types of Metaheuristic Algorithms (MHAs): Evolution-Based Metaheuristic Algorithms, Swarm-Based Metaheuristic Algorithms (SBMHAs), Physics-Based Metaheuristic Algorithms, and Human Behavior-Based Metaheuristic Algorithms [22]. This research work applied GJO, FOX, and SLO as feature selectors and hyperparameter tuners from the numerous available SBMHAs. Section 2.3.1 presents the mathematical representation of GJO. Section 2.3.2 and Section 2.3.3 discuss the equivalent mathematical behavior of FOX and SLO, respectively.

2.3.1. Golden Jackal Optimization

The GJO algorithm, introduced by Chopra et al. in 2022, follows the cooperative hunting behavior of golden jackal pairs [23]. Golden jackals have a higher success rate when they hunt in pairs. The pair hunting process has three main stages: (i) searching for prey and moving towards it, (ii) enclosing the prey, and (iii) pouncing on the prey. The initial solution within the search space of GJO is mathematically represented by Equation (3) [23].
$Y_0 = Y_{\min} + \text{rand} \times (Y_{\max} - Y_{\min})$    (3)
where $Y_0$ represents the initial candidate solution; $Y_{\min}$ and $Y_{\max}$ denote the minimum and maximum bounds of the variables, respectively; and rand is a vector containing uniformly distributed random values within the interval $[0, 1]$. In the exploration phase, the position updates of the male and female jackals while searching for the prey are given by Equations (4) and (5) [23].
$Y_1(t) = Y_M(t) - E \cdot |Y_M(t) - r_l \cdot \text{Prey}(t)|$    (4)
$Y_2(t) = Y_{FM}(t) - E \cdot |Y_{FM}(t) - r_l \cdot \text{Prey}(t)|$    (5)
where $Y_1(t)$ and $Y_2(t)$ denote the updated positions of the male and female jackals at iteration $t$, respectively; $Y_M(t)$ and $Y_{FM}(t)$ are their current positions; and $\text{Prey}(t)$ represents the position of the prey in the search space. The variable $E$ corresponds to the prey's evading energy, which decreases over time, and $r_l$ is a Levy-flight-based random vector. The new solution at the next iteration is obtained by averaging the two updated positions, as given by Equation (6) [23].
$Y(t+1) = \dfrac{Y_1(t) + Y_2(t)}{2}$    (6)
In the exploitation phase, once the prey becomes weaker and its evading energy decreases, the jackal pair encloses and pounces on it. This process is mathematically represented by Equations (7) and (8) [23].
$Y_1(t) = Y_M(t) - E \cdot |r_l \cdot Y_M(t) - \text{Prey}(t)|$    (7)
$Y_2(t) = Y_{FM}(t) - E \cdot |r_l \cdot Y_{FM}(t) - \text{Prey}(t)|$    (8)
In this phase, the positions of both male and female jackals are again combined using Equation (6). The switching between exploration and exploitation depends on the value of $E$: if $|E| \geq 1$, exploration is performed; otherwise, exploitation, as shown in Figure 7.

2.3.2. Fox Optimizer

FOX, introduced by Mohammed et al. in 2022, follows the hunting behavior of red foxes [24]. Red foxes hunt when the ground is covered with snow. They rely on ultrasound to detect and track their prey. As they approach the target, they perform a unique jumping maneuver to catch the prey quickly. This hunting strategy is typically categorized into two primary phases: exploration and exploitation.
During the exploitation phase, the total distance traveled by the ultrasound emitted by the fox is represented as $Dist\_S\_T_{it}$ in Equation (9), where $Sp\_S$ denotes the speed of sound in air (343 m s$^{-1}$) and $Time\_S\_T_{it}$ refers to the time taken by the sound wave to travel [24]. The connection between the optimal position for exploitation, $BestPosition_{it}$, and $Time\_S\_T_{it}$ is illustrated in Equation (10) [24].
$Dist\_S\_T_{it} = Sp\_S \times Time\_S\_T_{it}$    (9)
$Sp\_S = \dfrac{BestPosition_{it}}{Time\_S\_T_{it}}$    (10)
The separation between the red fox and its prey, denoted by $Dist\_Fox\_Prey_{it}$, is determined by Equation (11), while the height achieved during the jump is given by $Jump_{it}$ in Equation (12) [24].
$Dist\_Fox\_Prey_{it} = 0.5 \times Dist\_S\_T_{it}$    (11)
$Jump_{it} = 0.5 \times 9.81 \times t^{2}, \quad t = \text{average time taken by the sound to travel}$    (12)
A random number $p$ is generated within the interval $[0, 1]$. If $p > 0.18$, the fox's next position is calculated using Equation (13), where the coefficient $c_1$ lies between 0 and 0.18. If $p \leq 0.18$, the position is updated using Equation (14), where $c_2$ ranges from 0.19 to 1 [24].
$X_{it+1} = Dist\_Fox\_Prey_{it} \times Jump_{it} \times c_1$    (13)
$X_{it+1} = Dist\_Fox\_Prey_{it} \times Jump_{it} \times c_2$    (14)
In the exploration stage, the fox's position is adjusted according to Equation (15), where $MinT$ stands for the minimum time parameter and $a$ is the exploration coefficient [24]. Figure 8 outlines the complete FOX algorithm process [24].
$X_{it+1} = BestX_{it} \times \text{rand}(1, \text{dimension}) \times MinT \times a$    (15)

2.3.3. Sea Lion Optimization

SLO is a well-known MHA inspired by the hunting strategies of sea lions [25]. When one sea lion detects prey, it signals the rest of the group to participate in the hunt. The overall hunting process is categorized into four distinct phases. In the Detecting and Tracking Phase, sea lions rely on their highly sensitive whiskers to accurately identify and follow the location of the prey. Within the context of the algorithm, it is assumed that the prey either represents the current best solution or lies near the optimal value. The positional update of sea lions during this phase is governed by Equation (16) [25].
$SL(t+1) = P(t) - Dist \cdot C$    (16)
Here, $Dist$ denotes the distance vector between the sea lion and the prey, $P(t)$ is the prey's position vector, $SL(t+1)$ is the updated position of the sea lion for the next iteration, and $C$ is a coefficient that linearly declines from 2 to 0 over the iterations. In the Vocalization Phase, sea lions communicate through vocalizations both above and below the water to coordinate group hunting. These sound signals propagate via refraction and reflection, as shown in Figure 9a, and are mathematically modeled in Equation (17) [25].
$SP_{\text{leader}} = \dfrac{V_1 (1 + V_2)}{V_2}$    (17)
In this equation, $SP_{\text{leader}}$ refers to the leader's vocal signal strength, while $V_1$ and $V_2$ correspond to the sound velocities in water and air, respectively. During the Attacking Phase, sea lions strive to trap their prey by forming a bait ball. The hunt is initiated from the perimeter under the direction of the leading sea lion, considered the best-performing solution, as shown in Figure 9b. This behavior is described by Equation (18) [25].
$SL(t+1) = |P(t) - SL(t)| \cdot \cos(2\pi m) + P(t)$    (18)
Here, $m$ is a randomly generated number within the range $[-1, 1]$. Finally, in the Exploration Phase, or Searching for Prey, sea lions exhibit random movement patterns to discover potential prey. In SLO, this is simulated by updating each sea lion's position based on a randomly chosen member, as formulated in Equation (19), where $SL_{rnd}(t)$ is a random sea lion [25].
$SL(t+1) = SL_{rnd}(t) - Dist \cdot C$    (19)

2.4. Proposed Framework Development

Each of the three SBMHAs, GJO, FOX, and SLO, was integrated with the XGBoost classifier. Our cleaned dataset contains 931 records and 9 columns in total (8 features and 1 target class). The proposed framework consists of four major stages: The Making of 10-Fold Datasets, Defining the Problem, Voting, and Final Model.

2.4.1. The Making of 10-Fold Datasets

For better analysis, we used a custom folding process for making 10-fold CV datasets instead of using the CV library. Each of the folds maintained 30% of the records for testing and 70% for training. A window sliding process was adopted to prepare the 10 different CV datasets while ensuring that the ratio of training–test data was 70:30. At the beginning of the fold creation process, the dataset was randomly shuffled to ensure unbiased sampling. For the 1st fold, the first 30% of the shuffled data were used as the test set, and the remaining 70% were used for training. For each subsequent fold, the test window was shifted forward by 10% of the total dataset size, while maintaining a 30% test set size. This process was repeated for all 10 folds. To maintain continuity in the last fold, the 10th fold test set was formed by taking the final 10% of the dataset and the first 20%, ensuring a complete 30% test set with a circular shift. In each fold, the remaining 70% was used for training, as visualized in Figure 10.
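A minimal sketch of this circular sliding-window split, assuming df is the cleaned and shuffled DataFrame (all names are illustrative):

```python
# Sketch of the circular sliding-window 10-fold split described above.
# Assumes `df` is the cleaned, randomly shuffled DataFrame (931 rows).
import numpy as np

n = len(df)
test_size = int(0.30 * n)    # 30% test window
step = int(0.10 * n)         # window slides by 10% per fold

folds = []
for k in range(10):
    start = k * step
    # Indices wrap around the end of the dataset, so fold 10 combines
    # the final 10% with the first 20% (circular shift).
    test_idx = np.arange(start, start + test_size) % n
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    folds.append((train_idx, test_idx))
```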

2.4.2. Defining the Framework Problem

Before initiating optimization, the boundary conditions ( B C o n d ) for the XGBoost parameters were set. The XGBoost algorithm includes a variety of hyperparameters; however, in our study, we selected only a few, namely, Number of Estimators (n_estimators), Learning Rate (learning_rate), Maximum Depth (max_depth), and Minimum Child Weight (min_child_weight). Table 4 shows the boundary conditions used for optimization, including the feature range (features).
Then, the final optimization process was initiated on the training set for each fold set, as shown in Figure 10. During the optimization process, an epoch size of 100 and a population size of 50 were used. Over the epochs, the optimizer searches for the optimal solution based on the fitness value evaluated at each iteration (each epoch). The F-score was employed as the cost function, serving as the fitness measure, and computed for each agent in the population throughout the optimization process. The best fitness value at epoch t is the current best, denoted by C best . The best overall fitness value across all epochs is the global best, denoted by G best . G best represents the best solution found during the optimization process and contains the optimal set of hyperparameters ( H best ) and the corresponding feature subset ( F best ). This whole process is shown in Algorithm 1.
Algorithm 1 Optimization process using SBMHAs for each fold
1: Start Process
2: (i) Initialization of the Problem with Boundary Conditions ($B_{Cond}$):
3:     Define boundaries for the XGBoost hyperparameters:
4:         Number of Estimators (n): $100 \leq n \leq 300$
5:         Learning Rate (l): $0.001 \leq l \leq 0.05$
6:         Maximum Depth (m): $3 \leq m \leq 7$
7:         Minimum Child Weight (c): $1 \leq c \leq 10$
8:         Feature Selection (F): $1 \leq F \leq 8$
9:     Initialize MHA (epoch: $E_p = 100$, population size: $P_{op} = 50$)
10: (ii) Perform the XGBoost classification on each training set:
11:     for current epoch i = 1 to 100 do
12:         Select features $F_i$
13:         Select hyperparameter values $H_i = \{n_i, l_i, m_i, c_i\} \in B_{Cond}$
14:         Train the XGBoost model with $F_i$ and $H_i$
15:         Compute the best cost function at epoch i, $C_{best}^{i}$
16:     Find $G_{best}$ from the set of 100: $\{C_{best}^{1}, C_{best}^{2}, \ldots, C_{best}^{100}\}$
17: (iii) Decode $G_{best}$:
18:     Decode $G_{best}$ and extract $H_{best}$ and $F_{best}$
19: End Process
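The study's exact optimization code is not reproduced here. As a rough sketch under stated assumptions, the mealpy library (version 2.5 or later, which ships OriginalGJO, OriginalFOX, and OriginalSLO) can express this problem with a continuous agent whose first eight dimensions are thresholded into a feature mask and whose remaining four dimensions are decoded as the Table 4 hyperparameters; X_train and y_train are assumed NumPy arrays for one fold:

```python
# Rough sketch of one fold's optimization run, assuming mealpy >= 2.5;
# all names are illustrative, not the study's exact implementation.
import numpy as np
from mealpy.swarm_based.GJO import OriginalGJO
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

def decode(solution):
    """Split a continuous agent into a feature mask and hyperparameters."""
    mask = solution[:8] > 0.5                      # feature on/off bits
    if not mask.any():
        mask[0] = True                             # keep at least 1 feature
    n, l, m, c = solution[8:]
    return mask, int(n), float(l), int(m), int(c)

def fitness(solution):
    # F-score on the fold's training data, as in Algorithm 1.
    mask, n, l, m, c = decode(np.asarray(solution))
    clf = XGBClassifier(n_estimators=n, learning_rate=l,
                        max_depth=m, min_child_weight=c)
    clf.fit(X_train[:, mask], y_train)
    return f1_score(y_train, clf.predict(X_train[:, mask]))

problem = {
    "fit_func": fitness,
    "lb": [0] * 8 + [100, 0.001, 3, 1],   # Table 4 lower bounds
    "ub": [1] * 8 + [300, 0.05, 7, 10],   # Table 4 upper bounds
    "minmax": "max",                      # maximize the F-score
}
best_position, best_fitness = OriginalGJO(epoch=100, pop_size=50).solve(problem)
```

Section 3.6 later replaces this training-set F-score with an internally cross-validated estimate to guard against overfitting.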

2.4.3. Voting

The training set for each fold was optimized independently, resulting in ten distinct decoded sets, collectively denoted $DG_{best}^{all}$, as illustrated in Equation (20). Here, $DG_{best}^{all}$ consists of $H_{best}^{all}$, the set of the best hyperparameters from all 10 folds, and $F_{best}^{all}$, the set of the best feature subsets from the 10 folds.
To determine the final hyperparameter set ($H_{best}^{f}$) for the XGBoost model, a majority voting technique (selecting the hyperparameter value that appears in the most folds) was applied to $H_{best}^{all}$. For the final feature subset ($F_{best}^{f}$), the union of all feature sets in $F_{best}^{all}$ was taken to avoid eliminating any important features.
$DG_{best}^{all} = \{H_{best}^{all}, F_{best}^{all}\} = \left\{ \{H_{best}^{1}, F_{best}^{1}\}, \{H_{best}^{2}, F_{best}^{2}\}, \ldots, \{H_{best}^{10}, F_{best}^{10}\} \right\}$    (20)
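A small sketch of the two aggregation rules (majority vote per hyperparameter, union over feature subsets), assuming the per-fold results are collected in the illustrative lists H_all (10 dicts) and F_all (10 feature-index sets):

```python
# Fold-aggregation sketch: majority vote per hyperparameter, union
# over feature subsets (H_all/F_all are illustrative names).
from collections import Counter

def majority_vote(values):
    return Counter(values).most_common(1)[0][0]

H_best_f = {key: majority_vote([h[key] for h in H_all])
            for key in H_all[0]}
F_best_f = sorted(set().union(*F_all))
```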

2.4.4. Final Model

Finally, one final model was developed for each optimizer (GJO, FOX, and SLO) using its finalized hyperparameter set and feature subset. $M_{GJO}$, $M_{FOX}$, and $M_{SLO}$ are the final models for the GJO, FOX, and SLO frameworks, as shown in Equations (21)–(23).
$\text{GJO-XGBoost} = \text{XGBoost}(H_{best}^{f}(\text{GJO}), F_{best}^{f}(\text{GJO}))$    (21)
$\text{FOX-XGBoost} = \text{XGBoost}(H_{best}^{f}(\text{FOX}), F_{best}^{f}(\text{FOX}))$    (22)
$\text{SLO-XGBoost} = \text{XGBoost}(H_{best}^{f}(\text{SLO}), F_{best}^{f}(\text{SLO}))$    (23)

2.5. Performance Evaluation

Accuracy, F-score, precision, recall, AUC, PR AUC, BSL, and calibration curve were considered for model evaluation. We also analyzed three types of execution complexity of the frameworks: optimization time complexity, training time complexity, and test time complexity. The formulas of the evaluation metrics are as follows:
$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$    (24)
$\text{Precision} = \dfrac{TP}{TP + FP}$    (25)
$\text{Recall} = \dfrac{TP}{TP + FN}$    (26)
$\text{F-score} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$    (27)
$\text{Brier Score Loss (BSL)} = \dfrac{1}{N} \sum_{i=1}^{N} (y_i - p_i)^2$    (28)
where N is the number of observations, y i is the binary target, and p i is the predicted probability of the positive class.
True Positive (TP) refers to the number of instances where the model correctly predicts the positive class. True Negative (TN) is the number of instances where the model correctly predicts the negative class. False Positive (FP) occurs when the model incorrectly predicts a negative instance as positive. False Negative (FN) occurs when the model incorrectly predicts a positive instance as negative.
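These quantities map directly onto scikit-learn utilities; a brief sketch, assuming arrays y_test, y_pred, and positive-class probabilities p_pos (average_precision_score is one common estimator of the PR AUC):

```python
# Metric computation sketch using scikit-learn equivalents of
# Equations (24)-(28); y_test/y_pred/p_pos are assumed arrays.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, average_precision_score,
                             brier_score_loss)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F-score  :", f1_score(y_test, y_pred))
print("AUC      :", roc_auc_score(y_test, p_pos))
print("PR AUC   :", average_precision_score(y_test, p_pos))
print("BSL      :", brier_score_loss(y_test, p_pos))
```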

2.6. SHAP Explanation

Shapley Additive Explanations (SHAP) is a widely used method for interpreting classification models introduced by Lundberg et al. [26] and based on the Shapley values by Lloyd Shapley [27]. SHAP assigns each feature i a contribution score ϕ i , calculated as
$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \dfrac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right]$    (29)
Here, F is the full feature set, S is any subset excluding i, and f S ( x S ) denotes the model’s output using features in S. The final prediction is expressed as the sum of SHAP values and the model’s expected output:
$f(x) = E[f(x)] + \sum_{i=1}^{n} \phi_i$    (30)
If f ( x ) > 0.5 , the model predicts class 1 (dengue positive), otherwise class 0.
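A minimal sketch of producing such explanations for a fitted XGBoost model (names are illustrative; note that tree SHAP on XGBoost explains the log-odds margin, which is why outputs such as f(x) = 6.125 reported later exceed 1):

```python
# SHAP sketch for a fitted XGBoost model (illustrative variable names;
# X_test is assumed to be a pandas DataFrame).
import shap

explainer = shap.TreeExplainer(model)          # exact tree SHAP values
shap_values = explainer.shap_values(X_test)    # one row per test instance

shap.summary_plot(shap_values, X_test)         # beeswarm/violin overview
# Force view of a single prediction (row 0 here as an example):
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])
```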

2.7. Diverse Counterfactual Explanations

Diverse Counterfactual Explanations (DiCE) is an ML method for generating multiple, diverse, and realistic counterfactual examples that lead to a desired outcome, helping users understand how to change inputs to alter a model’s prediction [28]. In Equation (31), C ( x ) denotes the set of k counterfactuals for input x [28].
$C(x) = \underset{c_1, \ldots, c_k}{\arg\min} \; \dfrac{1}{k} \sum_{i=1}^{k} \text{yloss}(f(c_i), y) + \dfrac{\lambda_1}{k} \sum_{i=1}^{k} \text{dist}(c_i, x) - \lambda_2 \cdot \text{dpp\_diversity}(c_1, \ldots, c_k)$    (31)
In this equation, the loss function balances three terms: $\text{yloss}(f(c_i), y)$ encourages predictions close to the target class $y$, $\text{dist}(c_i, x)$ promotes proximity to the original input, and $\text{dpp\_diversity}(c_1, \ldots, c_k)$ ensures diversity among counterfactuals. The trade-off is controlled by the hyperparameters $\lambda_1$ and $\lambda_2$. DiCE helps reach a desired outcome by changing the feature values of a particular prediction; however, there is no fixed or uniquely most efficient combination of feature values that alters the decision [28]. Figure 11 presents the basic layout of DiCE.
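A minimal sketch using the dice-ml package, assuming a fitted sklearn-compatible model and a DataFrame df with the Final Output target (all names are illustrative):

```python
# DiCE sketch using the dice-ml package; `df` is assumed to hold the
# training data with the "Final Output" target column.
import dice_ml

data = dice_ml.Data(dataframe=df,
                    continuous_features=["WBC Count", "Platelet Count"],
                    outcome_name="Final Output")
model_dice = dice_ml.Model(model=model, backend="sklearn")
explainer = dice_ml.Dice(data, model_dice, method="random")

# Five diverse counterfactuals flipping a prediction to class 0,
# mirroring the later analysis of the misclassified index 795.
query = X_test.iloc[[0]]   # a single instance as a one-row DataFrame
cfs = explainer.generate_counterfactuals(query, total_CFs=5,
                                         desired_class=0)
cfs.visualize_as_dataframe()
```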

3. Results

The Results Section is divided into three main parts: Optimizer Outcome, Classifier Outcome, and Interpretation of the Models. The Optimizer Outcome Subsection presents the fitness progression across epochs, along with the selected hyperparameter values and feature subsets for each of the 10 folds of each model. The Classifier Outcome Subsection mainly presents the final models' performance. The Interpretation of the Models Subsection explains the models' overall behavior and individual predictions, including DiCE interpretability. Figure 12 and Table 5 show the epoch-versus-fitness behavior of each model. In Table 5, the fitness at epoch 1 (Fitness@1), the fitness at epoch 100 (Fitness@100), the saturated fitness (Sat. Fitness), and the epoch at which the fitness saturates (Sat. Epoch) are shown.

3.1. Optimizer Outcome

During optimization, all three algorithms—GJO, FOX, and SLO—achieved the maximum cost-function value (F-score) of 1.0 in every fold, indicating optimal performance:
  • For GJO, the fitness value saturated at 1.0 from epoch 1 across all folds, demonstrating immediate convergence.
  • For FOX, folds 4, 5, and 10 reached saturation at epoch 19, while the remaining folds achieved peak fitness from epoch 1.
  • For SLO, only folds 7, 8, and 9 attained the maximum fitness from epoch 1, whereas the rest reached it starting from epoch 4.
For each fold, the optimizers determined the best hyperparameter values and the optimal feature subset. For the first fold, GJO selected $H_{best}^{1} = \{n_{best}^{1}, l_{best}^{1}, m_{best}^{1}, c_{best}^{1}\} = \{249, 0.0455210569158742, 3, 1\}$ and $F_{best}^{1} = \{f_3, f_6\}$. FOX selected the feature subset $F_{best}^{1} = \{f_1, f_2, f_3, f_6\}$ for fold 1 (its corresponding hyperparameter values are listed in Table 5). In contrast, SLO found $H_{best}^{1} = \{258, 0.0395697655228806, 6, 1\}$ and $F_{best}^{1} = \{f_2, f_3, f_6\}$ for fold 1. Detailed information is shown in Table 5. GJO demonstrated strong consistency in feature selection, choosing only the features with index 3 (WBC Count) and index 6 (Platelet Count) across all folds. In FOX, the selected features varied among $f_1$ (Sex), $f_2$ (Hemoglobin), $f_3$, and $f_6$. In contrast, SLO showed a surprising level of consistency by selecting the exact subset $\{f_2, f_3, f_6\}$ across all folds.
After applying majority voting to each framework's set of hyperparameters, $H_{best}^{all}$, the finalized hyperparameter sets ($H_{best}^{f}(\text{GJO})$, $H_{best}^{f}(\text{FOX})$, and $H_{best}^{f}(\text{SLO})$) are presented in Table 6. The finalized feature subset ($F_{best}^{f}$) was determined by taking the union of the selected feature sets across all folds. GJO thus showed the maximum feature reduction, from eight features to only two, whereas FOX and SLO reduced the features from eight to four and from eight to three, respectively.

3.2. Classifier Outcome

First, we analyzed the multiple baseline classifiers with default settings by using 10-fold cross-validation. Table 7 shows the mean results of multiple baselines and well-known classifiers, including XGBoost, where most of the classifiers (CatBoost, Random Forest, Extra Trees, Gradient Boosting, and Decision Tree) achieved 99.83% mean test accuracy using all eight features. KNN and SVM showed comparatively lower mean test accuracy scores of 98.96% and 98.17%, respectively.
XGBoost with $F_{best}^{f}(\text{GJO})$ and $H_{best}^{f}(\text{GJO})$ was named GJO-XGBoost. Similarly, FOX-XGBoost and SLO-XGBoost are the other two XGBoost classifier frameworks, optimized with their respective finalized hyperparameters and feature subsets. GJO-XGBoost, FOX-XGBoost, and SLO-XGBoost attained 100% training accuracy, F-score, precision, and recall in each fold. However, the test performance varied for all frameworks in folds 7, 8, and 9; e.g., for fold 7, the test accuracy, F-score, precision, and recall are 99.64%, 99.72%, 99.45%, and 100%, respectively, as shown in Table 8. Although all models demonstrated similar performance during training and testing, they utilized different sets of features. For example, GJO-XGBoost used only two features, while FOX-XGBoost and SLO-XGBoost used four and three features, respectively, as summarized in Table 6.
There was no variation in training performance across any of the frameworks. However, slight deviations were observed in test accuracy, F-score, and precision. Notably, test recall remained consistent across all folds for every framework, as presented in Figure 13 and Table 9. According to Table 9, each model showed a standard deviation of 0.173135 in accuracy, 0.127108 in F-score, and 0.253540 in precision. This indicates that precision is the most sensitive metric across folds, showing the highest variability, whereas recall showed no deviation at all, remaining at 100% in every fold.
The models achieved excellent average performance on the training data, with 100% accuracy, F-score, precision, and recall. On the test data, the models also performed remarkably well, achieving an accuracy of 99.89%, an F-score of 99.92%, a precision of 99.84%, and a recall of 100%, as presented in Table 10. Each model showed similar results on different sets of features. From Figure 14, it is clear that each model achieved the same score in all metrics. The 10-fold confusion matrices of GJO-XGBoost are presented in Figure 15. Folds 7, 8, and 9 identified only a single wrong prediction, FP, and the rest of the folds did not face any wrong classifications; the same applies for FOX-XGBoost and SLO-XGBoost.
The individual with index 795 belongs to class 0 but was consistently misclassified as class 1 across all models and in every fold where they appeared in the test set. Table 11 presents the information related to the individual with index 795, based on the selected features used in each framework. Even though each framework was trained and tested using a different set of features, the individual with index 795 was consistently misclassified. This misclassification aligns with the observations from the t-SNE plot shown in Figure 4b, where the point corresponding to index 795 overlaps with the cluster of class 1.
However, the models showed strong overall performance, achieving a high mean area under the curve (AUC) of 0.998847. This indicates that despite the consistent misclassification of index 795, the models often effectively distinguished between the two classes. The highest AUC of 1.0 was observed across multiple, consistent folds—specifically folds 1, 2, 3, 4, 5, 6, and 10—in each model. In contrast, the lowest AUC of 0.99375 was consistently recorded in fold 9 for all models. Figure 16 presents the Receiver Operating Characteristic (ROC) curves corresponding to the best- and worst-performing folds across the models. Table 12 shows the fold-wise AUC value, including the mean AUC.
We then calculated the area under the precision–recall curve (PR AUC). Table 13 shows the PR AUCs of all optimized models. Figure 17 presents the PR AUCs of all optimized models at fold 1. All three models achieved consistently high PR AUCs across all folds, with fold 10 reaching a perfect score of 100%. The mean PR AUC of 99.84% indicates that all models demonstrate nearly identical and excellent predictive performance.
The execution complexity of the proposed frameworks was analyzed. This study was conducted on the Google Colaboratory platform using a free-tier environment with a Python 3 runtime on the Google Compute Engine backend. The experiments utilized an Intel(R) Xeon(R) processor running at 2.20 GHz with two cores, and the system had a total of 12.7 GB of RAM. Table 14 reports the fold-wise execution times taken for training, testing, and optimization by the models. GJO-XGBoost took 338.53 s to complete the optimization process using 100 epochs and a population size of 50. However, this optimization is a one-time operation; once completed, there is no need to re-optimize the model. Similarly, FOX-XGBoost and SLO-XGBoost took 161.37 s and 303.33 s, respectively, demonstrating the superior efficiency of FOX-XGBoost. This superiority was consistently maintained by FOX-XGBoost across all folds, achieving a mean optimization time of only 151.44 s over 10 folds. In contrast, GJO-XGBoost and SLO-XGBoost required mean optimization times of 294.42 and 303.55 s, respectively. For the mean training time, GJO-XGBoost outperformed the other two frameworks by taking only 50.13 ms, owing to its minimal feature set. FOX-XGBoost and SLO-XGBoost took slightly more time, with mean training times of 78.69 ms and 50.46 ms, respectively. GJO-XGBoost also outperformed the others by taking the lowest mean test time of only 10.88 ms, whereas FOX-XGBoost and SLO-XGBoost required mean test times of 16.13 and 13.34 ms, respectively, as shown in Table 14 and Figure 18. In the case of optimization, FOX-XGBoost demonstrated the lowest standard deviation of 11.55 s compared with the other two models.
Table 15 presents the pair-wise comparisons of optimization times for the three models using the Mann–Whitney U test. The test between FOX and GJO yielded a U-statistic of 100 with a p-value of 0.0002, indicating a statistically significant difference. In contrast, the comparison between GJO and SLO did not show significance. However, the comparison between SLO and FOX also produced a p-value of 0.0002, again highlighting a significant difference. These results show that any comparison involving FOX consistently yielded a lower p-value, underscoring its significant improvement in optimization time. Overall, the findings clearly establish FOX as the most computationally efficient optimizer among the evaluated models, with strong statistical support for its superior performance.

3.3. Analysis with Imputation of Data Samples

We additionally performed classification using the optimized feature and hyperparameter sets, with the missing samples filled in by Multiple Imputation by Chained Equations (MICE). Table 16 presents the 10-fold cross-validation performance after imputation. All the models achieved 99.90%, 99.93%, 100%, and 99.85% mean training accuracy, F-score, precision, and recall, respectively. However, the mean test accuracy scores are 99.67%, 99.63%, and 99.63% for GJO-XGBoost, FOX-XGBoost, and SLO-XGBoost, respectively. Figure 19 shows the ROC curves, including the AUC values, for all folds.
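A brief sketch using scikit-learn's IterativeImputer, which implements a MICE-style chained-equations procedure and requires an explicit experimental-enable import (df_raw is an assumed DataFrame of the numeric columns with missing entries):

```python
# MICE-style imputation sketch via scikit-learn's IterativeImputer.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

imputer = IterativeImputer(random_state=42, max_iter=10)
df_imputed = pd.DataFrame(imputer.fit_transform(df_raw),
                          columns=df_raw.columns)
```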

3.4. One-Hot Encoding Analysis

To evaluate the impact of encoding strategies on model performance, we implemented one-hot encoding for the ternary variable Sex (Child, Female, and Male), the binary feature RBC Panel (0 and 1), and Differential Count (0 and 1). Originally, Sex was encoded as 0, 1, and 2. Firstly, we compared performance using only the XGBoost classifier, without any feature selection or hyperparameter tuning. Table 17 shows the mean test performance using the label encoder and the one-hot encoder, where both approaches achieved very high accuracy, F-score, precision, and recall values. However, one-hot encoding slightly outperformed label encoding across all metrics, achieving the highest accuracy of 99.75% and an F-score of 99.82%. This indicates that one-hot encoding provided a marginal performance gain by better representing the categorical features, although both encoding methods demonstrated nearly perfect classification capability.
Secondly, among all optimized models, only FOX-XGBoost selected the Sex feature, while RBC Panel and Differential Count were not selected in any model. After applying one-hot encoding, the FOX-XGBoost model was retrained with the same optimized hyperparameters. Table 18 presents the comparison of test performance metrics between label encoding and one-hot encoding. The results demonstrate that the choice of encoding strategy has no significant effect on optimized model performance, with both approaches yielding identical accuracy, F-score, precision, and recall values.

3.5. Calibration Analysis

In this study, all optimized models—GJO-XGBoost, FOX-XGBoost, and SLO-XGBoost—achieved the same mean test accuracy of 99.89%. Therefore, we analyzed the calibration of the models by using the Brier Score Loss (BSL) and calibration curves. Table 19 presents the mean BSL values for all optimized models, where each model showed a BSL of 0.0014, indicating highly accurate probabilistic predictions and excellent calibration between predicted probabilities and true outcomes. Figure 20 shows the calibration curves of all optimized models compared with perfect calibration at fold 1. FOX-XGBoost was identified as the most accurately calibrated model compared with the other two optimized models: GJO-XGBoost and SLO-XGBoost.

3.6. Implementation of Internal Nested Cross-Validation

Because the fitness reached 1.0 very early, overfitting is a concern when fitness is calculated on the training set without internal validation. We therefore also performed internal nested 5-fold cross-validation using the KFold library, calculating the fitness value at each iteration of the optimizer for GJO-XGBoost. Except for fold 10, all folds reached a peak value of 0.9984, as shown in Figure 21. Based on these runs, we applied the voting technique to select the final set of features and hyperparameters for the GJO-XGBoost model, as summarized in Table 20. Features with indices 3 and 6 were selected, with 117 estimators, a Learning Rate of 0.0378, a Maximum Tree Depth of 4, and a Minimum Child Weight of 1.
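As a brief, illustrative sketch, the internally validated fitness can replace the training-set F-score used in the earlier optimization sketch (decode() and the fold arrays are assumed as before):

```python
# Internally validated fitness: mean F-score over an internal 5-fold
# split of the fold's training data (illustrative sketch).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

def fitness_nested(solution):
    mask, n, l, m, c = decode(np.asarray(solution))   # decode() as before
    scores = []
    for tr, va in KFold(n_splits=5, shuffle=True, random_state=42).split(X_train):
        clf = XGBClassifier(n_estimators=n, learning_rate=l,
                            max_depth=m, min_child_weight=c)
        clf.fit(X_train[tr][:, mask], y_train[tr])
        scores.append(f1_score(y_train[va], clf.predict(X_train[va][:, mask])))
    return float(np.mean(scores))
```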
The performance of the finalized model, evaluated using internal nested 5-fold cross-validation, is presented in Table 21. In addition to accuracy, the table reports F-score, precision, and recall for both training and test sets. The model achieved high training metrics (accuracy = 99.89%, F-score = 99.92%, precision = 99.84%, and recall = 100%) and test metrics (accuracy = 99.75%, F-score = 99.82%, precision = 99.63%, and recall = 100%), indicating excellent predictive performance and balanced classification across both positive and negative classes.

3.7. SHAP Analysis

Fold 7 was considered for SHAP analysis, as it has both wrong and correct predictions. SHAP identified WBC Count as the most influential feature when testing without any optimization algorithms; WBC Count achieved the highest mean SHAP value of +5.58, while the other features were assigned SHAP values of 0, as shown in Figure 22a. The violin plot in Figure 22b illustrates the relationship between WBC Count and the model's predictions. It shows that higher WBC Count values (indicated by the pink-shaded area of the plot) are generally associated with negative SHAP values (dengue negative, or class 0), while lower values tend to correspond to positive SHAP values (dengue positive, or class 1).
Next, we evaluated the mean SHAP values for all proposed frameworks and present them in Figure 23. SHAP calculated the mean SHAP values for WBC Count as +5.55, +5.54, and +5.56 for GJO-XGBoost, FOX-XGBoost, and SLO-XGBoost, respectively. The mean SHAP value for Platelet Count is +0.02 in GJO-XGBoost and FOX-XGBoost, whereas SLO-XGBoost has a 0 mean SHAP value for Platelet Count. Other features for FOX-XGBoost and SLO-XGBoost have a 0 mean SHAP value, indicating the importance of WBC Count in the models and dengue predictions. The violin and beeswarm plots in Figure 24 of the FOX-XGBoost model preserve the same pattern observed in the SHAP evaluation of the non-optimized model. Each of the optimized models shows that a higher WBC Count value is associated with negative SHAP values, meaning the dengue-negative class. However, the dengue-positive class primarily depends on lower WBC Count values.
Figure 25 shows the SHAP scatter plots for the features Sex, Hemoglobin, WBC Count, and Platelet Count of the FOX-XGBoost model. Sex exhibited the lowest contribution, with SHAP values remaining near 0 regardless of changes in the feature, indicating minimal impact. Hemoglobin also showed a smaller contribution to the model. In contrast, WBC Count and Platelet Count contributed the highest SHAP values. A decrease in their feature values is associated with positive SHAP contributions, while higher feature values correspond to negative SHAP contributions.
We also analyzed the behavior of the models for individual predictions, focusing specifically on fold 7 for TP (index 127), TN (index 322), and FP (index 795) cases. Since the proposed models did not produce any FN predictions, we excluded individual prediction analysis for the FN case. Figure 26 presents TP prediction for the individual indexed as 127 for all models. In the GJO-XGBoost model, the base SHAP value E[f(x)] = 0.903 and the model output f(x) = 6.125 were computed. The most significant contribution comes from WBC Count (WBC Count: 2100), with a SHAP value of +5.2, while Platelet Count contributes minimally, with a SHAP value of only +0.02. The model outputs f(x) = 6.116 and f(x) = 6.109 were recorded for the FOX-XGBoost and SLO-XGBoost models. In each framework, the value of f(x) is greater than 0.5; as a result, the prediction corresponds to class 1 and is identified as a correct classification.
Index 322 is a TN prediction for each model shown in Figure 27. WBC Count (7100) received a SHAP value of −6.2 across all models, strongly influencing the prediction toward class 0, which corresponds to the dengue-negative class. The contributions of the other features are negligible: Platelet Count has a SHAP value of −0.02, while the remaining features have SHAP values close to zero. Analysis based on Figure 26 and Figure 27 shows that the lower value of WBC Count predicts the individual as dengue-positive (class 1) with a higher value of f(x). However, the higher value of WBC Count is associated with dengue-negative predictions.
Index 795 is a wrong prediction (FP) which was observed for all the models. Figure 28 presents the waterfall plot for this individual prediction for all models. The corresponding feature values for index 795 were identified as WBC Count of 3600 and Platelet Count of 190,000. As shown in Figure 5a, WBC Count for this instance falls below the intersection threshold, which places it within the dengue-positive region. This classification is further supported by the SHAP-based interpretations, where the cumulative SHAP values for this instance were recorded as f(x) = 6.08 in GJO-XGBoost, 6.066 in FOX-XGBoost, and 6.109 in SLO-XGBoost. These consistently high positive values indicate a strong contribution of WBC Count to classifying the instance as dengue-positive across all three models.

3.8. DiCE Analysis

DiCE was applied to find the feature values that would turn the wrong prediction into a correct one. We analyzed five counterfactual values for this particular prediction for each model. Table 22 shows the five counterfactual (CF) values that can alter the prediction from class 1 to class 0. For the GJO-XGBoost framework, the mean WBC Count and Platelet Count across the generated counterfactuals are 8339.82 and 226,487.98, respectively. Notably, each counterfactual instance exhibits a WBC Count greater than the class intersection threshold of 3999.74, as illustrated in Figure 5a. Similarly, the FOX-XGBoost framework's mean counterfactual values across the five iterations are 15.04 for Hemoglobin and 7405.22 for WBC Count. A similar pattern is observed in SLO-XGBoost, with a mean WBC Count of 7121.34. Each framework consistently increases WBC Count in its counterfactual instances, suggesting that a higher WBC Count contributes to the correct classification of this index by predicting a dengue-negative case.

3.9. Discussion

The dataset used in this research work was collected with the consent of the patients or their guardians and has been approved by the review committee (Upazila Health Complex, Kalai, Joypurhat, Bangladesh) [16]. Since the dataset is very recent, our literature review has not identified any prior research using it. As a result, we compared our work with previous findings on other datasets. Table 23 presents the outcomes of existing works, including our findings using GJO-XGBoost, FOX-XGBoost, and SLO-XGBoost. Refs. [7,8] achieved accuracy scores of 57.69% and 83.30%, respectively, using the RF classifier with 7 and 10 features on two different datasets. By implementing the PSO-ANN algorithm, Ref. [9] achieved an accuracy of 87.27% on the Delhi Multiple Hospital Dataset. However, these three studies did not report any additional evaluation metrics. The RF classifier was also applied to the Taiwan Dengue Fever Dataset, achieving an accuracy of 89.94% and a recall of 94.79% using a large set of 60 features [10]. Logit Boost demonstrated strong performance on the Discharged Patient Report Dataset, achieving 92% accuracy and F-score, 95% precision, and 90% recall using only eight features [11]. The CBC Dengue Dataset Bangladesh, when used with the SC, improved dengue fever classification by achieving 96.88% accuracy, an F-score of 96.46%, a precision of 97.73%, and a recall of 95.45% based on 14 features [12]. Ref. [13] implemented the XGBoost classifier on the Vietnam Dengue Clinical Dataset and successfully demonstrated a model accuracy of 98.60%, along with an F-score of 96%, a precision of 98%, and a recall of 94%. Ref. [6] used ET to achieve accuracy, F-score, precision, and recall of 99.03%, 99.04%, 98.92%, and 99.17%, respectively, using 21 features of the Taiz Dengue Surveillance Dataset. According to our investigation, Ref. [14] demonstrated that the SVM classifier applied to the Dirgahayu Hospital Dengue Dataset achieved the highest performance, with 99.10% accuracy, precision, and recall, by using 20 features. However, that study did not report the F-score.
We have proposed three frameworks, GJO-XGBoost, FOX-XGBoost, and SLO-XGBoost, each of which outperforms all existing models by achieving a 10-fold CV mean accuracy of 99.89% on the PCD dataset. Our models have also achieved the highest mean F-score (99.92%) and precision (99.84%) compared with all previous works. Furthermore, each of the proposed frameworks outperforms the existing models by attaining 100% recall. Although each of the proposed models has achieved the same accuracy, F-score, precision, and recall, they utilize different subsets of features. Based on feature reduction, GJO-XGBoost outperforms all other models, including the proposed ones, by achieving top performance by using only the two most important features: WBC Count and Platelet Count. FOX-XGBoost reduces the features from eight to four by selecting Sex, Hemoglobin, WBC Count, and Platelet Count. Finally, SLO-XGBoost extracts only three features: Hemoglobin, WBC Count, and Platelet Count. Therefore, GJO-XGBoost outperforms all existing works in terms of performance while also achieving the highest level of feature reduction among the proposed frameworks.
Figure 4a shows that WBC Count is a feature that is highly correlated with dengue fever. SHAP analysis has also revealed that WBC Count is the most influential feature when classification is performed without optimization. When optimization is implemented, WBC Count is again identified as the most impactful feature in each model. This highlights the significance of WBC Count in classifying dengue fever, and its value is crucial to making a correct or incorrect prediction. Figure 5 presents the intersection of the WBC Count distributions for the dengue-positive and dengue-negative classes, revealing that WBC Count values lower than the intersection point (3999.74) are associated with the dengue-positive class. The intersection level for Platelet Count is 133,465.86; however, according to Figure 5, there is an overlapping area under the dengue-positive and dengue-negative curves, so determining an exact marginal value between dengue-positive and dengue-negative cases for Platelet Count is difficult. Another study described similar findings, where the mean WBC Count for dengue infection was 4600/μL [29]. Our frameworks also support this relationship by predicting index 795 (WBC Count: 3600) as dengue-positive, even though the actual label is dengue-negative. SHAP analysis strongly suggests that lower WBC Count values are associated with dengue-positive cases, whereas higher WBC Count values are linked to dengue-negative cases.
Building on these observations, DiCE counterfactuals can assist specialists in closely monitoring feature values, particularly WBC and Platelet Counts, by showing how a patient's current levels would need to change to reach values associated with a dengue-negative outcome. In addition, DiCE can help doctors tailor treatment more precisely by identifying the required adjustments in WBC and platelet levels, since not all patients require the same targets. The DiCE explanation in Table 22 for index 795 demonstrates how an incorrect prediction can be corrected, chiefly by increasing the WBC Count value; this further indicates that higher WBC Count values are associated with dengue-negative cases. However, direct clinical application of DiCE outcomes requires further, more in-depth study. Ref. [15] likewise notes the clinical importance of WBC Count in dengue fever.
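The counterfactual generation itself can be reproduced with the open-source dice_ml library that implements DiCE [28]. The following is a minimal sketch, assuming train_df is a DataFrame holding the selected features plus the binary Final Output label and clf is a fitted, sklearn-compatible XGBoost classifier; these names and the positional index are illustrative.

```python
# A minimal sketch of diverse counterfactual generation with dice_ml [28].
import dice_ml

data = dice_ml.Data(
    dataframe=train_df,
    continuous_features=["WBC Count", "Platelet Count"],
    outcome_name="Final Output",
)
model = dice_ml.Model(model=clf, backend="sklearn")
explainer = dice_ml.Dice(data, model, method="random")

# Query mirroring index 795 (WBC Count 3600, Platelet Count 190,000):
# request five counterfactuals that flip the prediction to dengue-negative (0).
query = train_df.drop(columns=["Final Output"]).iloc[[795]]
result = explainer.generate_counterfactuals(query, total_CFs=5, desired_class=0)
result.visualize_as_dataframe(show_only_changes=True)
```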
Each of the proposed frameworks achieved the highest accuracy despite using a minimal set of features, and our models exhibit low execution times for both training and testing. The optimization process requires more time, but it is a one-time operation. The use of a minimal feature set therefore not only contributes to high classification accuracy and reduced execution time but also offers practical benefits in real-world applications: it can significantly lower the cost of diagnosis and clinical testing, since fewer medical examinations are required, and it can accelerate treatment by enabling faster decision-making, ultimately improving patient outcomes.

3.10. Future Work

In this research work, we achieved an outstanding performance of 99.89% mean accuracy using 10-fold cross-validation. This result highlights the potential for deploying our model in real-time dengue fever prediction systems within diagnostic centers and suggests that smart, rapid dengue detection equipment could be developed to support early and accurate diagnosis. For future work, we plan to adapt and evaluate our model in countries with a high dengue burden, such as Brazil, Peru, and Colombia, where early detection can significantly improve public health responses. Additionally, we aim to develop low-cost, intelligent diagnostic tools suitable for low-income countries such as Bangladesh, where healthcare infrastructure is often limited and affordable solutions are crucial.
Since there is no existing research on this specific dataset, comparisons across different datasets alone are insufficient. We therefore plan to compare our results with future studies that utilize this dataset and, if we obtain datasets with similar features, to evaluate our model on those as well to assess its generalizability.

4. Conclusions

Dengue fever is a viral disease mainly spread by female Aedes mosquitoes, particularly Aedes aegypti. From 1980 up to 11 May 2025, over 37.7 million dengue cases and more than 20,000 deaths were reported in the Americas, and 3.9 billion people in 129+ countries are at risk of dengue fever. This study proposed three SBMHAs wrapped with an XGBoost classifier to enhance dengue fever prediction: GJO-XGBoost, FOX-XGBoost, and SLO-XGBoost. GJO-XGBoost achieved a 10-fold cross-validation mean accuracy of 99.89%, an F-score of 99.92%, a precision of 99.84%, and a perfect recall of 100% using only two features: WBC Count and Platelet Count. FOX-XGBoost and SLO-XGBoost matched this performance using the four (Sex, Hemoglobin, WBC Count, and Platelet Count) and three (Hemoglobin, WBC Count, and Platelet Count) most impactful features, respectively. GJO-XGBoost also recorded the lowest average training and test times, at 50.13 ms and 10.88 ms, respectively. SHAP analysis showed that WBC Count is the most impactful feature for dengue fever classification, with the highest SHAP values in each model, and indicated that a lower WBC Count is a strong indicator of dengue fever. Additionally, the DiCE explanation showed that the misclassification of index 795 can be corrected by increasing the WBC Count value.
We used the PCD dataset, which contains only 931 usable samples, a relatively small size; this limited dataset size is one of the main limitations of our study. Additionally, owing to the varying feature sets across other publicly available dengue datasets, we were unable to apply the proposed models to them directly for comparative evaluation. We intend to investigate the performance of our models further if similar datasets become available in the future.

Author Contributions

Conceptualization, P.S., J.-J.T., and A.-A.N.; Formal analysis, P.S. and A.-A.N.; Funding acquisition, J.-J.T.; Investigation, P.S. and A.-A.N.; Methodology, P.S. and A.-A.N.; Software, P.S. and J.-J.T.; Supervision, J.-J.T. and A.-A.N.; Validation, P.S.; Visualization, P.S., J.-J.T., and A.-A.N.; Writing—original draft, P.S.; Writing—review and editing, J.-J.T. and A.-A.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research study received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

In this study, we utilized a publicly available dataset from the Mendeley Data repository. Dataset title: Predictive Clinical Dataset for Dengue Fever Using Vital Signs and Blood Parameters. Dataset DOI: https://doi.org/10.17632/xrsbyjs24t.1, accessed on 15 July 2025.

Acknowledgments

We acknowledge the use of large language models (LLMs), such as ChatGPT (Version: GPT-4) and DeepSeek (Version: DeepSeek-V3), for enhancing the clarity and grammatical quality of the manuscript. These tools were employed to refine sentence structures without altering the technical content.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. World Health Organization. Dengue and Severe Dengue. 2024. Available online: https://www.who.int/news-room/fact-sheets/detail/dengue-and-severe-dengue (accessed on 15 July 2025).
2. Damtew, Y.T.; Tong, M.; Varghese, B.M.; Anikeeva, O.; Hansen, A.; Dear, K.; Zhang, Y.; Morgan, G.; Driscoll, T.; Bi, P.; et al. Effects of high temperatures and heatwaves on dengue fever: A systematic review and meta-analysis. EBioMedicine 2023, 91, 104582.
3. Pan American Health Organization. Dengue Indicators. 2025. Available online: https://www3.paho.org/data/index.php/en/mnu-topics/indicadores-dengue-en.html (accessed on 15 July 2025).
4. World Health Organization. Global Dengue Dashboard. 2025. Available online: https://worldhealthorg.shinyapps.io/dengue_global/ (accessed on 13 May 2025).
5. Bhatt, S.; Gething, P.W.; Brady, O.J.; Messina, J.P.; Farlow, A.W.; Moyes, C.L.; Drake, J.M.; Brownstein, J.S.; Hoen, A.G.; Hay, S.I.; et al. The global distribution and burden of dengue. Nature 2013, 496, 504–507.
6. Abdualgalil, B.; Abraham, S.; Ismael, W.M. Early diagnosis for dengue disease prediction using efficient machine learning techniques based on clinical data. J. Robot. Control 2022, 3, 257–268.
7. Silitonga, P.; Dewi, B.E.; Bustamam, A.; Al-Ash, H.S. Evaluation of dengue model performances developed using artificial neural network and random forest classifiers. Procedia Comput. Sci. 2021, 179, 135–143.
8. Rajathi, N.; Kanagaraj, S.; Brahmanambika, R.; Manjubarkavi, K. Early detection of dengue using machine learning algorithms. Int. J. Pure Appl. Math. 2018, 118, 3881–3887.
9. Gambhir, S.; Malik, S.K.; Kumar, Y. PSO-ANN based diagnostic model for the early detection of dengue disease. New Horizons Transl. Med. 2017, 4, 1–8.
10. Kuo, C.Y.; Yang, W.W.; Su, E.C.Y. Improving dengue fever predictions in Taiwan based on feature selection and random forests. BMC Infect. Dis. 2024, 24 (Suppl. S2), 334.
11. Iqbal, N.; Islam, M. Machine learning for dengue outbreak prediction: A performance evaluation of different prominent classifiers. Informatica 2019, 43, 363–371.
12. Riya, N.J.; Chakraborty, M.; Khan, R. Artificial intelligence based early detection of dengue using CBC data. IEEE Access 2024, 12, 112355–112367.
13. Chowdhury, S.U.; Sayeed, S.; Rashid, I.; Alam, M.G.R.; Masum, A.K.M.; Dewan, M.A.A. Shapley-additive-explanations-based factor analysis for dengue severity prediction using machine learning. J. Imaging 2022, 8, 229.
14. Hamdani, H.; Hatta, H.R.; Puspitasari, N.; Septiarini, A.; Henderi, H. Dengue classification method using support vector machines and cross-validation techniques. IAES Int. J. Artif. Intell. 2022, 11, 1119.
15. Rosenberger, K.D.; Khanh, L.P.; Tobian, F.; Chanpheaktra, N.; Kumar, V.; Lum, L.C.S.; Sathar, J.; Sandoval, E.P.; Marón, G.M.; Thu, H.V.T.; et al. Early diagnostic indicators of dengue versus other febrile illnesses in Asia and Latin America (IDAMS study): A multicentre, prospective, observational study. Lancet Glob. Health 2023, 11, e361–e372.
16. Islam, O.; Mahmud, A. A benchmark dataset for analyzing hematological responses to dengue fever in Bangladesh. Data Brief 2024, 57, 111030.
17. Benaim, A.R.; Almog, R.; Gorelik, Y.; Hochberg, I.; Nassar, L.; Mashiach, T.; Khamaisi, M.; Lurie, Y.; Azzam, Z.S.; Beyar, R. Analyzing medical research results based on synthetic data and their relation to real data results: Systematic comparison from five observational studies. JMIR Med. Inform. 2020, 8, e16492.
18. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
19. Brent, R.P. An algorithm with guaranteed convergence for finding a zero of a function. Comput. J. 1971, 14, 422–425.
20. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
21. Sarker, P.; Tiang, J.J.; Nahid, A.A. Metaheuristic-Driven Feature Selection for Human Activity Recognition on KU-HAR Dataset Using XGBoost Classifier. Sensors 2025, 25, 5303.
22. Agrawal, P.; Abutarboush, H.F.; Ganesh, T.; Mohamed, A.W. Metaheuristic algorithms on feature selection: A survey of one decade of research (2009–2019). IEEE Access 2021, 9, 26766–26791.
23. Chopra, N.; Ansari, M.M. Golden jackal optimization: A novel nature-inspired optimizer for engineering applications. Expert Syst. Appl. 2022, 198, 116924.
24. Mohammed, H.; Rashid, T. FOX: A FOX-inspired optimization algorithm. Appl. Intell. 2023, 53, 1030–1050.
25. Masadeh, R.; Mahafzah, B.A.; Sharieh, A. Sea lion optimization algorithm. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 388–395.
26. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874.
27. Shapley, L.S. A value for n-person games. Contrib. Theory Games 1953, 2, 307–317.
28. Mothilal, R.K.; Sharma, A.; Tan, C. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, 27–30 January 2020; pp. 607–617.
29. Chaloemwong, J.; Tantiworawit, A.; Rattanathammethee, T.; Hantrakool, S.; Chai-Adisaksopha, C.; Rattarittamrong, E.; Norasetthada, L. Useful clinical features and hematological parameters for the diagnosis of dengue infection in patients with acute febrile illness: A retrospective study. BMC Hematol. 2018, 18, 20.
Figure 1. Yearly dengue trends in America (2014–2024): total, confirmed cases, and deaths [3].
Figure 2. Global dengue confirmed cases from 2010 to March 2025 [4].
Figure 3. Methodology of the proposed framework.
Figure 4. (a) Correlation of the dataset’s attributes and (b) t-SNE plot of the dataset.
Figure 5. Intersection of features based on classes 0 and 1.
Figure 6. Structure of the XGBoost algorithm.
Figure 7. Exploration and exploitation in GJO [23].
Figure 8. Red Fox hunting behavior and mathematical model development [24].
Figure 9. Vocalization and encircling process of SLO [25].
Figure 10. Preparing 10-fold CV sets.
Figure 11. Presentation of DiCE activity.
Figure 12. Epoch vs. fitness of the optimizers.
Figure 13. Distribution of different performance metrics of the models.
Figure 14. Models’ mean performance (XGB = XGBoost).
Figure 15. GJO-XGBoost confusion matrices for test fold sets.
Figure 16. ROC curves for all models.
Figure 17. PR AUCs obtained for all optimized models at fold 1.
Figure 18. Mean execution time complexity for the models (XGB = XGBoost).
Figure 19. ROC curves of all optimized models using the imputed dataset (XGB = XGBoost).
Figure 20. Calibration curves of all optimized models at fold 1.
Figure 21. Fitness vs. epoch plot for GJO-XGBoost with internal nested 5-fold CV.
Figure 22. No-optimizer SHAP analysis.
Figure 23. Mean SHAP values for GJO-XGBoost, FOX-XGBoost, and SLO-XGBoost (XGB = XGBoost).
Figure 24. Violin and beeswarm plots of the FOX-XGBoost model (XGB = XGBoost).
Figure 25. SHAP scatter plot for all four selected features in FOX-XGBoost.
Figure 26. Individual prediction of index 127 for all models.
Figure 27. Individual prediction of index 322 for all models.
Figure 28. Individual prediction of index 795 for all models.
Table 1. Summary of previously used dengue datasets and the performance (Att. = Attributes, XGB = XGBoost).
SL | Dataset Name | Samples | Att. | Model | Accuracy (%)
01 | Universitas Indonesia Dengue Dataset [7] | 130 | 8 | RF | 57.69
02 | Karuna Medical Hospital Kerala Dataset [8] | 100 | 11 | RF | 83.30
03 | Delhi Multiple Hospital Dataset [9] | 110 | 16 | PSO-ANN | 87.27
04 | Taiwan Dengue Fever Dataset [10] | 805 | 12 | RF | 89.94
05 | Discharged Patient Report Dataset [11] | 75 | 9 | Logit Boost | 92.00
06 | CBC Dengue Dataset Bangladesh [12] | 320 | 15 | SC | 96.88
07 | Vietnam Dengue Clinical Dataset [13] | 2301 | 23 | XGB | 98.60
08 | Taiz Dengue Surveillance Dataset [6] | 6694 | 22 | ET | 99.03
09 | Dirgahayu Hospital Dengue Dataset [14] | 110 | 21 | SVM | 99.10
Table 2. Description of the PCD dataset (MV. = missing values).
Feature | Unit | Symbol | Datatype | MV. | Role | Min | Max
Age | Year | f0 | Integer | No | Feature | 3 | 120
Sex | Male/Female/Child | f1 | Categorical | No | Feature | – | –
Hemoglobin | g/dL | f2 | Continuous | No | Feature | 11 | 25
WBC Count | ×10³/μL | f3 | Continuous | Yes (2.39%) | Feature | 2000 | 10,900
Differential Count | % | f4 | Binary | No | Feature | – | –
RBC Panel | – | f5 | Binary | No | Feature | – | –
Platelet Count | ×10³/μL | f6 | Continuous | Yes (1.69%) | Feature | 10,000 | 500,000
PDW | % | f7 | Continuous | Yes (1.89%) | Feature | 1 | 215
Final Output | – | – | Binary | Yes (1.40%) | Target | – | –
Table 3. Descriptive statistics of features by class (0 and 1).
Feature | Class 0 (300) Min | Class 0 Max | Class 0 Ave | Class 1 (631) Min | Class 1 Max | Class 1 Ave
Age | 3.0 | 120.0 | 47.533 | 3.0 | 99.0 | 40.036
Hemoglobin | 11.0 | 25.0 | 13.727 | 11.0 | 16.6 | 13.729
WBC Count | 3600.0 | 10,900.0 | 7462.667 | 2000.0 | 3700.0 | 2849.921
Platelet Count | 17,800.0 | 500,000.0 | 216,231.667 | 10,000.0 | 190,340.0 | 65,979.769
PDW | 1.0 | 65.6 | 30.380 | 9.0 | 215.0 | 19.448
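Class-wise summaries such as Table 3 can be reproduced with a short pandas aggregation. The sketch below is illustrative and assumes the PCD data are loaded into a DataFrame with a binary Final Output column; the file name is hypothetical.

```python
# A minimal sketch: per-class min/max/mean, as reported in Table 3.
import pandas as pd

df = pd.read_csv("pcd_dataset.csv")  # hypothetical file name
cols = ["Age", "Hemoglobin", "WBC Count", "Platelet Count", "PDW"]
print(df.groupby("Final Output")[cols].agg(["min", "max", "mean"]).round(3))
```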
Table 4. Boundary conditions for XGBoost classifier hyperparameters.
Hyperparameter | Name | Type | Range
Number of Estimators | n_estimators | Integer | 100 to 300
Learning Rate | learning_rate | Float | 0.001 to 0.05
Max Depth | max_depth | Integer | 3 to 7
Minimum Child Weight | min_child_weight | Integer | 1 to 10
Feature Range | features | Binary Vector (length = 8) | Binary (0 or 1)
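One plausible encoding of this search space, in which each candidate solution carries the four hyperparameters of Table 4 plus an 8-bit feature mask, is sketched below. The decode helper is an illustrative assumption, not the authors' exact implementation.

```python
# A minimal sketch: mapping an optimizer position onto the Table 4 bounds.
import numpy as np
from xgboost import XGBClassifier

BOUNDS = {
    "n_estimators": (100, 300),      # integer
    "learning_rate": (0.001, 0.05),  # float
    "max_depth": (3, 7),             # integer
    "min_child_weight": (1, 10),     # integer
}

def decode(position):
    """position: four values in [0, 1] followed by eight binary feature flags."""
    hp, mask = position[:4], np.asarray(position[4:]) > 0.5
    params = {}
    for (name, (lo, hi)), v in zip(BOUNDS.items(), hp):
        val = lo + v * (hi - lo)
        params[name] = val if name == "learning_rate" else int(round(val))
    return XGBClassifier(**params, eval_metric="logloss"), mask

# Example: a mask selecting f3 and f6 (WBC Count and Platelet Count), as GJO did.
clf, mask = decode([0.75, 0.9, 0.0, 0.0, 0, 0, 0, 1, 0, 0, 1, 0])
```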
Table 5. Fold-wise optimized hyperparameters, selected features, and fitness progression.
Optimizer | Fold | Features | n_best | l_best | m_best | c_best | Fitness@1 | Fitness@100 | Sat. Fitness | Sat. Epoch
GJO | 1 | f3, f6 | 249 | 0.0455 | 3 | 1 | 1.0 | 1.0 | 1.0 | 1
GJO | 2 | f3, f6 | 239 | 0.0455 | 3 | 1 | 1.0 | 1.0 | 1.0 | 1
GJO | 3 | f3, f6 | 249 | 0.0455 | 3 | 1 | 1.0 | 1.0 | 1.0 | 1
GJO | 4 | f3, f6 | 249 | 0.0455 | 3 | 1 | 1.0 | 1.0 | 1.0 | 1
GJO | 5 | f3, f6 | 249 | 0.0455 | 4 | 1 | 1.0 | 1.0 | 1.0 | 1
GJO | 6 | f3, f6 | 248 | 0.0455 | 4 | 1 | 1.0 | 1.0 | 1.0 | 1
GJO | 7 | f3 | 114 | 0.0455 | 3 | 1 | 1.0 | 1.0 | 1.0 | 1
GJO | 8 | f3 | 114 | 0.0455 | 3 | 1 | 1.0 | 1.0 | 1.0 | 1
GJO | 9 | f3 | 114 | 0.0453 | 3 | 1 | 1.0 | 1.0 | 1.0 | 1
GJO | 10 | f3, f6 | 248 | 0.0455 | 4 | 1 | 1.0 | 1.0 | 1.0 | 1
FOX | 1 | f1, f2, f3, f6 | 145 | 0.05 | 7 | 1 | 1.0 | 1.0 | 1.0 | 1
FOX | 2 | f1, f2, f3, f6 | 151 | 0.05 | 7 | 2 | 1.0 | 1.0 | 1.0 | 1
FOX | 3 | f1, f2, f3, f6 | 151 | 0.05 | 7 | 2 | 1.0 | 1.0 | 1.0 | 1
FOX | 4 | f1, f2, f3, f6 | 143 | 0.05 | 7 | 1 | 0.9989 | 1.0 | 1.0 | 19
FOX | 5 | f1, f2, f3, f6 | 143 | 0.05 | 7 | 1 | 0.9989 | 1.0 | 1.0 | 19
FOX | 6 | f1, f2, f3, f6 | 148 | 0.05 | 7 | 2 | 1.0 | 1.0 | 1.0 | 1
FOX | 7 | f1, f3, f6 | 146 | 0.05 | 7 | 8 | 1.0 | 1.0 | 1.0 | 1
FOX | 8 | f1, f3, f6 | 146 | 0.05 | 7 | 8 | 1.0 | 1.0 | 1.0 | 1
FOX | 9 | f1, f3, f6 | 146 | 0.05 | 7 | 8 | 1.0 | 1.0 | 1.0 | 1
FOX | 10 | f1, f2, f3, f6 | 143 | 0.05 | 7 | 1 | 0.9989 | 1.0 | 1.0 | 19
SLO | 1 | f2, f3, f6 | 258 | 0.0396 | 6 | 1 | 0.9989 | 1.0 | 1.0 | 4
SLO | 2 | f2, f3, f6 | 258 | 0.0396 | 6 | 1 | 0.9989 | 1.0 | 1.0 | 4
SLO | 3 | f2, f3, f6 | 258 | 0.0396 | 6 | 1 | 0.9988 | 1.0 | 1.0 | 4
SLO | 4 | f2, f3, f6 | 258 | 0.0396 | 6 | 1 | 0.9989 | 1.0 | 1.0 | 4
SLO | 5 | f2, f3, f6 | 258 | 0.0396 | 6 | 1 | 0.9989 | 1.0 | 1.0 | 4
SLO | 6 | f2, f3, f6 | 258 | 0.0396 | 6 | 1 | 0.9989 | 1.0 | 1.0 | 4
SLO | 7 | f2, f3, f6 | 108 | 0.0047 | 3 | 3 | 1.0 | 1.0 | 1.0 | 1
SLO | 8 | f2, f3, f6 | 108 | 0.0047 | 3 | 3 | 1.0 | 1.0 | 1.0 | 1
SLO | 9 | f2, f3, f6 | 108 | 0.0047 | 3 | 3 | 1.0 | 1.0 | 1.0 | 1
SLO | 10 | f2, f3, f6 | 258 | 0.0396 | 6 | 1 | 0.9989 | 1.0 | 1.0 | 4
Table 6. Finalized hyperparameters and features using MV.
Optimizer | n_best_f | l_best_f | m_best_f | c_best_f | F_best_f | Number of Features
GJO | 249 | 0.0455210569158742 | 3 | 1 | f3, f6 | 2
FOX | 143 | 0.05 | 7 | 1 | f1, f2, f3, f6 | 4
SLO | 258 | 0.0395697655228806 | 6 | 1 | f2, f3, f6 | 3
Table 7. Multiple baseline (with default settings) classifier model’s mean results of 10-fold validation (all features used; Acc. = accuracy, Pre. = precision, and Rec. = recall).
Models | Train Acc. | Train F-Score | Train Pre. | Train Rec. | Test Acc. | Test F-Score | Test Pre. | Test Rec.
XGBoost | 100.00 | 100.00 | 100.00 | 100.00 | 99.65 | 99.59 | 99.74 | 99.45
LightGBM | 100.00 | 100.00 | 100.00 | 100.00 | 99.82 | 99.79 | 99.84 | 99.74
CatBoost | 100.00 | 100.00 | 100.00 | 100.00 | 99.83 | 99.81 | 99.84 | 99.78
Random Forest | 100.00 | 100.00 | 100.00 | 100.00 | 99.83 | 99.81 | 99.77 | 99.84
Extra Trees | 100.00 | 100.00 | 100.00 | 100.00 | 99.83 | 99.81 | 99.79 | 99.83
Gradient Boosting | 100.00 | 100.00 | 100.00 | 100.00 | 99.83 | 99.81 | 99.84 | 99.78
Decision Tree | 100.00 | 100.00 | 100.00 | 100.00 | 99.83 | 99.81 | 99.84 | 99.78
KNN | 99.25 | 99.13 | 99.23 | 99.03 | 98.96 | 98.80 | 98.86 | 98.75
SVM | 98.53 | 98.32 | 98.13 | 98.54 | 98.17 | 97.93 | 97.61 | 98.31
Table 8. All models’ results using CV.
Fold | Train Accuracy | Train F-Score | Train Precision | Train Recall | Test Accuracy | Test F-Score | Test Precision | Test Recall
1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
7 | 100 | 100 | 100 | 100 | 99.64 | 99.72 | 99.45 | 100
8 | 100 | 100 | 100 | 100 | 99.64 | 99.74 | 99.48 | 100
9 | 100 | 100 | 100 | 100 | 99.64 | 99.75 | 99.50 | 100
10 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
Table 9. Standard deviation of model’s test performance.
Model | Accuracy | F-Score | Precision | Recall
GJO-XGBoost | 0.173135 | 0.127108 | 0.25354 | 0.00
FOX-XGBoost | 0.173135 | 0.127108 | 0.25354 | 0.00
SLO-XGBoost | 0.173135 | 0.127108 | 0.25354 | 0.00
Table 10. Models’ mean results using fold validation (Feat. = features, Acc. = accuracy, Pre. = precision, and Rec. = recall).
Model | Feat. | Train Acc. | Train F-Score | Train Pre. | Train Rec. | Test Acc. | Test F-Score | Test Pre. | Test Rec.
GJO-XGBoost | 2 | 100 | 100 | 100 | 100 | 99.89 | 99.92 | 99.84 | 100
FOX-XGBoost | 4 | 100 | 100 | 100 | 100 | 99.89 | 99.92 | 99.84 | 100
SLO-XGBoost | 3 | 100 | 100 | 100 | 100 | 99.89 | 99.92 | 99.84 | 100
Table 11. Information of index 795 with selected features for frameworks.
Framework | Features | Sex (f1) | Hemoglobin (f2) | WBC Count (f3) | Platelet Count (f6)
GJO-XGBoost | f3, f6 | – | – | 3600 | 190,000
FOX-XGBoost | f1, f2, f3, f6 | 2 | 14.9 | 3600 | 190,000
SLO-XGBoost | f2, f3, f6 | – | 14.9 | 3600 | 190,000
Table 12. Fold-wise AUC scores and mean AUC for GJO-XGBoost, FOX-XGBoost, and SLO-XGBoost models.
Model | Folds 1–6 | Fold 7 | Fold 8 | Fold 9 | Fold 10 | Mean AUC
GJO-XGBoost | 1.00 | 0.994898 | 0.999823 | 0.99375 | 1.0 | 0.998847
FOX-XGBoost | 1.00 | 0.994898 | 0.999823 | 0.99375 | 1.0 | 0.998847
SLO-XGBoost | 1.00 | 0.994898 | 0.999823 | 0.99375 | 1.0 | 0.998847
Table 13. Fold-wise PR AUCs of different models in percentage.
Model | Folds 1–6 | Fold 7 | Fold 8 | Fold 9 | Fold 10 | Mean PR AUC
GJO-XGB | 100 | 99.45 | 99.48 | 99.50 | 100 | 99.84
FOX-XGB | 100 | 99.45 | 99.48 | 99.50 | 100 | 99.84
SLO-XGB | 100 | 99.45 | 99.48 | 99.50 | 100 | 99.84
Table 14. Training, test, and optimization times for each fold (STD = standard deviation).
Fold | GJO TrT (ms) | GJO TsT (ms) | GJO OpT (s) | FOX TrT (ms) | FOX TsT (ms) | FOX OpT (s) | SLO TrT (ms) | SLO TsT (ms) | SLO OpT (s)
1 | 48.39 | 10.80 | 338.53 | 337.38 | 14.72 | 161.37 | 52.45 | 46.02 | 303.33
2 | 51.50 | 9.21 | 326.74 | 38.93 | 12.53 | 152.46 | 46.07 | 10.23 | 315.57
3 | 51.05 | 9.06 | 324.27 | 33.61 | 9.14 | 153.16 | 51.76 | 9.60 | 310.85
4 | 47.82 | 8.81 | 330.94 | 30.88 | 9.49 | 149.95 | 55.59 | 9.25 | 314.97
5 | 42.78 | 12.14 | 330.77 | 72.17 | 25.23 | 145.73 | 53.37 | 6.72 | 322.13
6 | 40.24 | 11.12 | 329.35 | 118.01 | 52.76 | 169.48 | 47.47 | 8.88 | 312.08
7 | 78.44 | 18.09 | 206.90 | 33.18 | 9.32 | 146.87 | 54.18 | 8.77 | 270.95
8 | 44.17 | 9.43 | 212.28 | 29.41 | 9.23 | 133.57 | 45.31 | 9.11 | 298.62
9 | 46.72 | 10.21 | 212.95 | 28.50 | 10.18 | 136.84 | 42.68 | 12.52 | 273.67
10 | 50.18 | 9.92 | 331.49 | 64.79 | 8.72 | 165.02 | 55.71 | 12.35 | 313.30
Mean | 50.13 | 10.88 | 294.42 | 78.69 | 16.13 | 151.44 | 50.46 | 13.34 | 303.55
STD | 10.60 | 2.74 | 57.90 | 95.23 | 13.81 | 11.55 | 4.68 | 11.61 | 17.70
Table 15. Pair-wise Mann–Whitney U test results for optimization time (seconds).
Comparison | U-Statistic | p-Value | Significant
FOX vs. GJO | 100.00 | 0.0002 | Yes
GJO vs. SLO | 70.00 | 0.1405 | No
SLO vs. FOX | 0.00 | 0.0002 | Yes
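The pairwise tests of Table 15 can be reproduced with SciPy. The sketch below uses the per-fold optimization times from Table 14; note that SciPy reports the U statistic of the first sample, so a value may appear as its complement (n1·n2 − U) relative to Table 15, while the p-values are unaffected.

```python
# A minimal sketch: two-sided Mann-Whitney U tests on the per-fold
# optimization times (seconds) from Table 14, with alpha = 0.05.
from itertools import combinations
from scipy.stats import mannwhitneyu

opt_times = {
    "GJO": [338.53, 326.74, 324.27, 330.94, 330.77, 329.35, 206.90, 212.28, 212.95, 331.49],
    "FOX": [161.37, 152.46, 153.16, 149.95, 145.73, 169.48, 146.87, 133.57, 136.84, 165.02],
    "SLO": [303.33, 315.57, 310.85, 314.97, 322.13, 312.08, 270.95, 298.62, 273.67, 313.30],
}
for a, b in combinations(opt_times, 2):
    u, p = mannwhitneyu(opt_times[a], opt_times[b], alternative="two-sided")
    print(f"{a} vs. {b}: U = {u:.2f}, p = {p:.4f}, significant = {p < 0.05}")
```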
Table 16. Models’ mean results of fold validation on the imputed dataset (Feat. = features, Acc. = accuracy, Pre. = precision, and Rec. = recall).
Model | Feat. | Train Acc. | Train F-Score | Train Pre. | Train Rec. | Test Acc. | Test F-Score | Test Pre. | Test Rec.
GJO-XGBoost | 2 | 99.90 | 99.93 | 100.0 | 99.85 | 99.67 | 99.75 | 99.85 | 99.66
FOX-XGBoost | 4 | 99.90 | 99.93 | 100.0 | 99.85 | 99.63 | 99.73 | 99.85 | 99.61
SLO-XGBoost | 3 | 99.90 | 99.93 | 100.0 | 99.85 | 99.63 | 99.73 | 99.85 | 99.61
Table 17. Mean test performance comparison of XGBoost classifier with different encoders.
Model | Accuracy | F-Score | Precision | Recall
Only XGBoost (Label Encoding) | 99.65% | 99.74% | 99.48% | 100%
Only XGBoost (One-Hot Encoding) | 99.75% | 99.82% | 99.63% | 100%
Table 18. Mean test performance comparison of FOX-XGBoost with different encoders.
Model | Accuracy | F-Score | Precision | Recall
FOX-XGBoost (Label Encoding) | 99.89% | 99.92% | 99.84% | 100%
FOX-XGBoost (One-Hot Encoding) | 99.89% | 99.92% | 99.84% | 100%
Table 19. Mean Brier Score Loss (BSL) values of all optimized models.
Metric | GJO-XGBoost | FOX-XGBoost | SLO-XGBoost
Mean BSL | 0.0014 | 0.0014 | 0.0014
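The Brier Score Loss in Table 19 is the mean squared difference between the predicted probability of the positive class and the true label, BSL = (1/N) Σ (p_i − y_i)². A minimal sketch with illustrative values:

```python
# A minimal sketch: Brier Score Loss for confident, well-calibrated predictions.
from sklearn.metrics import brier_score_loss

y_true = [1, 0, 1, 1, 0]                 # illustrative labels
y_prob = [0.97, 0.04, 0.95, 0.99, 0.03]  # illustrative probabilities of class 1
print(brier_score_loss(y_true, y_prob))  # 0.0012, comparable to Table 19's 0.0014
```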
Table 20. Finalized features and hyperparameters for GJO-XGBoost using internal nested 5-fold CV.
Features | n_estimators | learning_rate | max_depth | min_child_weight
Index: 3, 6 | 117 | 0.0378144791468909 | 4 | 1
Table 21. Mean test performance of GJO-XGBoost using internal nested 5-fold CV (Tr = training, Te = test, Acc. = accuracy, F = F-score, Pre = precision, and Rec = recall).
Tr Acc | Tr F | Tr Pre | Tr Rec | Te Acc | Te F | Te Pre | Te Rec
99.89 | 99.92 | 99.84 | 100 | 99.75 | 99.82 | 99.63 | 100
Table 22. Diverse counterfactual set for index 795 (new outcome: 0).
Framework | CF Number | Sex (f1) | Hemoglobin (f2) | WBC Count (f3) | Platelet Count (f6)
GJO-XGBoost | 1 | – | – | 9485.8 | 328,247.8
GJO-XGBoost | 2 | – | – | 6522.0 | 328,391.2
GJO-XGBoost | 3 | – | – | 8253.6 | 94,799.9
GJO-XGBoost | 4 | – | – | 8878.2 | 190,000.0
GJO-XGBoost | 5 | – | – | 8561.3 | 190,000.0
GJO-XGBoost | Mean | – | – | 8339.82 | 226,487.98
FOX-XGBoost | 1 | 2 | 15.6 | 7758.2 | 190,000.0
FOX-XGBoost | 2 | 2 | 14.9 | 5165.6 | 190,000.0
FOX-XGBoost | 3 | 0 | 14.9 | 7245.9 | 190,000.0
FOX-XGBoost | 4 | 2 | 14.9 | 8431.5 | 190,000.0
FOX-XGBoost | 5 | 2 | 14.9 | 8424.9 | 190,000.0
FOX-XGBoost | Mean | – | 15.04 | 7405.22 | 190,000.0
SLO-XGBoost | 1 | – | 14.9 | 7836.6 | 190,000.0
SLO-XGBoost | 2 | – | 16.0 | 8347.0 | 190,000.0
SLO-XGBoost | 3 | – | 14.9 | 5855.3 | 190,000.0
SLO-XGBoost | 4 | – | 14.9 | 5537.4 | 190,000.0
SLO-XGBoost | 5 | – | 15.0 | 8030.4 | 190,000.0
SLO-XGBoost | Mean | – | 15.14 | 7121.34 | 190,000.0
Table 23. Comparison of existing frameworks and the proposed frameworks.
SL | Dataset Name | Model | Accuracy (%) | F-Score (%) | Precision (%) | Recall (%) | Features
01 | [7] | RF | 57.69 | – | – | – | 7
02 | [8] | RF | 83.30 | – | – | – | 10
03 | [9] | PSO-ANN | 87.27 | – | – | – | 15
04 | [10] | RF | 89.94 | – | – | 94.79 | 60
05 | [11] | Logit Boost | 92.00 | 92.00 | 95.00 | 90.00 | 8
06 | [12] | SC | 96.88 | 96.46 | 97.73 | 95.45 | 14
07 | [13] | XGBoost | 98.60 | 96.00 | 98.00 | 94.00 | 22
08 | [6] | ET | 99.03 | 99.04 | 98.92 | 99.17 | 21
09 | [14] | SVM | 99.10 | – | 99.10 | 99.10 | 20
The Proposed Frameworks
01 | PCD Dataset | GJO-XGBoost | 99.89 | 99.92 | 99.84 | 100 | 2
02 | PCD Dataset | FOX-XGBoost | 99.89 | 99.92 | 99.84 | 100 | 4
03 | PCD Dataset | SLO-XGBoost | 99.89 | 99.92 | 99.84 | 100 | 3