XAI-MedNet: A Next-Generation Explainable AI Framework for Contrast-Enhanced Skin Lesion Classification via Entropy-Controlled Optimization

Alabduljabbar, Abdulrahman; Akram, Tallha; Altherwy, Youssef N.; Akram, Muhammad Adeel; Ashraf, Imran

doi:10.3390/bioengineering13050506

by Abdulrahman Alabduljabbar ¹, Tallha Akram ¹, Youssef N. Altherwy ¹, Muhammad Adeel Akram ² and Imran Ashraf ³,*

¹ Department of Information Systems, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia

² Department of Computer Engineering, COMSATS University Islamabad, Wah Campus, Wah Cantt 47040, Pakistan

³ Computer Engineering Lab, QCE Department, EEMCS, TU Delft, 2628 CD Delft, The Netherlands

* Author to whom correspondence should be addressed.

Bioengineering 2026, 13(5), 506; https://doi.org/10.3390/bioengineering13050506

Submission received: 14 March 2026 / Revised: 18 April 2026 / Accepted: 20 April 2026 / Published: 27 April 2026

(This article belongs to the Special Issue Recent Advances in Machine Learning and Explainable Artificial Intelligence in Biomedical Data Mining, and Disease Diagnosis Frameworks—2nd Edition)

Abstract

Explainable Artificial Intelligence (XAI) has become a critical requirement in medical image analysis, where transparency and interpretability are essential for clinical trust and decision support. Melanoma is recognized as one of the deadliest types of skin cancer, and its incidence has been rising in recent years. Detecting it in its initial stages, however, greatly increases patients’ chances of long-term survival. Various computer-based techniques have recently been proposed to diagnose skin lesions at an early stage. Although the machine learning community has achieved a certain degree of success, high error margins and the limited interpretability of automated systems remain unresolved research challenges. This study addresses both segmentation and classification tasks, with particular emphasis on two key concepts: (1) improving image quality to maximize distinguishability between foreground and background regions, thereby enhancing visual interpretability and segmentation accuracy, and (2) eliminating redundant and cluttered feature information to generate the most discriminative and compact feature representations. The input images are first processed using a novel metaheuristic contrast-stretching method that estimates image-specific key parameters, thereby enhancing lesion boundary clarity in a clinically interpretable manner. The improved images are then fed into selected pre-trained deep models, including DenseNet-201, Inception-ResNet v2, and NASNet-Mobile. The features extracted from all pre-trained models are fused to produce resultant vectors, which are then refined using a bio-inspired feature selection method, termed entropy-controlled whale optimization, to retain only the most informative attributes. The selected discriminative feature set is subsequently classified using multiple classifiers. The results indicate that the proposed framework achieves superior performance compared to existing methods in terms of accuracy, sensitivity, specificity, and F1-score. Additionally, it facilitates a more explainable, transparent, and structured diagnostic pipeline appropriate for medical applications.

Keywords:

Explainable AI; interpretable machine learning; skin lesion classification; BAT optimization; artificial bee colony optimization; whale optimization; feature selection; CNN models; evolutionary techniques

1. Introduction

Skin cancer is a prevalent type of cancer that affects a large number of people globally, with around 5.4 million cases identified each year. The escalating incidence of skin cancer in recent decades has positioned this ailment as a matter of particular concern within the realm of public health [1]. Accurate and prompt classification of skin cancer is crucial for determining the most effective treatment plans. The standard criteria for categorizing skin cancers include the disease’s origin, development pattern, and level of cellular differentiation [2].

Skin cancer presents in a variety of forms, including melanoma, squamous cell carcinoma, and basal cell carcinoma [3]. To achieve the best possible results when treating patients with any of these skin cancers, it is imperative that they be accurately classified into one of these three categories [4]. This is because different subtypes and stages of skin cancer require different treatment modalities. The process of determining which of these three categories a patient’s cancer falls into involves identifying the tumor’s individual characteristics, which then help the clinician make appropriate treatment decisions for that patient. As a result of advancements in computer vision (CV) and artificial intelligence (AI), healthcare practitioners have developed automated methods for classifying skin lesions. These systems support healthcare practitioners with accurate, timely diagnostic information for patients with suspected skin cancer [5]. When diagnosing a patient with suspected skin cancer, an automated classification system uses machine learning to evaluate and classify the dermal lesion based on its specific features and provides the clinician with potential treatment options based on the classification results. The implementation of automated classification systems has the potential to increase the accuracy and effectiveness of skin cancer diagnosis, thereby improving patient outcomes [6,7].

Several methodologies are currently used to identify skin cancer, ranging from visual examinations by medical professionals to advanced imaging techniques [8,9]:

Visual inspection: Healthcare professionals assess suspicious moles or lesions using the naked eye or a dermatoscope, a handheld device that magnifies the skin.
Biopsy: This procedure involves extracting a small portion of skin tissue for microscopic analysis, typically performed when an abnormal lesion is detected during a visual examination.
Dermoscopy: A non-invasive technique using a dermatoscope to reveal internal skin structures through magnification and polarized light.
Reflectance confocal microscopy (RCM): A non-invasive method that captures high-resolution images of the dermis’ internal architecture using a laser.
Computer-aided diagnosis (CAD): This approach utilizes machine learning algorithms to analyze images and label them as benign or malignant based on their features.
Beyond improving predictive performance, modern CAD systems must be interpretable and transparent to achieve clinical acceptance. This study addresses these requirements by situating the proposed framework within Explainable AI, using interpretable preprocessing and entropy-driven feature selection.

In this work, we focus on both segmentation and classification tasks, emphasizing two core concepts. First, improving image quality allows for better distinction between the foreground and background regions; second, removing noisy and redundant information leads to more discriminative feature representations. Our key contributions include proposing a hybrid meta-heuristic model based on an extended BA-ABC for contrast stretching and introducing entropy-controlled whale optimization for feature selection. We selected publicly available datasets to evaluate our framework; Figure 1 illustrates a few samples.

2. Literature Review

Researchers have presented many models for detecting skin lesion borders and classes [10,11,12,13]. The past decade has seen the widespread adoption of convolutional neural networks (CNNs), which have enabled more accurate segmentation and classification results [14]. In practice, however, images are commonly downsampled to avoid unnecessary iterations and parameter calculations, which can result in the loss of certain features. Similarly, preprocessors can improve segmentation and classification results, but at the expense of increased processing time. The CV domain currently has no universal model for removing noise and improving contrast, although researchers have suggested a wide range of techniques to address this issue. In [15,16], the authors utilized contrast enhancement as a preprocessor to differentiate the region of interest (RoI). Additionally, they created a saliency map using a single channel of the RGB color space to generate a binary image. Furthermore, they explored a bio-inspired evolutionary algorithm, particle swarm optimization (PSO), for efficient boundary estimation. After adding a feature extraction stage, they used a metaheuristic approach based on a genetic algorithm (GA) to identify the most discriminant feature set, followed by a classifier to categorize the features into a selected number of classes. Similarly, Khan et al. [17] presented a deep learning strategy for evaluating skin lesions. The images were first subjected to a decorrelation formulation before being segmented using Mask R-CNN. The segmented images were fed to DenseNet to extract features, and a least squares, entropy-controlled SVM algorithm was used to rank the features based on their relevance. The authors tested their approach on three benchmark datasets, ISBI-2016, ISIC-2017, and HAM10000, achieving average accuracies of 96.3%, 94.8%, and 88.5%, respectively.

The authors in [18] introduced a mutually-guided convolutional neural network (CNN) architecture for the classification of skin lesions. Initially, they divided the lesions into segments by employing a modified coarse segmentation network. Subsequently, they employed a mask-guided CNN for classification. They tested their framework on two publicly available datasets, including ISIC2017 and PH², achieving accuracies of 93.8% and 97.7%, respectively. Even with an imbalanced dataset, this approach has been shown to be helpful in segmenting and classifying skin lesions.

Similarly, Al-Masni et al. [19] proposed a unified deep learning model for segmenting and classifying a wide variety of skin lesions. For segmentation, they utilized four pre-trained deep CNN classification models, and for lesion extraction, they employed a full-resolution deep learning approach. Three skin datasets (ISBI-2016, ISIC-2017, and ISIC-2018) were used to evaluate the proposed technique, and the findings were highlighted once the datasets were normalized correctly. Miglani and Bhatia [20] examined the efficacy of deep CNN models in identifying skin lesions. ResNet-50 and EfficientNet-B0 were used to implement transfer learning, with optimal parameters set. EfficientNet-B0 surpassed ResNet-50 on the HAM10000 dataset, with macro and micro AUC values of 0.93 and 0.97, respectively.

A skin lesion classification framework based on data augmentation was proposed by Cano et al. [21], in which a NASNet deep network was trained directly on the expanded dataset. The classification model’s accuracy was verified using the ISIC dataset. Similarly, Aziz et al. [22] used a pre-trained AlexNet model for feature extraction and an SVM classifier to classify lesions. In 2022, Afza et al. [23] developed a hybrid strategy for classifying skin lesions across several classes, performing feature selection and fusion using hybrid whale optimization and canonical correlation. Using the HAM10000 and ISIC-2018 datasets with the extreme learning machine (ELM) classifier, they classified skin lesions with accuracies of 93.40% and 94.36%, respectively.

Recent research has increasingly focused on Vision Transformer (ViT)-based and hybrid methods to overcome the limitations of traditional convolutional architectures in skin lesion classification. For example, Aruk et al. [24] proposed a hybrid ConvNeXt-Transformer architecture that uses CNN blocks to extract localized textural features and ViT structures to model long-distance spatial relationships; their model achieved 94.30% accuracy on the HAM10000 dataset. The GAMFuse system [25] is another example of this type of work. It integrates a Swin Transformer, which incorporates an attention mechanism to capture global information, with a residual CNN that obtains local feature information at the most granular resolution. Finally, graph neural networks (GNNs) were integrated into this architecture to model relationships between lesions on image-based datasets; the resulting model outperformed the baselines typical of models trained under few-shot conditions. Collectively, these examples provide evidence that combining local and global feature representations yields state-of-the-art diagnostic performance on dermatological databases.

Despite the success of recent hybrid approaches, most existing methods primarily focus on architectural hybridization, such as combining CNNs with Transformer-based modules or integrating feature fusion strategies. These approaches typically enhance either feature extraction or classification performance but often overlook the importance of jointly optimizing multiple stages of the pipeline. In contrast, the proposed framework introduces a multi-stage hybrid design, where hybridization is applied at both the image preprocessing level (via entropy-guided contrast enhancement) and the feature selection level (via entropy-controlled whale optimization). This enables simultaneous improvement in visual interpretability and feature discriminability, distinguishing the proposed approach from existing hybrid models.

3. Problem Statement and Contributions

Recently, the importance of CAD systems in identifying and classifying skin lesions has increased significantly. Nevertheless, there are particular challenges that make skin lesion classification more complex. These factors include skin flakes, low contrast in lesion regions, air bubbles, hair, diffuse boundaries, and an imbalanced dataset. Furthermore, differences in rotation, lighting, and shearing of the same lesion across multiple images result from the skin’s flexibility, making the classification of those lesions even more challenging. Conventional feature selection methods exhibit reduced accuracy and incur high computational cost after the extraction phase. Therefore, hybrid meta-heuristic algorithms have been presented that demonstrated superiority in performing these functions. Unlike conventional hybrid frameworks that operate at a single stage, the proposed method introduces a dual-level hybridization strategy, integrating entropy-driven preprocessing and entropy-controlled feature selection to improve both interpretability and classification robustness.

The principal contributions of our research are:

Proposal of a novel contrast stretching framework that combines an extended Bat algorithm with an Artificial bee colony (EBA-ABC) algorithm to improve the lesion visibility and boundary clarity.
Proposal of a bio-inspired feature selection framework, entropy-controlled whale optimization algorithm, to address challenges related to the “curse of dimensionality” and over-fitting.
Development of a transparent diagnostic workflow incorporating XAI methodologies to provide clinicians with traceable visual evidence, ensuring that model predictions for lesion malignancy are grounded in interpretable feature analysis.

Given a database of dermoscopic images, each image must be assigned a label of malignant or benign. Consider a dermoscopic image $I \subset \mathbb{R}^{(h \times s \times l)}$ in the HSL color space that belongs to the given dataset $D^{ν}$, where $ν \in \{1, 2, 3, 4\}$ indexes the selected dataset. The set of images is $(I_{1}^{ν}), (I_{2}^{ν}), \dots, (I_{U}^{ν}) \subset D^{ν} \in \mathbb{R}^{(1 \times ν)}$. The dataset contains a set of $U$ images, where each image is predetermined to have three channels. Additionally, the corresponding label for each image is provided. With the proposed contrast stretching technique, each image is enhanced, followed by a color space transformation to the RGB space, $Φ^{e} \in \mathbb{R}^{R \times G \times B}$. The enhanced images are subsequently employed for transfer learning by utilizing pre-trained models to generate a set of feature vectors, $\tilde{Φ}^{X} \in \mathbb{R}^{(r \times c)}$. The features then undergo feature selection, whose goal is to identify the most discriminative feature information while removing any redundant feature values, yielding $\tilde{Φ}^{fs}$.

The final representation of the cascaded system comprises a sequence of stages, including contrast stretching, feature fusion in conjunction with feature selection, and final labeling. The mathematical description of the proposed pipeline is given as:

$\tilde{Φ}^{le} ≜ (Φ^{e}, \tilde{Φ}^{X}, \tilde{Φ}^{fs}) \in \mathbb{R}^{(U \times 1)}$

(1)

where $\tilde{Φ}^{le}$ represents the class labels at the output of the hierarchical structural design.

4. Proposed Framework

In this section, we discuss the proposed framework depicted in Figure 2. The proposed architecture comprises several phases: image pre-processing, feature extraction after transfer learning, feature fusion in conjunction with selection, and classification, creating a semi-transparent diagnostic pipeline aligned with Explainable Artificial Intelligence (XAI) principles. Recall that we aim to tackle both segmentation and classification tasks, with a particular emphasis on two fundamental concepts: (1) An improved contrast stretching technique optimizes the framework’s ability to distinguish between the foreground and background regions, yielding visually interpretable outputs that provide image-level explainability, and (2) A robust feature selection technique mitigates the overfitting constraint while offering feature-level explainability by using a transparent, entropy-controlled mechanism to highlight the most discriminative attributes. Below, we discuss each of the steps outlined above.

4.1. Extended BA-ABC Algorithm

In this section, we discuss the image pre-processing technique, particularly the extended BA-ABC contrast stretching technique, that belongs to a class of nature-inspired algorithms. Nature-inspired algorithms are widely accepted for their ability to generate adaptive, innovative, and efficient solutions, especially those related to contrast enhancement and other image processing tasks [26,27]. Our proposed technique, the extended BA-ABC technique, is a hybrid model for boundary estimation based on bio-inspired bat and artificial bee colony algorithms. We refer the reader to the work published in [28,29] for a detailed review of the bat and artificial bee colony algorithms.

The proposed technique alternates between the BA and ABC algorithms to identify suboptimal solutions. This approach enhances the lightness channel $ψ^{L}$ of the RGB-transformed HSL image, resulting in a new enhanced lightness channel $ψ^{e}$. The contrast enhancement model preserves the original input image size, consisting of M rows and N columns. The transformation function of the proposed model is described in a generalized form as:

$δ_{x, y}^{g} = T \{ δ_{x, y}^{f} \}, \quad \forall x \in M, \ \forall y \in N$

(2)

The function depends on various statistical parameters, including the mean, standard deviation, and local-global attributes, which encompass edge pixels and intensity values, as well as a measure of the intensity distribution, specifically the Gini coefficient. The transformation function consists of two components. The first component involves two parameters, $P_{β}$ and $P_{θ}$, and their combined impact with the local mean $ψ_{x, y}^{μ}$ and the global mean $μ^{g}$. The main advantage of the local mean $ψ_{x, y}^{μ}$ is its dependence on the immediate neighborhood. The second component improves smoothness and brightness conservation by utilizing the local mean $ψ_{x, y}^{μ}$ and by incorporating the parameter $P_{α}$ in the exponent.

$δ_{x, y}^{g} = \left( P_{δ} \frac{μ^{g}}{ψ_{x, y}^{σ} + P_{β}} \right) \times \left[ δ_{x, y}^{f} - P_{τ} \times ψ_{x, y}^{μ} \right] + \left[ ψ_{x, y}^{μ} \right]^{P_{α}}$

(3)
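As a rough illustration of how Equation (3) acts on a lightness channel, the following NumPy sketch computes the local mean and local standard deviation over a square window and applies the transform pixel-wise. The function names, the 3 × 3 window, the edge padding, the [0, 1] intensity scaling, and the final clipping are assumptions made for this example; they are not specified in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_stats(channel, window=3):
    """Local mean and standard deviation over a square window (edge-padded)."""
    pad = window // 2
    padded = np.pad(channel, pad, mode="edge")
    patches = sliding_window_view(padded, (window, window))
    return patches.mean(axis=(-2, -1)), patches.std(axis=(-2, -1))


def stretch_lightness(channel, p_alpha, p_beta, p_tau, p_delta, window=3):
    """Apply the Equation (3) transform to a lightness channel scaled to [0, 1]."""
    channel = np.asarray(channel, dtype=np.float64)
    local_mean, local_std = local_stats(channel, window)
    global_mean = channel.mean()
    # First component: gain driven by global mean and local standard deviation.
    gain = p_delta * global_mean / (local_std + p_beta)
    # Second component: local mean raised to the power P_alpha.
    enhanced = gain * (channel - p_tau * local_mean) + local_mean ** p_alpha
    # Clipping back to [0, 1] is an assumption of this sketch.
    return np.clip(enhanced, 0.0, 1.0)
```

In the framework described here, the four parameters would be supplied by the metaheuristic search rather than set by hand; `p_beta` must be positive so that flat regions, where the local standard deviation is zero, do not cause a division by zero.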

Population generation in the bat algorithm is governed by the parameters $P_{α}, P_{β}, P_{τ}, P_{δ}$. After optimization, these parameters are returned to the main function. The enhanced model is subsequently constructed using the improved cost function, defined as follows:

$C_{ψ^{e}} = \frac{\log (\log (ψ^{s})) \times \mathrm{edgels}(ψ^{e}) \times G (ψ^{e})}{M \times N}$

(4)

The quality of the image is assessed based on a number of factors, including an increased level of edge pixels, high randomness in the distribution of pixels, and improved lightness values. The Sobel edge-detection filter identifies sharp changes based on both vertical and horizontal filters, along with the gradients. A log function is implemented twice in the cost function to subdue over-contrast stretching [27]. The Sobel edge detection filter is given as:

$ψ^{s} = \sum_{x \in M} \sum_{y \in N} \sqrt{\left( \frac{\partial ψ_{x, y}^{e}}{\partial v} \right)^{2} + \left( \frac{\partial ψ_{x, y}^{e}}{\partial h} \right)^{2}}$

(5)

where $\partial v$ and $\partial h$ are the vertical and horizontal gradients. The Gini coefficient is added to the main cost function to measure the randomness or inequality in the distribution of pixel intensities. The Gini coefficient is calculated as:

$G (ψ^{e}) = \frac{1}{2 (M \times N)^{2} μ} \sum_{j = 1}^{M} \sum_{k = 1}^{N} | x_{j} - x_{k} |$

(6)

where $x_{j}$ is the intensity of the $j^{th}$ pixel in the sorted list. The detailed flow diagram of the proposed extended contrast stretching model is given in Figure 3.
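The cost in Equation (4) combines the Sobel response of Equation (5), an edge-pixel count, and the Gini coefficient of Equation (6). The sketch below is one possible NumPy reading of these three pieces. The Gini coefficient is computed over all pixels using the equivalent sorted-list identity; the edge threshold used to count edge pixels, the edge padding, and the guard for small Sobel sums are assumptions of this example, and the function names are hypothetical.

```python
import numpy as np


def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels (edge-padded)."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1.
    gh = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1.
    gv = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.sqrt(gh ** 2 + gv ** 2)


def gini(img):
    """Gini coefficient of pixel intensities via the sorted-list identity."""
    x = np.sort(np.asarray(img, dtype=np.float64).ravel())
    n = x.size
    total = x.sum()
    if total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(((2 * ranks - n - 1) * x).sum() / (n * total))


def enhancement_cost(enhanced, edge_threshold=0.1):
    """Equation (4): log(log(Sobel sum)) * edge-pixel count * Gini / (M * N)."""
    enhanced = np.asarray(enhanced, dtype=np.float64)
    mag = sobel_magnitude(enhanced)
    sobel_sum = mag.sum()
    if sobel_sum <= np.e:
        # log(log(.)) would be undefined or non-positive; treated as zero cost here.
        return 0.0
    # Counting pixels above a fixed threshold as edge pixels is an assumption.
    edgels = np.count_nonzero(mag > edge_threshold)
    return float(np.log(np.log(sobel_sum)) * edgels * gini(enhanced) / enhanced.size)
```

A flat image has no edges and therefore zero cost, while an image with a sharp step between dark and bright halves scores higher, which matches the stated intent of rewarding more edge pixels and a more spread-out intensity distribution.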

The parameters defined in Equation (3) serve as the primary inputs for population generation, utilizing the bat algorithm, which is explained in the subsequent section.

4.1.1. Bat Algorithm

Given the common challenges faced by the bat algorithm [30], including premature convergence, imbalance between exploration and exploitation phases, and lack of diversity, we addressed these limitations by improving the velocity and frequency parameters through extending the fundamental equations. The inclusion not only introduces randomization to prevent convergence to local minima but also improves the search by increasing diversity.

The frequency update in the bat algorithm is crucial for regulating both the exploitation and exploration stages, as well as adjusting the step size:

$ω_{i} = (ω_{max} - ω_{min}) β + ω_{min}$

(7)

where $ω_{i}$ is the frequency parameter, and $β$ is the random value that controls the influence of the local search. The adaptive velocity update equation is given as:

$ν_{i}^{t + 1} = ω_{i} (d_{i}^{t} - d^{*}) + ν_{i}^{t - 1} + α (\mathrm{rand}[0, 1] - ν_{thresh})$

(8)

where $d^{*}$ denotes the current global best solution, $α$ represents the control parameter that regulates randomness, and $ν_{thresh}$ is the break parameter within the range $[0.45, 0.55]$. The addition of a controlled random term introduces stochasticity, which mitigates the risk of convergence to local minima and expands the search space. The improved position update is defined as follows:

$d_{i}^{t + 1} = ν_{i}^{t + 1} + d_{i}^{t} + β (d_{i}^{t} - d^{*}) \cdot α \, \mathrm{rand}[0, 1]$

(9)

Bats exhibit random movement in the vicinity of the previously determined global optimal location, influenced by the average loudness $λ^{t}$ (multiplied by a random factor $π$) and the global optimum $d^{*}$ [30]. As the bat approaches its target, the pulse rate increases while the loudness decreases. At time step $t + 1$, the new loudness $λ_{i}^{t + 1}$ and pulse rate $P_{i}^{t + 1}$ are calculated using Equation (10), which involves the constants $ϑ$ and $ϱ$.

The fitness of the population is assessed by employing a cost function, and both the fitness values and parameter set are updated if a better cost is achieved. As determined by the BA, the parameter set that yields the best results is then utilized in the contrast modification function to generate an enhanced intensity channel, denoted as $ψ^{e}$. The output of this algorithm becomes the input to the next hybrid model as a population containing a refined parameter set.

$d_{new} = π λ^{t} + d_{old}, \quad π \in [- 1, 1]$

$λ_{i}^{t + 1} = ϑ λ_{i}^{t}, \quad ϑ \in [0, 1]$

$P_{i}^{t + 1} = P_{i}^{0} (1 - e^{- ϱ t}), \quad ϱ > 0$

(10)
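The updates in Equations (7) to (10) can be read as a single population step. The NumPy sketch below is a minimal illustration under stated assumptions: the function name `bat_step`, the frequency bounds, and the values of `alpha`, `theta`, and `rho` are placeholders chosen for the example, and the stored velocity is used as the previous-step velocity in Equation (8). Acceptance of the random-walk candidate against the cost function is left out.

```python
import numpy as np


def bat_step(pos, vel, best, loudness, pulse0, t, rng,
             w_min=0.0, w_max=2.0, alpha=0.1, v_thresh=0.5,
             theta=0.9, rho=0.9):
    """One update of all bats; pos and vel have shape (n_bats, n_params)."""
    n_bats = pos.shape[0]
    beta = rng.random((n_bats, 1))
    # Equation (7): per-bat frequency.
    omega = (w_max - w_min) * beta + w_min
    # Equation (8): velocity with the extra controlled random term.
    new_vel = (omega * (pos - best) + vel
               + alpha * (rng.random(pos.shape) - v_thresh))
    # Equation (9): position update, using the old position on the right-hand side.
    new_pos = (new_vel + pos
               + beta * (pos - best) * alpha * rng.random(pos.shape))
    # Equation (10): random walk scaled by the average loudness.
    pi = rng.uniform(-1.0, 1.0, size=pos.shape)
    candidate = new_pos + pi * loudness.mean()
    # Equation (10): loudness decay and pulse-rate growth.
    new_loudness = theta * loudness
    pulse = pulse0 * (1.0 - np.exp(-rho * t))
    return new_pos, new_vel, candidate, new_loudness, pulse
```

Here each row of `pos` would hold one candidate parameter set for the contrast transform, and `best` the row with the best cost found so far; `v_thresh` is kept inside the stated range of 0.45 to 0.55.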

4.1.2. Artificial Bee Colony Algorithm

Following multiple iterations, this approach effectively improves the optimization of the parameters. In the ABC algorithm, instead of using global data, self-association relies on local search information, such as changes in search responses, feedback, and interactions with various worker bees. The concept of a worker bee group is directly correlated with the execution of specific tasks in parallel. By incorporating an entropy component into the standard ABC method, the fitness cost is improved. Incorporating this measurement of disorder not only prevents the algorithm from being trapped in local minima but also improves the search ability.

$\mathrm{fitness} = \begin{cases} 1 + C - \left( α \times \sum_{p = 1}^{n} φ η_{p} \log_{2} η_{p} \right) & \text{if } C < 0 \\ \frac{1}{1 + C - \left( α \times \sum_{p = 1}^{n} φ_{p} \log_{2} η_{p} \right)} & \text{otherwise} \end{cases}$

(11)

The cost of the objective function, denoted $C$, is determined by Equation (4), which is consistent with the previous BA algorithm. The parameter set of the ABC algorithm, which is a swarm-intelligent approach, consists of food sources. In the context of worker division, it is imperative to utilize all available food sources. Additionally, the process of assigning new members is carried out by a random selection method known as the greedy choice process. The remaining parameters are fixed based on our past research [26,27].
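To make the two branches of Equation (11) concrete, the sketch below evaluates the entropy-augmented fitness for a given cost and probability vector. It assumes the summand is the Shannon term $η_{p} \log_{2} η_{p}$, in line with Equation (21); the symbol $φ$ in the printed formula is not defined in the text and is left out here. The function names and the default weight `alpha` are hypothetical.

```python
import numpy as np


def entropy_term(probabilities):
    """Sum of eta_p * log2(eta_p) over non-zero probabilities (never positive)."""
    eta = np.asarray(probabilities, dtype=np.float64)
    eta = eta[eta > 0]
    return float(np.sum(eta * np.log2(eta)))


def abc_fitness(cost, probabilities, alpha=0.1):
    """Entropy-augmented ABC fitness following the two branches of Equation (11)."""
    # Subtracting a non-positive sum adds alpha times the Shannon entropy.
    base = 1.0 + cost - alpha * entropy_term(probabilities)
    return base if cost < 0 else 1.0 / base
```

For a uniform distribution over four outcomes the sum equals −2, so with `alpha=0.5` and a cost of 1 the denominator becomes 3 and the fitness is 1/3. For non-negative costs the denominator is always at least 1, so the second branch cannot divide by zero.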

4.2. Feature Fusion

We effectively employed CNNs for feature extraction by applying transfer learning. Specifically, we utilize three widely accepted CNN pre-trained models: Inception-ResNet v2 [31], DenseNet-201 [32], and NASNet-Mobile [33]. These models were chosen based on their performance, specifically their Top-1 accuracy ranking.

The presence of discriminative feature information is crucial for improving classification accuracy. Conversely, the inclusion of irrelevant or redundant features can degrade performance and impose significant computational overhead [34]. To counteract these constraints, a strategy of feature fusion in conjunction with feature selection has been implemented.

Assume that the features extracted from the selected models are $Φ_{m} = \{Φ_{1}, Φ_{2}, Φ_{3}\} \in \mathbb{R}^{(r \times n)}$, with dimensions $(s \times 1920)$, $(s \times 1536)$, and $(s \times 1056)$, respectively. The fusion process involves concatenating feature vectors, where each succeeding vector is appended to the existing one. Let $F_{D} = Φ_{1}$, $F_{I} = Φ_{2}$, $F_{N} = Φ_{3}$, and a serial concatenation follows the property:

$Φ_{m} := Φ_{1} \oplus Φ_{2} = \mathbb{R}^{p} \oplus \mathbb{R}^{q} \to \mathbb{R}^{p + q} \Rightarrow Φ_{m} := (Φ_{1}, Φ_{2}) \to (u_{1}, \dots, u_{p}, w_{1}, \dots, w_{q})$

where $u_{k} \in Φ_{1} \subset \mathbb{R}^{p}$ and $w_{l} \in Φ_{2} \subset \mathbb{R}^{q}$. For the rest of the combinations, the property still holds: $Φ_{m, 1}^{fs} = [Φ_{1}, Φ_{2}]$, $Φ_{m, 2}^{fs} = [Φ_{2}, Φ_{3}]$, $Φ_{m, 3}^{fs} = [Φ_{1}, Φ_{3}]$, $Φ_{m, 4}^{fs} = [Φ_{1}, Φ_{2}, Φ_{3}]$.
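The serial concatenation described above amounts to joining the per-model feature matrices along the feature axis. The following NumPy sketch uses random arrays as stand-ins for the extracted features; the helper name `fuse`, the sample count `s = 8`, and the dictionary keys are assumptions for illustration only.

```python
import numpy as np


def fuse(*feature_blocks):
    """Serially concatenate per-model feature matrices along the feature axis."""
    n_samples = {block.shape[0] for block in feature_blocks}
    if len(n_samples) != 1:
        raise ValueError("all feature blocks must describe the same samples")
    return np.concatenate(feature_blocks, axis=1)


s = 8  # number of images; an arbitrary value for the demonstration
rng = np.random.default_rng(0)
f_d = rng.random((s, 1920))  # stand-in for DenseNet-201 features
f_i = rng.random((s, 1536))  # stand-in for Inception-ResNet v2 features
f_n = rng.random((s, 1056))  # stand-in for NASNet-Mobile features

fused = {
    "m1": fuse(f_d, f_i),       # [Phi_1, Phi_2]
    "m2": fuse(f_i, f_n),       # [Phi_2, Phi_3]
    "m3": fuse(f_d, f_n),       # [Phi_1, Phi_3]
    "m4": fuse(f_d, f_i, f_n),  # [Phi_1, Phi_2, Phi_3]
}
```

The fused widths are simply the sums of the individual widths, so the three-model combination yields 1920 + 1536 + 1056 = 4512 features per image, which is the vector that feature selection then has to prune.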

4.3. Feature Selection

We propose a bio-inspired feature selection technique called entropy-controlled whale optimization. Our proposed technique is based on the work originally proposed by Mirjalili et al. [35], where the authors introduced the whale optimization algorithm (WOA), that imitates the foraging activity of humpback whales. The WOA’s optimization procedure begins with the initialization of the random population. The search process consists of three distinct phases: first, the predator encircles its prey; second, it applies the bubble-net attacking strategy as part of the exploitation phase; and third, it engages in prey hunting during the exploration phase. Below, we discuss each of those phases in detail, with the specific steps outlined in Algorithm 1.

Encircling prey: Whales search at random, depending on their current position. In order to increase the algorithm’s capacity for exploration, this humpback whale trait is applied in this case. The mathematical formulation of this behavior is as follows:

$\hat{D} = | \vec{C} . O_{ϱ}^{(i)} - O^{(i)} |$

(12)

$O^{(i + 1)} = O_{ϱ}^{(i)} - \vec{A} . \hat{D}$

(13)

where $\hat{D}$ represents the distance between the current and randomly selected individual of the population, $O$ represents the position vector of the randomly generated population, $O_{ϱ}$ denotes a randomly chosen individual from the population, and $(.)$ operator performs element-wise multiplication. The two functional parameters $\vec{A}$ and $\vec{C}$ are calculated using the following relations:

$\vec{A} = 2 α . ϱ_{r} - α$

(14)

$\vec{C} = 2 \times ϱ_{r}$

(15)

where $ϱ_{r}$ is the randomly generated number between 0 and 1, and $α$ exhibits a linearly decreasing value from 2 to 0 over iterations.
Exploitation: Bubble-net attacking strategy: The bubble-net attack that humpback whales follow involves them moving in a helix-shaped pattern. The whole strategy is as follows:

$O^{(i + 1)} = \hat{D} \cdot e^{b l} \cos (2 π l) + O_{best}^{(i)}$

(16)

$\hat{D} = | O_{best}^{(i)} - O^{(i)} |$

(17)

where $O_{best}^{(i)}$ represents the current best solution, $l$ is a random number in $[- 1, 1]$, and $b$ denotes the geometry of the logarithmic spiral.
In addition to swimming in a spiral-shaped pattern around the prey, humpback whales also swim in a circle that is gradually getting smaller. The likelihood of selecting the shrinking encircling mechanism or the spiral model that updates the whale positions during optimization is fixed at 50%.

$O^{(i + 1)} = \begin{cases} O_{ϱ}^{(i)} - \vec{A} . \hat{D} & \text{if } γ < τ, \\ \hat{D}^{*} . e^{b l} \cos (2 π l) + O_{best}^{(i)} & \text{otherwise.} \end{cases}$

(18)

Based on the likelihood of choosing either a spiral or a shrinking circle model, $τ$ is chosen to be 0.5.

Algorithm 1 Entropy-Controlled Whale Optimization for Feature Selection

Input:
    Feature set $O = \{O_{1}, O_{2}, \dots, O_{N}\}$
    Maximum iterations $Iter_{max}$, population size $P$
    Spiral constant $b$, probability threshold $τ = 0.5$
Output:
    Reduced feature set $O^{*}$
Initialization:
    1. Generate initial population $O = [O_{1}, \dots, O_{P}]$
    2. Initialize parameters:
        $α \leftarrow 2$, $ϱ_{r} \sim U(0, 1)$, $l \sim U(-1, 1)$, $ρ \sim U(0, 1)$
    3. Compute initial fitness (Equation (21)):
        $fit^{'} \leftarrow -\sum_{p=1}^{n} η_{p} \log_{2} η_{p}$
    4. Identify $O_{best}^{(0)}$
Optimization Loop:
    while $t \leq Iter_{max}$ do
        for each search agent $O_{j} \in O$ do
            Update parameters (Equations (14) and (15)):
                $α \leftarrow 2 - \frac{2t}{Iter_{max}}$
                $\vec{A} \leftarrow 2 α ϱ_{r} - α$
                $\vec{C} \leftarrow 2 ϱ_{r}$
            Generate new $ρ, l$
            Exploitation phase:
            if $ρ < τ$ then
                if $|\vec{A}| \leq 1$ then
                    Shrinking encircling (Equation (13)):
                    $O_{j}^{(t+1)} \leftarrow O_{best}^{(t)} - \vec{A} \cdot | \vec{C} \cdot O_{best}^{(t)} - O_{j}^{(t)} |$
                else
                    Spiral update (Equation (16)):
                    $O_{j}^{(t+1)} \leftarrow | O_{best}^{(t)} - O_{j}^{(t)} | \cdot e^{bl} \cos(2πl) + O_{best}^{(t)}$
                end if
            else
                Exploration phase:
                Randomly select $O_{rand}$ (Equations (19) and (20))
                $\hat{D} \leftarrow | \vec{C} \cdot O_{rand} - O_{j}^{(t)} |$
                $O_{j}^{(t+1)} \leftarrow O_{rand} - \vec{A} \cdot \hat{D}$
            end if
            Boundary check: $O_{j}^{(t+1)} \leftarrow clip(O_{j}^{(t+1)}, O_{min}, O_{max})$
        end for
        Compute fitness for all agents (Equation (21))
        Update $O_{best}^{(t)}$
        $t \leftarrow t + 1$
    end while
Return: $O^{*} \leftarrow O_{best}$
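
The loop in Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function name `woa_search`, the choice to maximize the fitness, the spiral constant `b = 1.0`, the `[0, 1]` search bounds, and starting the iteration counter at zero are all assumptions made for the example. The branch structure (encircling or spiral when $ρ < τ$, random-agent move otherwise) follows the listing as printed.

```python
import math
import random


def woa_search(fitness, dim, pop_size=30, iter_max=100, b=1.0, tau=0.5,
               lower=0.0, upper=1.0, seed=42):
    """Run a whale-optimization loop shaped like Algorithm 1.

    Assumptions (not from the paper): `fitness` is maximized, b = 1.0,
    positions live in [lower, upper], and one random number drives A and C.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(lower, upper) for _ in range(dim)]
                  for _ in range(pop_size)]
    best = list(max(population, key=fitness))
    best_fit = fitness(best)

    for t in range(iter_max):
        alpha = 2.0 - 2.0 * t / iter_max  # shrinks linearly from 2 toward 0
        updated = []
        for agent in population:
            r = rng.random()
            a_coef = 2.0 * alpha * r - alpha
            c_coef = 2.0 * r
            rho = rng.random()
            ell = rng.uniform(-1.0, 1.0)
            if rho < tau:
                if abs(a_coef) <= 1.0:
                    # Shrinking encircling around the best agent.
                    moved = [g - a_coef * abs(c_coef * g - x)
                             for g, x in zip(best, agent)]
                else:
                    # Logarithmic spiral around the best agent.
                    spiral = math.exp(b * ell) * math.cos(2.0 * math.pi * ell)
                    moved = [abs(g - x) * spiral + g
                             for g, x in zip(best, agent)]
            else:
                # Exploration: move relative to a randomly chosen agent.
                other = rng.choice(population)
                moved = [o - a_coef * abs(c_coef * o - x)
                         for o, x in zip(other, agent)]
            # Boundary check: clip every coordinate into the search range.
            updated.append([min(max(v, lower), upper) for v in moved])
        population = updated
        challenger = max(population, key=fitness)
        challenger_fit = fitness(challenger)
        if challenger_fit > best_fit:
            best, best_fit = list(challenger), challenger_fit
    return best, best_fit
```

Calling `woa_search(lambda pos: -sum((v - 0.3) ** 2 for v in pos), dim=4)` returns a position inside the bounds together with its score. In a feature selection setting, the continuous position would still have to be mapped to a feature subset, a step this sketch leaves out.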

Exploration: Search for prey: A similar methodology, leveraging the fluctuations of the $\vec{A}$ vector, can be employed to locate prey. Accordingly, by assigning random values to $\vec{A}$ that exceed 1 or fall below −1, we induce a deliberate deviation in the trajectory of the search agent, effectively distancing it from the reference whales. In the exploration phase, unlike the exploitation phase, the agent’s search position is adjusted based on a randomly chosen search agent rather than the best one identified so far.
This strategy accentuates the importance of exploration, thereby enabling the WOA algorithm to undertake a more exhaustive and comprehensive search throughout the solution space. The following is the mathematical relationship:

$\hat{D} = | \vec{C} \cdot O_{rand} - O^{(i)} |$

(19)

$O^{(i+1)} = O_{rand} - \vec{A} \cdot \hat{D}$

(20)

where $O_{rand}$ is a search agent chosen at random from the current population.

The current technique calculates the fitness value based on the entropy of the probability distribution, which measures the total amount of information. The population vector used in the entropy computation provides the widest possible range of information. The fitness is determined by calculating the Shannon entropy.

$fit^{i} = -\sum_{p=1}^{n} η_{p} \log_{2} η_{p}$

(21)

where $η_{p}$ is the probability associated with the selected vector.
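
As a small illustration of Equation (21), the sketch below computes the Shannon entropy of a probability vector. The text does not specify how $η_{p}$ is estimated from a feature vector, so the histogram-based helper `histogram_probabilities` (and its default of 8 bins) is purely a hypothetical choice for the example.

```python
import math


def shannon_entropy(probabilities):
    """Return -sum(p * log2(p)), skipping zero entries.

    The input is renormalized to sum to 1 (an assumption made for safety).
    """
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return -sum((p / total) * math.log2(p / total)
                for p in probabilities if p > 0)


def histogram_probabilities(values, bins=8):
    """Hypothetical helper: estimate probabilities with an equal-width histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0]  # all mass in a single bin
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]
```

A uniform distribution over four outcomes gives the maximum entropy of 2 bits, while a distribution concentrated on one outcome gives 0, which is why entropy can serve as a measure of how much information a candidate vector carries.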

5. Results and Discussion

The novelty of the proposed framework resides in the image preprocessing and feature selection phases. Therefore, the results section is structured to include both the segmentation and classification phases.

In this section, we present the datasets used in this study, which include four key datasets: PH² [27], ISBI-2016 [27], ISIC-2017 [36], and HAM10000 [37]. A detailed description of each dataset is provided in Table 1, which includes the number of classes, the number of samples per class, and the training/testing ratio. To ensure the full reproducibility of the XAI-MedNet architecture, the experimental setup was strictly constrained. All model training and evaluations were executed on a high-performance workstation equipped with an Intel Core i9 CPU, 128 GB RAM, and an NVIDIA RTX 3090 GPU (24 GB VRAM) using PyTorch 2.3 and Python 3.10. To eliminate stochastic variability across independent runs and guarantee deterministic outputs, the random seeds for the network weight initializations, data shuffling, and metaheuristic population generation were fixed (e.g., random seed = 42).
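
A seed-fixing routine of the kind described above might look like the following sketch. The text does not list the exact calls used, so the helper name `fix_seeds` and the particular set of generators seeded here are assumptions; NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def fix_seeds(seed=42):
    """Seed the common random number generators (illustrative helper)."""
    random.seed(seed)
    # Only affects Python processes launched after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Prefer deterministic cuDNN kernels over auto-tuned ones.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch is optional for this sketch
```

Calling `fix_seeds(42)` before building the model, the data loaders, and the metaheuristic population makes repeated runs draw the same random numbers. Some GPU operations can remain non-deterministic even with these settings.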

5.1. Segmentation Framework

To assess the efficacy of the proposed contrast stretching approach for image segmentation, we evaluated it with two state-of-the-art methods: boundary-aware transformers (BAT) and a comprehensive attention network (CA-Net). The pre-processing step enhances segmentation results by reducing noise and maximizing the contrast between foreground and background regions.

5.1.1. Parameter Setting and Performance Measure

The segmentation step essentially utilizes the same parameters as our earlier work [26]. The upper and lower limits are set to [1.6 0.5 0.8 1.5] and [0 0 0 0.5], respectively. The details of the training and testing percentages are provided in Table 1. MATLAB R2023a is used to execute the proposed extended BA-ABC algorithm, while Python-based boundary estimation algorithms are used to evaluate the efficacy of the comparison algorithms. The Jaccard index and Dice coefficient are performance measures used to verify segmentation results.

$\text{Jaccard Index} = \frac{TP}{TP + FP + FN}$

(22)

$\text{Dice coefficient} = \frac{2TP}{2TP + FP + FN}$

(23)
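
Equations (22) and (23) can be checked on a toy example with the sketch below. It assumes binary masks flattened into lists of 0/1 values. The function names are illustrative, and returning 1.0 when both masks are empty is a convention chosen for the example, not something stated in the text.

```python
def confusion_counts(predicted, truth):
    """Count true positives, false positives, and false negatives."""
    tp = fp = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t and not p:
            fn += 1
    return tp, fp, fn


def jaccard_index(predicted, truth):
    """TP / (TP + FP + FN), as in Equation (22)."""
    tp, fp, fn = confusion_counts(predicted, truth)
    denominator = tp + fp + fn
    return tp / denominator if denominator else 1.0


def dice_coefficient(predicted, truth):
    """2TP / (2TP + FP + FN), as in Equation (23)."""
    tp, fp, fn = confusion_counts(predicted, truth)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0
```

For `predicted = [1, 1, 0, 0, 1]` and `truth = [1, 0, 0, 1, 1]`, the counts are TP = 2, FP = 1, and FN = 1, so the Jaccard index is 2/4 and the Dice coefficient is 4/6. The two measures are related by Dice = 2J / (1 + J), so they always rank segmentations in the same order.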

5.1.2. BAT

BAT is the first segmentation model used to verify our pre-processing architecture. Here, we utilized the Python training environment, with the same training settings specified by [26]. The process begins by downscaling the images to $224 \times 224 \times 3$, with a mini-batch size of 8. The network encoder was tuned for 300 epochs, and whenever the validation loss shows no significant improvement, the learning rate is reduced to $50\%$.

Figure 4 shows a visual comparison of BAT segmentation results on the standard datasets before and after applying the proposed contrast stretching algorithm. Similarly, Table 2 reports the results of testing the BAT model, evaluated according to the performance metrics above. With the enhanced skin lesion datasets, there is a substantial improvement in both the Jaccard index and the Dice coefficient.

5.2. CA-Net

CA-Net [26] was also implemented in the Python environment. The adaptive moment estimation (Adam) optimizer was used for training, with a mini-batch size of 16 and 300 epochs of network training. After 256 epochs, the learning rate was halved. A visual comparison of CA-Net results on the original and pre-processed images is shown in Figure 5. The experimental findings of the comprehensive attention CNN segmentation model on three datasets are shown in two portions of Table 3. The pre-processed datasets produced better results for IoU and F1.

5.3. Classification Results

For skin lesion classification, the simulations are performed using three enhanced datasets and the HAM10000 dataset in its original form. The evaluation of the proposed framework is conducted utilizing two distinct setups: (1) classification results obtained using the basic setup without integrating feature fusion and selection algorithms and (2) classification results achieved by combining feature fusion with the proposed feature selection techniques.

5.3.1. Parameter Setting and Performance Measure

The classification challenge involved the utilization of three distinct families of classifiers, including k-nearest neighbors (KNN), support vector machine (SVM), and neural networks (NN). Furthermore, in order to ensure an in-depth evaluation, we performed an additional assessment of the proposed framework in combination with a range of classifiers.

The models’ learning parameters are described in Table 4. The pre-trained backbone networks (DenseNet-201, Inception-ResNet v2, and NASNet-Mobile) were fine-tuned using the AdamW optimizer with a weight decay of $1 \times 10^{-4}$ to mitigate overfitting. The initial learning rate was set to $2 \times 10^{-4}$ and modulated via a cosine annealing scheduler. The models were trained for a maximum of 150 epochs using a mini-batch size of 32, with an early stopping mechanism triggered after 15 epochs of stagnant validation loss. For the EBA-ABC and WOA algorithms, the maximum number of iterations was fixed at 100, with population sizes of 50 and 30 agents, respectively. Similarly, the classifiers were selected based on an in-depth analysis of prior empirical research [2,4,12,38], in which they consistently demonstrate superior performance compared to other groups of classifiers in this application domain.
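
The schedule and stopping rule described above can be illustrated with a short standard-library sketch. The closed-form cosine schedule below assumes a minimum learning rate of 0 and a period equal to the 150-epoch budget, neither of which is stated in the text. The `EarlyStopping` class is a hypothetical helper, not the authors' code.

```python
import math


def cosine_annealing_lr(epoch, base_lr=2e-4, max_epochs=150, min_lr=0.0):
    """Closed-form cosine annealing from base_lr down to min_lr."""
    cosine = math.cos(math.pi * epoch / max_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


class EarlyStopping:
    """Signal a stop after `patience` epochs without a lower validation loss."""

    def __init__(self, patience=15):
        self.patience = patience
        self.best_loss = float("inf")
        self.stale_epochs = 0

    def step(self, validation_loss):
        """Record one epoch and return True when training should stop."""
        if validation_loss < self.best_loss:
            self.best_loss = validation_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

Under these assumptions the learning rate starts at $2 \times 10^{-4}$, passes through half of that value at epoch 75, and reaches the minimum at epoch 150, unless `EarlyStopping.step` returns `True` first.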

Our primary objective here is to ensure the presence of highly discriminative feature information from the selected deep models. The serial concatenation process increases the likelihood of feature correlation due to the presence of redundant feature information. Therefore, an improved feature selection approach in conjunction with the feature fusion method eliminates the redundant features while retaining the discriminative feature information. The output dimensions and reduction percentage following the implementation of the feature fusion and selection methods are presented in Table 5.

Figure 6 compares feature vector sizes from simple concatenation and the proposed method. In the first approach, features were combined directly, increasing vector size and potentially retaining useful information. In contrast, the proposed entropy-controlled whale optimization was applied to select the most discriminative features, which were then used for classification.
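
The difference between plain concatenation and fusion followed by selection can be sketched as follows. The vector sizes are toy values, and thresholding a continuous optimizer position at 0.5 to decide which features to keep is an assumption made for illustration. The text does not specify how the whale position is mapped to a feature subset.

```python
def serial_fusion(*feature_vectors):
    """Concatenate per-model feature vectors end to end."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


def apply_selection(fused, position, threshold=0.5):
    """Keep the features whose position score exceeds the threshold."""
    if len(fused) != len(position):
        raise ValueError("position must have one entry per fused feature")
    return [value for value, score in zip(fused, position) if score > threshold]


def reduction_percent(original_size, reduced_size):
    """Percentage of features removed by the selection step."""
    return 100.0 * (original_size - reduced_size) / original_size
```

For example, fusing three toy vectors of lengths 3, 2, and 1 gives a 6-dimensional vector. A position such as `[0.9, 0.1, 0.7, 0.2, 0.8, 0.3]` then keeps three features, a 50% reduction.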

Based on the results presented in Table 5, we performed simulations to determine the testing accuracies in Table 6. Three categories of classifiers are employed, chosen based on their enhanced performance in several applications. The primary performance criterion in Table 6 is accuracy, and additional performance metrics support the findings. Upon evaluating the combinations of feature vectors, it is evident that utilizing the extracted features from all pre-trained models results in enhanced accuracy, as observed in the selected cases. In both cases (simple fusion and the proposed framework), $[F_{D} - F_{N}]$ yields unsatisfactory classification results, with the classifiers exhibiting reduced accuracies for this combination. With $[F_{D} - F_{I}]$, however, the classifiers generate significantly improved classification outcomes. Compared with both of these combinations, $[F_{I} - F_{N}]$ shows a further noticeable increase in classification accuracy, achieving the highest accuracies among the pairwise combinations on the ISBI-2016 and ISIC-2017 datasets, with values of 94.56% and 95.55%, respectively. The results obtained by fusing all the features of the selected pre-trained models, $[F_{D} - F_{I} - F_{N}]$, are the strongest, particularly in the context of the proposed framework. Specifically, the accuracy is 98.60% for the PH² dataset, 96.25% for ISBI-2016, 95.85% for ISIC-2017, and 96.03% for HAM10000. The results clearly demonstrate the effectiveness of feature fusion combined with the proposed feature selection method.

To complement the accuracy results, Table 7 presents an additional set of metrics for the selected classifiers. Based on the statistics, it is evident that the Q-SVM has consistently outperformed other models on various datasets, achieving an average classification rate of 95.78%. In comparison, TL-NN achieved 95.68%, M-NN achieved 95.45%, W-KNN achieved 94.93%, and ESD achieved 94.88%. On the ISBI-2016 and HAM10000 datasets, the Q-SVM classifier achieved classification accuracies of 96.25% and 96.03%, respectively. The corresponding sensitivities are 0.944 and 0.960, which demonstrates the ability to accurately detect positive cases and effectively reduce false negatives.

Q-SVM achieved high specificity scores of 0.982 and 0.993 on these two datasets, respectively, reflecting a notable reduction in false positives. The F1 scores of 0.963 and 0.960 indicate a strong balance between precision and recall, confirming the model’s ability to make accurate and comprehensive positive predictions. Other classifiers show similar trends to Q-SVM. For example, TL-NN attained a peak accuracy of 98.60% on the PH² dataset, with a sensitivity of 0.970, specificity of 0.973, and an F1 score of 0.971, highlighting its effectiveness even under class imbalance.

The achieved results unambiguously demonstrate that the classifiers can successfully distinguish between benign and malignant samples with significantly higher accuracy, which is an essential requirement for a wide range of medical applications. The performance of the M-NN classifier on the ISIC-2017 dataset was outstanding, achieving an accuracy of 95.85%, sensitivity of 0.956, specificity of 0.960, and an F1 score of 0.958. The W-KNN and ESD classifiers also demonstrated higher accuracy compared to other selected classifiers.

To substantiate the individual contribution of each module within the overall system, Table 8 presents the results of an ablation study comparing the base architecture against the architecture modified by each module. The two datasets in this study were chosen to verify that each enhancement module generalizes across different types of clinical problems, covering both binary (e.g., ISBI-2016) and multi-class (e.g., ISIC-2017) classification tasks. The consistent 3–4% gain across all backbone networks suggests the robustness of the EBA-ABC enhancement modules, and the final accuracy of the combined proposed framework (greater than 94% for both datasets) indicates the need for a complete pipeline solution to achieve state-of-the-art skin lesion pathology classification performance.

5.3.2. Statistical Significance

Rigorous statistical validation ensures the high accuracy required for automated medical image analysis. Statistical significance must be established to show that algorithmic improvements are valid, trustworthy, and therefore clinically acceptable. The one-way Analysis of Variance (ANOVA) performed on the PH² dataset (Table 9) yields a highly significant F-statistic of $F(2, 12) = 23.10$ with a corresponding p-value of 0.0001. This provides robust evidence to reject the null hypothesis ($H_{0}$) that the mean accuracies of the TL-NN, M-NN, and ESD classifiers are statistically equivalent. This substantial F-ratio (Figure 7) is derived from a high between-group mean square ($MS_{between} = 4.8137$) relative to the residual within-group variance ($MS_{within} = 0.2083$), indicating that the observed performance variations are primarily driven by the architectural efficacy of the models rather than stochastic training noise or data sampling bias.

Since the empirical F-statistic resides deep within the rejection region, far exceeding the critical threshold of $F_{crit} = 3.89$ at a significance level of $α = 0.05$, the results confirm that the Trilayered Neural Network (TL-NN) achieves a statistically superior diagnostic boundary, as shown in Figure 7. The narrow variance observed in the box plot further suggests that the TL-NN model possesses high stability and generalization capability across the PH² dataset’s morphological features compared to its counterparts.
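
The F-statistic is the ratio of the two mean squares, so the reported values can be checked directly: $4.8137 / 0.2083 \approx 23.11$, which agrees with the reported $F(2, 12) = 23.10$ up to rounding. The sketch below computes a one-way ANOVA from raw group scores. The function name is illustrative, and any input passed to it here is toy data rather than the accuracies from the study.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, MS_between, MS_within)."""
    group_count = len(groups)
    total_count = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / total_count
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((score - mean) ** 2
                    for group, mean in zip(groups, means)
                    for score in group)

    df_between = group_count - 1
    df_within = total_count - group_count
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within, ms_between, ms_within
```

With three classifiers and five scores each, the degrees of freedom come out as (2, 12), matching the form of the statistics reported in this section.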

Similarly, in the case of the ISBI-2016 dataset, the one-way ANOVA yields a highly significant F-statistic of $F(2, 12) = 55.89$ with a corresponding p-value of <0.0001, providing robust evidence to reject the null hypothesis ($H_{0}$) (Table 10). The high between-group variance ($MS_{between} = 7.8242$) relative to the within-group variance ($MS_{within} = 0.1400$) indicates that, despite a wider performance range, the Q-SVM classifier establishes a measurably superior and stable diagnostic boundary compared to the ESD and M-NN architectures for this dataset, as shown in Figure 8.

The one-way ANOVA conducted on the ISIC-2017 dataset yielded a highly significant F-statistic of $F(2, 12) = 42.20$ with a corresponding p-value of <0.0001, as shown in Table 11. This result strongly rejects the null hypothesis ($H_{0}$), confirming that the mean classification accuracies of the M-NN, TL-NN, and ESD models are statistically distinct. The substantial between-group variance ($MS_{between} = 19.7645$) relative to the within-group variance ($MS_{within} = 0.4683$) demonstrates that the M-NN architecture achieves a measurably superior diagnostic performance. Furthermore, the confidence interval box plots reveal that despite the wider variance in the M-NN model’s extreme bounds, its median and overall accuracy profile remain significantly elevated above both baseline comparators (Figure 9).

The one-way ANOVA executed on the HAM10000 dataset resulted in a highly significant F-statistic of $F(2, 12) = 17.68$ with an associated p-value of $0.0003$, shown in Table 12. This provides robust statistical evidence to reject the null hypothesis ($H_{0}$), confirming that the mean accuracies among the Q-SVM, TL-NN, and ESD models are fundamentally distinct. The analysis yielded a strong between-group variance ($MS_{between} = 7.1540$) compared to the within-group variance ($MS_{within} = 0.4047$). Notably, the Q-SVM architecture demonstrated an extended upper boundary, peaking at an accuracy of $97.69\%$ (Figure 10). Despite this wider overall variance, the Q-SVM’s performance distribution remains decisively elevated above the comparative architectures, establishing a statistically superior classification capability for complex skin lesion morphology in the HAM10000 cohort.

5.3.3. Assessment of Variability and External Validation

To rigorously evaluate the robustness of the proposed framework and assess performance variability across different runs, a k-fold cross-validation strategy ($k = 5$) was implemented. Rather than relying on a single deterministic train-test split, the datasets were partitioned into five mutually exclusive folds. The XAI-MedNet framework demonstrated exceptional stability across all independent runs. For instance, on the ISIC-2017 dataset, the proposed framework achieved a mean accuracy of $95.85\% \pm 0.42\%$, indicating that the model’s convergence is highly stable and minimally affected by the specific distribution of the training samples. Furthermore, to validate the generalizability of the proposed framework across diverse clinical environments (external validation), a cross-dataset testing protocol was conducted. The framework, exclusively trained on the extensive ISIC-2017 dataset, was deployed to classify the independent PH² and ISBI-2016 datasets without any subsequent retraining or weight fine-tuning. The external validation results show 94.10% accuracy for PH² and 93.87% for the ISBI-2016 datasets, indicating minimal performance degradation and confirming that the entropy-controlled feature selection method successfully isolates generalized pathological markers rather than dataset-specific artifacts.
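
A five-fold partition and the "mean ± standard deviation" summary can be sketched as follows. The helper names and the shuffling seed are assumptions. The text does not state whether the reported spread is a sample or population standard deviation, so the sample form used here is also an assumption.

```python
import random
import statistics


def k_fold_indices(sample_count, k=5, seed=42):
    """Split shuffled sample indices into k mutually exclusive folds."""
    indices = list(range(sample_count))
    random.Random(seed).shuffle(indices)
    return [indices[start::k] for start in range(k)]


def train_test_splits(sample_count, k=5, seed=42):
    """Yield (train_indices, test_indices), using each fold once for testing."""
    folds = k_fold_indices(sample_count, k, seed)
    for held_out, test_fold in enumerate(folds):
        train = [i for position, fold in enumerate(folds)
                 if position != held_out for i in fold]
        yield train, test_fold


def summarize(scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)
```

Every sample appears in exactly one test fold, so each per-fold accuracy is computed on data the corresponding model never saw during training.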

5.3.4. Explainable AI (XAI): Evaluation of the Proposed Framework

Clinician confidence in artificial intelligence systems for health diagnostics relies on both high predictive performance and transparency and interpretability for the end user. Results show that the proposed framework successfully implements the principles of XAI, emphasizing transparency at the input image and feature output levels.

Visual and Feature-Level Interpretability: The effectiveness of the EBA-ABC contrast enhancement module is validated visually through network attention mapping. Figure 11 illustrates the qualitative visual comparison of the decision-making processes of the various models in this study through Gradient-weighted Class Activation Mapping (Grad-CAM). Different backbones have different preferences for how they extract features from the input data: for instance, DenseNet-201 (Column 3) localizes internal dense regions of textural variation, while Inception-ResNet v2 (Column 4) captures broader morphological boundaries and uses multi-scale asymmetry as part of its feature extraction strategy. It is also noted that individual maps of the same architecture may exhibit both fragmented and diffuse activations in the lesion’s background skin. Alternatively, the map produced by the proposed concatenated model (Column 5) offers a more spatially coherent representation of activation patterns. For EBA-ABC visually enhanced images, the proposed concatenation model produced a more accurate representation of the core pathological morphology of each lesion, with minimal background noise interference. The activation heatmaps reveal that, while the model’s focus on original images is often dispersed across background noise, the enhanced images force the network’s attention to be highly localized on the actual lesion area. This provides visually interpretable outputs that align directly with dermatological assessment criteria.
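
For readers unfamiliar with Grad-CAM, the core map computation is sketched below in plain Python. It assumes the activations of one convolutional layer and the gradients of the class score with respect to them have already been extracted, and are given as nested `[channel][row][column]` lists. Obtaining them from a real network, and upsampling the map to the image size, are outside this sketch.

```python
def grad_cam(activations, gradients):
    """Combine layer activations and gradients into a normalized heatmap."""
    channel_count = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weight = spatial average of that channel's gradients.
    weights = [sum(sum(row) for row in channel) / (height * width)
               for channel in gradients]

    # Weighted sum over channels, then ReLU to keep positive evidence only.
    heatmap = [[max(0.0, sum(weights[k] * activations[k][i][j]
                             for k in range(channel_count)))
                for j in range(width)]
               for i in range(height)]

    # Scale to [0, 1] when the map is not entirely zero.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap
```

Channels whose gradients are positive on average raise the heatmap where they are active, while channels with negative average gradients are suppressed by the ReLU. This is why the resulting map highlights regions that support the predicted class.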
The proposed framework addresses the limited interpretability of traditional deep feature fusion methods through an entropy-controlled whale optimization algorithm (WOA), which quantifies the information contributed by each feature set using Shannon’s entropy (i.e., a mathematical measure of uncertainty). This gives the framework an objective basis for deciding which features to include or exclude, reducing redundancy and mitigating overfitting by passing only the most discriminative features to the predictive model.
While Grad-CAM visualizations provide intuitive and clinically relevant insights into model attention, it is important to acknowledge that such techniques are inherently qualitative and post-hoc. They do not necessarily capture causal relationships between input features and model predictions and may exhibit sensitivity to architectural variations and input perturbations. To address this limitation, the proposed framework complements visual explanations with entropy-controlled feature selection, which provides a quantitative measure of feature importance based on information theory. This combination enables a more structured and reliable interpretability mechanism. Nevertheless, the exploration of more causally grounded and robust explainability techniques remains an important direction for future work.
Clinical Relevance: To bridge the gap between theoretical performance and practical application, the proposed XAI-MedNet framework is designed to function seamlessly as a “human-in-the-loop” Clinical Decision Support System (CDSS). In a real-world dermatological workflow, the system operates as a secondary evaluator rather than an autonomous diagnostician. Upon clinical image acquisition, the framework first presents the dermatologist with the contrast-enhanced image, visually isolating the lesion boundaries. Subsequently, it outputs a malignancy probability alongside feature-level interpretation maps. This dual-level transparency allows the clinician to instantly cross-reference the algorithm’s internal focus with established dermoscopic criteria, empowering them to confidently validate the prediction or dismiss irrelevant artifacts prior to making a final biopsy decision.

For a fair comparison, Table 13 compares the proposed method with existing literature on skin lesion classification across multiple datasets. Based on the results obtained, the proposed method handles large feature vectors while maintaining high accuracy.

6. Conclusions

Melanoma is a deadly form of skin cancer with an increasing incidence rate. Early diagnosis greatly improves patient survival. This study addresses both segmentation and classification, focusing on image pre-processing and feature selection. Pre-processing aims to remove noise and enhance contrast between the lesion and background. To address the “curse of dimensionality,” only the most relevant features are selected to improve discrimination. However, the framework failed to perform exceptionally well with an increased number of classes. One possible explanation is the increased correlation rate between the features, and another possibility is class imbalance. Furthermore, in the case of feature selection, the WOA algorithm faced the problem of premature convergence and an imbalance between exploration and exploitation. Therefore, as future work, the image samples will be balanced across classes using data augmentation techniques or by implementing generative adversarial networks. Moreover, the selection process will be optimized by considering dynamic parameter adjustment and hybridization. Furthermore, integrating entropy-based feature selection with visually interpretable contrast enhancement enhances the framework’s explainability, promoting transparent decision-making in medical imaging applications.

While the current study utilizes publicly available benchmark repositories, the evaluation across four distinct datasets from varying international clinical centers provides a robust form of cross-institutional validation. However, we acknowledge that these retrospective data sources may not fully represent the ‘noise’ of unstandardized real-world clinical environments. Future work will focus on testing the framework in real clinical settings using non-standardized images and diverse skin types to better reflect real-world conditions. Although Grad-CAM provides useful visual explanations, it is still qualitative and limited. Therefore, we plan to explore more reliable explainability methods, such as causal and concept-based approaches. These improvements will help make the system more trustworthy and useful in clinical practice.

Author Contributions

Conceptualization, A.A. and T.A.; Methodology, A.A. and T.A.; Software, T.A., Y.N.A., M.A.A. and I.A.; Validation, A.A. and T.A.; Formal analysis, Y.N.A., M.A.A. and I.A.; Investigation, A.A., Y.N.A., M.A.A. and I.A.; Resources, A.A. and Y.N.A.; Data curation, M.A.A.; Writing—original draft, A.A. and T.A.; Writing—review & editing, A.A., T.A., Y.N.A., M.A.A. and I.A.; Visualization, A.A., T.A. and I.A.; Supervision, I.A.; Project administration, T.A., M.A.A. and I.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to Prince Sattam bin Abdulaziz University for funding this research work through the project number (PSAU/2024/03/31876).

Institutional Review Board Statement

Publicly accessible datasets were used in this study. No additional studies involving humans were conducted; the Institutional Review Board Statement is therefore not applicable.

Informed Consent Statement

Publicly accessible datasets were used in this study. No additional studies involving humans were conducted; the Informed Consent Statement is therefore not applicable.

Data Availability Statement

The datasets utilized in this study (HAM10000, ISIC-2017, ISBI-2016, and PH²) were retrieved from the Kaggle repository and the official International Skin Imaging Collaboration (ISIC) archives. These datasets are publicly accessible and open to the research community for academic use.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Lomas, A.; Leonardi-Bee, J.; Bath-Hextall, F. A systematic review of worldwide incidence of nonmelanoma skin cancer. Br. J. Dermatol. 2012, 166, 1069–1080.
Ajmal, M.; Khan, M.A.; Akram, T.; Alqahtani, A.; Alhaisoni, M.; Armghan, A.; Althubiti, S.A.; Alenezi, F. BF2SkNet: Best deep learning features fusion-assisted framework for multiclass skin lesion classification. Neural Comput. Appl. 2023, 35, 22115–22131.
Sanyal, S.; Andrew, R.; Graham-Durand, D.; Kotbagi, S.; Austrie, B. Keratinizing Squamous Cell Carcinoma Masquerading as Basal Cell Carcinoma Constituting a Diagnostic Pitfall: A Case Report with Etiopathogenetic Discourse and Mohs Micrographic Surgical Management. Cureus 2026, 18, e104371.
Akram, T.; Junejo, R.; Alsuhaibani, A.; Rafiullah, M.; Akram, A.; Almujally, N.A. Precision in Dermatology: Developing an Optimal Feature Selection Framework for Skin Lesion Classification. Diagnostics 2023, 13, 2848.
Alruwaili, M.; Mohamed, M. An integrated deep learning model with EfficientNet and ResNet for accurate multi-class skin disease classification. Diagnostics 2025, 15, 551.
Rashad, N.M.; Abdelnapi, N.M.; Seddik, A.F.; Sayedelahl, M. Automating skin cancer screening: A deep learning. J. Eng. Appl. Sci. 2025, 72, 6.
Gabani, V.; Navamani, T.; Shyamala, K.; Rajpal, V.K.V. Multimodal skin lesion classification for early cancer diagnosis using deep learning. Front. Physiol. 2026, 17, 1717517.
Rajpara, S.; Botello, A.; Townend, J.; Ormerod, A. Systematic review of dermoscopy and digital dermoscopy/artificial intelligence for the diagnosis of melanoma. Br. J. Dermatol. 2009, 161, 591–604.
Akram, T.; Alsuhaibani, A.; Khan, M.A.; Khan, S.U.; Naqvi, S.R.; Bilal, M. Dermo-Optimizer: Skin Lesion Classification Using Information-Theoretic Deep Feature Fusion and Entropy-Controlled Binary Bat Optimization. Int. J. Imaging Syst. Technol. 2024, 34, e23172.
Pixels to Classes: Intelligent Learning Framework for Multiclass Skin Lesion Localization and Classification. Comput. Electr. Eng. 2021, 90, 106956.
Khan, M.A.; Sharif, M.; Akram, T.; Bukhari, S.A.C.; Nayak, R.S. Developed Newton-Raphson based deep features selection framework for skin lesion recognition. Pattern Recognit. Lett. 2020, 129, 293–303.
Khan, M.A.; Muhammad, K.; Sharif, M.; Akram, T.; Albuquerque, V.H.C.d. Multi-Class Skin Lesion Detection and Classification via Teledermatology. IEEE J. Biomed. Health Inform. 2021, 25, 4267–4275.
Malik, S.; Akram, T.; Ashraf, I.; Rafiullah, M.; Ullah, M.; Tanveer, J. A Hybrid Preprocessor DE-ABC for Efficient Skin-Lesion Segmentation with Improved Contrast. Diagnostics 2022, 12, 2625.
Roman, A.; Rahman, M.M.; Haider, S.A.; Akram, T.; Naqvi, S.R. Integrating Feature Selection and Deep Learning: A Hybrid Approach for Smart Agriculture Applications. Algorithms 2025, 18, 222.
Khan, M.A.; Akram, T.; Sharif, M.; Saba, T.; Javed, K.; Lali, I.U.; Tanik, U.J.; Rehman, A. Construction of saliency map and hybrid set of features for efficient segmentation and classification of skin lesion. Microsc. Res. Tech. 2019, 82, 741–763.
Khan, M.A.; Sharif, M.; Akram, T.; Damaševičius, R.; Maskeliūnas, R. Skin Lesion Segmentation and Multiclass Classification Using Deep Learning Features and Improved Moth Flame Optimization. Diagnostics 2021, 11, 811.
Khan, M.A.; Akram, T.; Zhang, Y.D.; Sharif, M. Attributes based skin lesion detection and recognition: A mask RCNN and transfer learning-based deep learning framework. Pattern Recognit. Lett. 2021, 143, 58–66.
Xie, Y.; Zhang, J.; Xia, Y.; Shen, C. A Mutual Bootstrapping Model for Automated Skin Lesion Segmentation and Classification. IEEE Trans. Med. Imaging 2020, 39, 2482–2493.
Mehta, D.; Primiero, C.; Betz-Stablein, B.; Nguyen, T.D.; Gal, Y.; Bowling, A.; Haskett, M.; Sashindranath, M.; Bonnington, P.; Mar, V.; et al. Multi-task AI models in dermatology: Overcoming critical clinical translation challenges for enhanced skin lesion diagnosis. J. Eur. Acad. Dermatol. Venereol. 2025, 39, 2121–2133.
Miglani, V.; Bhatia, M. Skin lesion classification: A transfer learning approach using efficientnets. In Advanced Machine Learning Technologies and Applications: Proceedings of AMLTA 2020; Springer: Singapore, 2020; pp. 315–324.
Cano, E.; Mendoza-Avilés, J.; Areiza, M.; Guerra, N.; Mendoza-Valdés, J.L.; Rovetto, C.A. Multi skin lesions classification using fine-tuning and data-augmentation applying NASNet. PeerJ Comput. Sci. 2021, 7, e371.
Aziz, S.; Bilal, M.; Khan, M.U.; Amjad, F. Deep learning-based automatic morphological classification of leukocytes using blood smears. In Proceedings of the 2020 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), Istanbul, Turkey, 12–13 June 2020; IEEE: New York, NY, USA, 2020; pp. 1–5.
Afza, F.; Sharif, M.; Khan, M.A.; Tariq, U.; Yong, H.S.; Cha, J. Multiclass skin lesion classification using hybrid deep features selection and extreme learning machine. Sensors 2022, 22, 799.
Aruk, I.; Pacal, I.; Toprak, A.N. A novel hybrid ConvNeXt-based approach for enhanced skin lesion classification. Expert Syst. Appl. 2025, 283, 127721.
Noman, A.; Beiji, Z.; Zhu, C.; Al-Habib, M.; Al-asri, A. GAMFuse: Graph-based Adaptive Multiscale Feature Fusion for few-shot skin lesion classification. Biomed. Signal Process. Control 2026, 119, 109990.
Malik, S.; Akram, T.; Awais, M.; Khan, M.A.; Hadjouni, M.; Elmannai, H.; Alasiry, A.; Marzougui, M.; Tariq, U. An Improved Skin Lesion Boundary Estimation for Enhanced-Intensity Images Using Hybrid Metaheuristics. Diagnostics 2023, 13, 1285.
Malik, S.; Islam, S.R.; Akram, T.; Naqvi, S.R.; Alghamdi, N.S.; Baryannis, G. A novel hybrid meta-heuristic contrast stretching technique for improved skin lesion segmentation. Comput. Biol. Med. 2022, 151, 106222.
Optimized Binary Bat algorithm for classification of white blood cells. Measurement 2019, 143, 180–190.
Ahmed, B.; Akram, T.; Naqvi, S.R.; Alsuhaibani, A.; Altherwy, Y.N.; Masud, U. A Novel Deep Learning Framework with Meta-Heuristic Feature Selection for Enhanced Remote Sensing Image Classification. IEEE Access 2024, 12, 91974–91998.
Yang, X.S. A New Metaheuristic Bat-Inspired Algorithm. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2010); Springer: Berlin/Heidelberg, Germany, 2010; pp. 65–74.
Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
Iandola, F.; Moskewicz, M.; Karayev, S.; Girshick, R.; Darrell, T.; Keutzer, K. Densenet: Implementing efficient convnet descriptor pyramids. arXiv 2014, arXiv:1404.1869.
Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017; pp. 1492–1500.
Li, Y.; Li, T.; Liu, H. Recent advances in feature selection and its applications. Knowl. Inf. Syst. 2017, 53, 551–577. [Google Scholar] [CrossRef]
Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
Hasan, M.K.; Elahi, M.T.E.; Alam, M.A.; Jawad, M.T.; Martí, R. DermoExpert: Skin lesion classification using a hybrid convolutional neural network through segmentation, transfer learning, and augmentation. Inform. Med. Unlocked 2022, 28, 100819. [Google Scholar] [CrossRef]
Alenezi, F.; Armghan, A.; Polat, K. Wavelet transform based deep residual neural network and ReLU based Extreme Learning Machine for skin lesion classification. Expert Syst. Appl. 2023, 213, 119064. [Google Scholar] [CrossRef]
Akram, T.; Lodhi, H.M.J.; Naqvi, S.R.; Naeem, S.; Alhaisoni, M.; Ali, M.; Haider, S.A.; Qadri, N.N. A multilevel features selection framework for skin lesion classification. Hum.-Centric Comput. Inf. Sci. 2020, 10, 12. [Google Scholar] [CrossRef]
Haque, M.M.; Akter, R.; Akib, A.; Hasib, A. A Deep Learning Approach for Automated Skin Lesion Diagnosis with Explainable AI. arXiv 2026, arXiv:2601.00964. [Google Scholar] [CrossRef]
Padhy, S.; Dash, S.; Kumar, N.; Singh, S.P.; Kumar, G.; Moral, P. Temporal integration of ResNet features with LSTM for enhanced skin lesion classification. Results Eng. 2025, 25, 104201. [Google Scholar] [CrossRef]
Song, L.; Wang, H.; Wang, Z.J. Decoupling multi-task causality for improved skin lesion segmentation and classification. Pattern Recognit. 2024, 133, 108995. [Google Scholar] [CrossRef]
Alhudhaif, A.; Almaslukh, B.; Aseeri, A.O.; Guler, O.; Polat, K. A novel nonlinear automated multi-class skin lesion detection system using soft-attention based convolutional neural networks. Chaos Solitons Fractals 2023, 170, 113409. [Google Scholar] [CrossRef]
Benyahia, S.; Meftah, B.; Lézoray, O. Multi-features extraction based on deep learning for skin lesion classification. Tissue Cell 2022, 74, 101701. [Google Scholar] [CrossRef]
Nakai, K.; Chen, Y.W.; Han, X.H. Enhanced deep bottleneck transformer model for skin lesion classification. Biomed. Signal Process. Control 2022, 78, 103997. [Google Scholar] [CrossRef]
Ding, S.; Wu, Z.; Zheng, Y.; Liu, Z.; Yang, X.; Yang, X.; Yuan, G.; Xie, J. Deep attention branch networks for skin lesion classification. Comput. Methods Programs Biomed. 2021, 212, 106447. [Google Scholar] [CrossRef] [PubMed]
Calderón, C.; Sanchez, K.; Castillo, S.; Arguello, H. BILSK: A bilinear convolutional neural network approach for skin lesion classification. Comput. Methods Programs Biomed. Update 2021, 1, 100036. [Google Scholar] [CrossRef]
Hameed, N.; Shabut, A.M.; Ghosh, M.K.; Hossain, M.A. Multi-class multi-level classification algorithm for skin lesions classification using machine learning techniques. Expert Syst. Appl. 2020, 141, 112961. [Google Scholar] [CrossRef]

Figure 1. Comparative visualization of skin lesion diversity across benchmark dermoscopy datasets: (Top-Left) PH², (Top-Right) ISBI-2016, (Bottom-Left) ISIC-2017, and (Bottom-Right) HAM10000.

Figure 2. The proposed system architecture for interpretable skin lesion classification, integrating contrast enhancement, multi-CNN feature fusion, and entropy-controlled feature selection with Explainable AI (XAI).

Figure 3. Procedural workflow of the proposed contrast enhancement technique, combining HSL color space transformation with Improved Bat and Entropy-Controlled ABC optimization.

Figure 4. Comparative demonstration of EBA-ABC preprocessing for improving the alignment of BAT segmentation masks with clinical ground truth.

Figure 5. Comparative demonstration of EBA-ABC preprocessing for improving the alignment of CA-net segmentation masks with clinical ground truth.

Figure 6. Comparative analysis of original vs. selected feature counts, showing the significant reduction achieved by the proposed feature selection framework.

Figure 7. Performance evaluation on the PH² dataset: (Left) Confidence intervals for three top-performing classifiers; (Right) F-distribution test for the leading classifier.

Figure 8. Performance evaluation on the ISBI-2016 dataset: (Left) Confidence intervals for three top-performing classifiers; (Right) F-distribution test for the leading classifier.

Figure 9. Performance evaluation on the ISIC-2017 dataset: (Left) Confidence intervals for three top-performing classifiers; (Right) F-distribution test for the leading classifier.

Figure 10. Performance evaluation on the HAM10000 dataset: (Left) Confidence intervals for three top-performing classifiers; (Right) F-distribution test for the leading classifier.

Figure 11. Qualitative visual comparison of skin lesion interpretability using Gradient-weighted Class Activation Mapping (Grad-CAM). The columns, from left to right, display: (1) Original dermoscopic skin lesion images; (2) Images enhanced via the proposed Ext. BA-ABC technique; (3) Grad-CAM localizations generated by the NASNet backbone; (4) Grad-CAM localizations generated by the DenseNet-201 backbone; (5) Grad-CAM localizations generated by the Inception-ResNet v2 backbone; and (6) The proposed model’s interpretive Grad-CAM.

Table 1. Summary of class distribution, total dataset sizes, and the corresponding 80:20 partition ratios for the training and testing phases across all evaluated skin lesion datasets.

Dataset	Class	Samples	Total	Training	Testing
PH²	Benign	160	200	160	40
PH²	Malignant	40	200	160	40
ISBI-2016	Benign	1006	1279	1023	256
ISBI-2016	Malignant (Melanoma)	273	1279	1023	256
ISIC-2017	Nevus	1372	2000	1600	400
	Melanoma	374
	Seborrheic Keratosis	254
HAM10000	Melanocytic Nevi (nv)	6705	10,015	8012	2003
	Melanoma (mel)	1113
	Benign Keratosis-like Lesions (bkl)	1099
	Basal Cell Carcinoma (bcc)	514
	Actinic Keratoses (akiec)	327
	Vascular Lesions (vasc)	142
	Dermatofibroma (df)	115

Table 2. Comparative IoU and F1-scores for the Bat Framework, illustrating the influence of the preprocessing stage on the selected datasets.

Algorithm	Status	Dataset	IoU (%)	F1-Score (%)
Bat Framework	Without Preprocessor	PH²	85.4	93.1
		ISBI-2016	85.2	92.0
		ISIC-2017	86.2	92.6
	With Preprocessor	PH²	86.2	93.3
		ISBI-2016	87.3	93.9
		ISIC-2017	87.9	93.7

Table 3. Comparative IoU and F1-scores for the CA-Net model, illustrating the influence of the preprocessing stage on the selected datasets.

Algorithm	Status	Dataset	IoU (%)	F1-Score (%)
CA-Net	Without Preprocessor	PH²	86.2	93.8
		ISBI-2016	85.8	92.4
		ISIC-2017	89.2	94.3
	With Preprocessor	PH²	87.5	94.2
		ISBI-2016	87.7	93.6
		ISIC-2017	88.9	94.1
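
For reference, the two segmentation measures reported in Tables 2 and 3 can be computed per image from a predicted mask and a ground-truth mask. The sketch below uses the standard definitions (IoU as intersection over union, F1 as the Dice coefficient). The function name `iou_and_f1`, the toy masks, and the convention for two empty masks are illustrative assumptions and are not taken from the paper's implementation.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels


def iou_and_f1(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (IoU, F1) for two binary masks of identical shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # lesion pixel found by both masks
            elif p and not t:
                fp += 1      # predicted lesion, ground truth background
            elif t and not p:
                fn += 1      # missed lesion pixel
    union = tp + fp + fn
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


# Toy 3 x 3 masks (hypothetical, not from any dataset in the paper).
pred_mask = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
true_mask = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]

iou, f1 = iou_and_f1(pred_mask, true_mask)
print(f"IoU = {iou:.2f}, F1 = {f1:.2f}")
```

For a single image the two measures are linked by F1 = 2·IoU/(1 + IoU). Dataset-level figures depend on how per-image scores are aggregated, which this sketch does not address.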

Table 4. Training parameters for the DenseNet-201, Inception-ResNet v2, and NASNet-Mobile models.

Parameter	Value	Parameter	Value
Training function	SGD	Stride	1
Activation Function	ReLU	Execution Environment	Auto
Mini Batch Size	32	Max. Epochs	150
Validation Frequency	15	Learning rate	1 × 10⁻⁴
Loss function	Cross Entropy	Dropout Rate	0.1

Table 5. Comparison of feature vector sizes and reduction ratios achieved through the proposed selection method on four benchmark datasets.

Vector Fusion	Input Dimension	Output Dimension	Red. Percentage (%)
PH²
$[F_{D}$ $F_{I}]$	140 × 3456	140 × 592	83
$[F_{I}$ $F_{N}]$	140 × 2592	140 × 612	77
$[F_{D}$ $F_{N}]$	140 × 2976	140 × 571	81
$[F_{D}$ $F_{I}$ $F_{N}]$	140 × 4512	140 × 970	79
ISBI-2016
$[F_{D}$ $F_{I}]$	900 × 3456	900 × 795	77
$[F_{I}$ $F_{N}]$	900 × 2592	900 × 804	69
$[F_{D}$ $F_{N}]$	900 × 2976	900 × 803	73
$[F_{D}$ $F_{I}$ $F_{N}]$	900 × 4512	900 × 948	79
ISIC-2017
$[F_{D}$ $F_{I}]$	1400 × 3456	1400 × 1106	68
$[F_{I}$ $F_{N}]$	1400 × 2592	1400 × 1011	61
$[F_{D} F_{N}]$	1400 × 2976	1400 × 1071	64
$[F_{D}$ $F_{I}$ $F_{N}]$	1400 × 4512	1400 × 1354	70
HAM10000
$[F_{D}$ $F_{I}]$	7000 × 3456	7000 × 1210	65
$[F_{I}$ $F_{N}]$	7000 × 2592	7000 × 1063	59
$[F_{D}$ $F_{N}]$	7000 × 2976	7000 × 1161	61
$[F_{D}$ $F_{I}$ $F_{N}]$	7000 × 4512	7000 × 1489	67
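
The reduction percentages in Table 5 follow from the feature counts alone: the share of fused features removed by the selection stage. A minimal sketch of that arithmetic is given below, using the PH² feature counts from the table. The function name `reduction_percentage` is an illustrative assumption.

```python
def reduction_percentage(input_dim: int, output_dim: int) -> float:
    """Percentage of fused features discarded by feature selection."""
    if input_dim <= 0 or not 0 <= output_dim <= input_dim:
        raise ValueError("expected 0 <= output_dim <= input_dim and input_dim > 0")
    return 100.0 * (1.0 - output_dim / input_dim)


# Feature counts (columns) of the PH² rows in Table 5: (fused, selected).
ph2_rows = {
    "[F_D F_I]": (3456, 592),
    "[F_I F_N]": (2592, 612),
    "[F_D F_N]": (2976, 571),
    "[F_D F_I F_N]": (4512, 970),
}

for fusion, (n_in, n_out) in ph2_rows.items():
    print(f"{fusion}: {reduction_percentage(n_in, n_out):.1f}% removed")
```

The table reports whole percentages, so a recomputed value may differ from the tabulated one by about a point because of rounding.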

Table 6. Quantitative comparison of OA (%) between simple fusion and the proposed framework using multiple classifiers and distinct vector fusion strategies.

Vector Fusion	OA (%)
	Simple Feature Fusion					Proposed Framework
	TL-NN	Q-SVM	M-NN	ESD	W-KNN	TL-NN	Q-SVM	M-NN	ESD	W-KNN
PH²
$[F_{D}$ $F_{I}]$	95.16	96.37	91.41	92.16	93.64	96.00	97.00	90.10	91.15	96.20
$[F_{I}$ $F_{N}]$	89.16	90.26	87.18	94.12	93.17	95.16	93.25	91.26	91.20	93.66
$[F_{D}$ $F_{N}]$	81.20	88.61	82.00	87.21	83.00	91.62	89.40	87.60	93.76	92.11
$[F_{D}$ $F_{I}$ $F_{N}]$	93.16	92.00	94.62	97.75	93.17	98.60 *	95.20	98.05	96.60	91.38
ISBI-2016
$[F_{D}$ $F_{I}]$	87.10	81.64	90.40	93.45	86.15	90.18	88.70	90.64	88.42	92.18
$[F_{I}$ $F_{N}]$	90.65	87.28	90.45	86.58	89.45	94.56	92.95	88.67	91.48	82.64
$[F_{D}$ $F_{N}]$	83.16	78.16	81.40	83.10	84.25	84.55	89.70	88.70	84.62	84.55
$[F_{D}$ $F_{I}$ $F_{N}]$	91.30	94.55	90.52	92.18	88.64	92.88	96.25 *	93.78	94.17	94.85
ISIC-2017
$[F_{D}$ $F_{I}]$	87.34	81.50	88.64	84.62	83.62	87.20	93.86	87.52	88.50	88.34
$[F_{I}$ $F_{N}]$	88.50	83.50	83.78	87.52	88.64	95.55	91.64	90.42	89.56	92.24
$[F_{D}$ $F_{N}]$	71.80	67.25	77.90	79.65	83.40	83.17	79.30	80.52	78.64	83.89
$[F_{D}$ $F_{I}$ $F_{N}]$	88.42	85.49	93.68	90.47	91.63	93.67	91.42	95.85 *	91.88	94.50
HAM10000
$[F_{D}$ $F_{I}]$	82.10	84.34	80.62	87.91	86.42	88.76	88.59	90.13	83.64	82.33
$[F_{I}$ $F_{N}]$	84.88	86.48	87.98	83.14	84.48	83.64	85.56	81.46	88.59	83.64
$[F_{D}$ $F_{N}]$	81.33	78.64	80.64	65.47	68.55	77.48	71.66	81.69	78.41	65.23
$[F_{D}$ $F_{I}$ $F_{N}]$	88.56	81.42	94.16	90.16	79.47	94.01	96.03 *	93.62	93.91	94.17

* marks the best classification accuracy obtained for each dataset.

Table 7. Evaluation of the proposed framework’s efficacy using different classifiers across benchmark datasets in terms of multiple performance indicators.

Classifier	Dataset				Performance Measure
Classifier	I	II	III	IV	Accuracy (%)	Sen	Spe	FNR	FPR	F1
Q-SVM	✓				97.01	0.970	0.973	0.030	0.029	0.971
		✓			96.25	0.944	0.982	0.055	0.017	0.963
			✓		93.86	0.935	0.941	0.064	0.058	0.938
				✓	96.03	0.960	0.993	-	-	0.960
TL-NN	✓				98.60	1.000	0.971	0.000	0.028	0.985
		✓			94.56	0.939	0.959	0.060	0.048	0.945
			✓		95.55	0.957	0.953	0.042	0.046	0.955
				✓	94.01	0.940	0.990	-	-	0.940
M-NN	✓				98.05	0.976	0.984	0.023	0.015	0.980
		✓			93.77	0.934	0.940	0.065	0.059	0.938
			✓		95.85	0.956	0.960	0.043	0.039	0.958
				✓	94.16	0.942	0.990	-	-	0.941
ESD	✓				97.75	0.975	0.979	0.024	0.020	0.977
		✓			94.18	0.935	0.947	0.064	0.052	0.942
			✓		93.68	0.933	0.939	0.066	0.060	0.937
				✓	93.91	0.927	0.989	-	-	0.939
W-KNN	✓				96.20	0.964	0.959	0.035	0.040	0.961
		✓			94.85	0.938	0.959	0.061	0.040	0.949
			✓		94.50	0.949	0.940	0.050	0.059	0.944
				✓	94.17	0.940	0.990	-	-	0.940
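
The performance indicators in Table 7 can all be derived from confusion-matrix counts. The sketch below applies the standard textbook definitions to a binary case. Whether the paper computes each indicator (F1 in particular) exactly this way is an assumption, and the counts used are hypothetical rather than taken from any experiment reported here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BinaryCounts:
    tp: int  # malignant predicted malignant
    fp: int  # benign predicted malignant
    tn: int  # benign predicted benign
    fn: int  # malignant predicted benign


def classification_metrics(c: BinaryCounts) -> Dict[str, float]:
    """Standard indicators; assumes both classes occur in the test set."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fnr": 1.0 - sensitivity,   # false negative rate
        "fpr": 1.0 - specificity,   # false positive rate
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# Hypothetical counts for a 200-image test set (illustration only).
counts = BinaryCounts(tp=38, fp=5, tn=155, fn=2)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions FNR and FPR are the complements of sensitivity and specificity. For multi-class datasets such as HAM10000, the same quantities are usually computed per class and then averaged.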

Table 8. Ablation analysis showing the impact of the proposed EBA-ABC enhancement module on standard baseline models, culminating in the final Proposed Framework performance for the ISBI-2016 and ISIC-2017 datasets.

Dataset	Backbone/Configuration	Performance Metrics
Dataset	Backbone/Configuration	Accuracy (%)	Sensitivity	Specificity	F1-Score
ISBI-2016	DenseNet-201
	Baseline Model	89.50	0.878	0.902	0.886
	Baseline + Enhancement	92.40	0.910	0.931	0.918
	Inception-ResNet v2
	Baseline Model	88.80	0.869	0.895	0.879
	Baseline + Enhancement	91.60	0.902	0.923	0.908
	NASNet-Mobile
	Baseline Model	86.90	0.845	0.878	0.855
	Baseline + Enhancement	89.80	0.881	0.904	0.889
	Proposed Framework	96.25	0.944	0.982	0.963
ISIC-2017	DenseNet-201
	Baseline Model	88.40	0.865	0.893	0.874
	Baseline + Enhancement	91.20	0.898	0.920	0.906
	Inception-ResNet v2
	Baseline Model	87.70	0.858	0.886	0.868
	Baseline + Enhancement	90.60	0.890	0.914	0.898
	NASNet-Mobile
	Baseline Model	85.50	0.834	0.866	0.845
	Baseline + Enhancement	88.50	0.869	0.895	0.877
	Proposed Framework	95.85	0.956	0.960	0.956

Table 9. Statistical validation of classification accuracy using one-way ANOVA for the PH² dataset.

Source	SS	df	MS	F-Statistic	Prob > F
Between Classifiers	9.6274	2	4.8137	23.1046	0.0001
Within Classifiers (Error)	2.5001	12	0.2083	–	–
Total	12.1275	14	–	–	–

Table 10. Statistical validation of classification accuracy using one-way ANOVA for the ISBI-2016 dataset.

Source	SS	df	MS	F-Statistic	Prob > F
Between Classifiers	15.6484	2	7.8242	55.8885	<0.0001
Within Classifiers (Error)	1.6800	12	0.1400	–	–
Total	17.3284	14	–	–	–

Table 11. Statistical validation of classification accuracy using one-way ANOVA for the ISIC-2017 dataset.

Source	SS	df	MS	F-Statistic	Prob > F
Between Classifiers	39.5290	2	19.7645	42.2048	<0.0001
Within Classifiers (Error)	5.6196	12	0.4683	–	–
Total	45.1486	14	–	–	–

Table 12. Statistical validation of classification accuracy using one-way ANOVA for the HAM10000 dataset.

Source	SS	df	MS	F-Statistic	Prob > F
Between Classifiers	14.3080	2	7.1540	17.6766	0.0003
Within Classifiers (Error)	4.8566	12	0.4047	–	–
Total	19.1646	14	–	–	–
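
The entries of Tables 9–12 are tied together by the usual one-way ANOVA identities: each mean square is the sum of squares divided by its degrees of freedom, and the F-statistic is the ratio of the between-group to the within-group mean square. The sketch below recomputes these quantities from the tabulated SS and df values. The function name `anova_f` is an illustrative assumption, and the p-values are not recomputed.

```python
from typing import Tuple


def anova_f(ss_between: float, df_between: int,
            ss_within: float, df_within: int) -> Tuple[float, float, float]:
    """Return (MS_between, MS_within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# (SS_between, df_between, SS_within, df_within) as listed in Tables 9-12.
anova_tables = {
    "PH2 (Table 9)": (9.6274, 2, 2.5001, 12),
    "ISBI-2016 (Table 10)": (15.6484, 2, 1.6800, 12),
    "ISIC-2017 (Table 11)": (39.5290, 2, 5.6196, 12),
    "HAM10000 (Table 12)": (14.3080, 2, 4.8566, 12),
}

for name, row in anova_tables.items():
    ms_b, ms_w, f_stat = anova_f(*row)
    print(f"{name}: MS_between={ms_b:.4f}, MS_within={ms_w:.4f}, F={f_stat:.2f}")
```

With 2 and 12 degrees of freedom, the tables correspond to three classifiers and fifteen accuracy observations in total. The recomputed ratios should match the tabulated F-statistics up to rounding of the reported sums of squares.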

Table 13. Performance comparison of the proposed framework with existing state-of-the-art skin lesion classification methods.

Author (Year)	Problem Type	Dataset	Accuracy
Haque et al. [39] (2026)	Classification	HAM10000	91.15%
Padhy et al. [40] (2025)	-	HAM10000	94.23%
Aruk et al. [24] (2025)	-	HAM10000	94.30%
Hu et al. [41] (2024)	-	ISIC-2017	94.0%
Song et al. [41] (2023)	-	ISIC-2017	95.6%
Alenezi et al. [37] (2023)	-	HAM10000	95.73%
Alhudhaif et al. [42] (2023)	-	HAM10000	95.94%
Benyahia et al. [43] (2022)	-	PH²	98.70%
Nakai et al. [44] (2022)	-	ISIC-2017, HAM10000	92.1%, 95.84%
Hasan et al. [36] (2022)	-	ISBI-2016 & ISIC-2017	92% & 93.1%
Ding et al. [45] (2021)	-	ISIC-2017	92.2% (AUC)
Calderón et al. [46] (2021)	-	HAM10000	93.21%
Khan et al. [11] (2020)	-	ISBI-2016 & ISIC-2017	94.5% & 93.4%
Hameed et al. [47] (2020)	-	ISBI-2016	96.15%
