Article

Skin Lesion Classification Using Hybrid Feature Extraction Based on Classical and Deep Learning Methods

1 LRIT Laboratory, Faculty of Sciences, Mohammed V University in Rabat, BP 1014, Rabat 100190, Morocco
2 Higher School of Technology, Mohammed V University in Rabat, BP 227, Salé 11000, Morocco
* Author to whom correspondence should be addressed.
BioMedInformatics 2025, 5(3), 41; https://doi.org/10.3390/biomedinformatics5030041
Submission received: 7 June 2025 / Revised: 9 July 2025 / Accepted: 14 July 2025 / Published: 16 July 2025

Abstract

This paper proposes a hybrid method for skin lesion classification combining deep learning features with conventional descriptors such as HOG, Gabor, SIFT, and LBP. Features of interest were extracted from the tumor area and combined using the proposed fusion methods. We tested and compared features obtained from different deep learning models coupled with HOG-based features. Dimensionality reduction and performance improvement were achieved by Principal Component Analysis, after which SVM was used for classification. The compared methods were evaluated on the reference skin-cancer-malignant-vs-benign database. The results show a significant improvement in accuracy due to the complementarity between the conventional and deep learning-based methods. Specifically, the addition of HOG descriptors led to an accuracy increase of 5% for EfficientNetB0, 7% for ResNet50, 5% for ResNet101, 1% for NASNetMobile, 1% for DenseNet201, and 1% for MobileNetV2. These findings confirm that feature fusion significantly enhances performance compared to the individual application of each method.

1. Introduction

Melanoma is one of the deadliest types of skin cancer, and timely detection is essential to reduce mortality. Skin lesion analysis has gained in popularity as a diagnostic method. However, traditional image analysis approaches face significant constraints in terms of sensitivity, specificity, and overall performance. With the advent of machine learning and deep learning, new approaches have been proposed to improve skin lesion classification. Traditional approaches such as Histogram of Oriented Gradients (HOG) [1], Scale-Invariant Feature Transform (SIFT) [2], Gabor, and Local Binary Pattern (LBP) have proven their ability to extract structural features from skin lesions. In contrast, convolutional neural networks (CNNs) such as EfficientNetB0, ResNet50-101 [3], NASNetMobile [4], DenseNet201 [5], and MobileNetV2 [6] can automatically and efficiently extract deep hierarchical features from raw images, often leading to superior classification performance. While these approaches have shown promise individually, there is growing interest in hybrid models that leverage both handcrafted and learned features to maximize classification accuracy. The main objective of this study is to investigate whether combining features extracted from traditional methods and deep learning models can enhance skin lesion detection and classification performance compared to using either method alone. In this work, we combine the strengths of both approaches by using the best features obtained from traditional approaches and those obtained by deep learning. To manage feature dimensionality and reduce redundancy, we apply Principal Component Analysis (PCA), and we use Support Vector Machine (SVM) for the final classification. We evaluate the performance of our hybrid model using common metrics such as accuracy, recall, precision, and F1-score, on a publicly available skin lesion image dataset. The study in this paper reveals the power of hybrid techniques, demonstrating that a combination of deep learning-based techniques and conventional feature extraction techniques can enhance skin lesion detection and classification accuracy. This paper is structured as follows:
  • Section 2: Presents an overview of the related work in skin lesion classification.
  • Section 3: Describes the proposed methodology, covering conventional feature extraction, deep learning-based feature extraction, and feature fusion.
  • Section 4: Presents the experimental findings, with a comprehensive discussion.
  • Section 5: Presents the conclusion of this study and suggests directions for future research.

2. Related Work

In this section, we present methods for classifying medical images, particularly those of skin lesions, starting with classical approaches. These classical approaches are based on various feature extraction methods such as Histogram of Oriented Gradients, Scale-Invariant Feature Transform, wavelet transform, Gabor filters, Local Binary Patterns, and Speeded-Up Robust Features (SURF), which capture discriminative information essential for the classification of dermoscopic images. We then discuss deep learning-based approaches, including architectures such as VGG16, GRU, ResNet50, ResNet101, DenseNet201, NASNetMobile, MobileNetV2, and EfficientNetB0, developed to improve early detection of skin cancer. Finally, we present methods based on model fusion, which involve combining the best classical method with deep learning approaches. This fusion exploits the strengths of each method, and the results show that it significantly improves classification accuracy.
The Histogram of Oriented Gradients descriptor has been widely applied to capture edge information. Singh et al. (2022) [7] proposed a methodology that uses HOG features from retinal fundus images and compared the performance of five machine learning techniques to classify images as either glaucomatous or healthy. In addition to HOG, other local feature descriptors like LBP and SURF have proven effective in biomedical applications. LBP is particularly suitable for capturing fine-grained textures in skin images, making it relevant for lesion detection. SURF, on the other hand, provides robust keypoint detection and description, which can enhance diagnostic accuracy by identifying distinct visual patterns in dermoscopic images. Gabor filters have also been widely used for texture extraction due to their ability to capture texture information effectively. Stockmeier et al. (2009) [8] applied filter-based image classification methods to skin lesion images obtained by two different recording systems. The aim was to distinguish between different malignant and benign diseases. With the rise of artificial intelligence, traditional techniques have been gradually complemented and, in many cases, surpassed by deep learning models. Ajel et al. (2023) [9] offered dermatologist-level skin cancer classification using a residual network (ResNet-50) as a convolutional neural network that maps images to class labels to automatically detect benign and malignant skin images. Subsequent research has examined more sophisticated models like NASNetMobile, which uses a two-phase training approach to improve predictive accuracy and stabilize convolutional feature extraction. To inspect skin lesions and identify skin cancer, Qureshi et al. (2025) [10] used NASNetMobile as one of twelve pre-trained deep learning models. This method uses a two-stage training approach, optimizing the entire model after first freezing the layers of the base model to train newly added dense layers. During feature extraction, this method improves CNN convolution stability, which raises prediction accuracy. Iqbal et al. (2021) [11] proposed an automated multi-class skin lesion classification approach using deep convolutional neural networks such as ResNet50 and EfficientNetB0. Their study highlights the importance of model architecture selection and preprocessing strategies in enhancing diagnostic performance. Furthermore, Iqbal et al. (2022) [12] demonstrated the effectiveness of deep CNNs in extracting discriminative features from complex biomedical images. These findings support our proposed hybrid approach, which combines handcrafted and deep features to improve robustness in skin lesion classification. Recent innovations include the integration of generative adversarial networks (GANs) into classification pipelines. Zhao et al. (2021) [13] proposed a new skin lesion image classification framework based on a skin augmentation style-based GAN (SLAStyleGAN), built on the basic architecture of style-based GANs and DenseNet201. Lightweight architectures such as MobileNetV2 have been introduced. Srinivasu et al. (2021) [14] proposed a computerized process of classifying skin disease through deep learning-based MobileNetV2 and Long Short-Term Memory (LSTM), which proved to be efficient, accurate, and suitable for lightweight computational devices. EfficientNetB0 is a convolutional neural network optimized for image classification tasks.
It uses a compound scaling technique to balance the depth, width, and resolution of the network, improving both accuracy and efficiency. The stem layer processes the input image before passing it to deeper layers for feature extraction. Zhao et al. (2024) [15] showed that EfficientNetB0 is particularly beneficial for medical classification tasks like skin lesion detection. Rehman et al. (2024) [16] explored the potential of transformer-based models for image quality assessment (IQA) and demonstrated that Vision Transformers (ViTs) outperform traditional CNNs in both full-reference and no-reference IQA tasks. This superiority is attributed to their capacity to effectively capture global dependencies.
Other fusion-based skin cancer image classification methods have been developed and continue to evolve, with new approaches for improving skin lesion classification. In this context, Zaw et al. (2024) [17] combined Inception-V3, ResNet-50, and VGG16 models for multi-class classification of lesions (melanoma, BCC, SCC). Their method addresses class imbalance using oversampling and data augmentation techniques, resulting in superior performance compared to individual models. Ren et al. (2021) [18] proposed a new fusion mechanism for skin lesion segmentation by incorporating attention modules to extract detailed information from skin images to achieve early detection of skin lesions. Aditi et al. (2019) [19] introduced a hybrid classification method combining Long Short-Term Memory with CNNs, demonstrating robustness for diverse classification tasks. Alzakari et al. (2024) [2] proposed LesionNet, combining SIFT descriptors with a customized convolutional neural network. Similarly, Thirumaladevi et al. (2024) [20] improved the VGG16 model by fusing deep-layer features, demonstrating significant improvements using the ISIC dataset. Xu et al. (2024) [21] illustrated the potential of AI-based image analysis to detect fine morphological patterns in complex biological structures. This reinforces the broad applicability of hybrid image analysis techniques across medical domains beyond dermatology, further supporting their interdisciplinary significance. Hybrid approaches that combine classical and advanced techniques have gained significant attention in the healthcare domain. Mirmozaffari et al. (2022) [22] proposed a novel hybrid model integrating Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA) to assess hospital efficiency during and post-COVID-19. Although their work focused on efficiency assessment rather than image classification, the idea of leveraging the complementarity of parametric and non-parametric methods is highly relevant to our approach and deserves to be explored in our future work.
These research advancements highlight evolving methods of medical image analysis, where classical methods and deep learning techniques complement each other to improve early detection of skin cancer.

3. Methodology

In this section, we present in detail our methodology (Figure 1) developed for skin lesion classification based on a hybrid feature fusion approach. The process begins with preprocessing the dataset by resizing all dermoscopic images to 128 × 128 pixels to ensure a uniform input size. The images were then normalized by scaling the pixel values to the [0, 1] range using a rescaling factor of 1/255. This normalization helps to stabilize the distribution of the input data and can reduce the impact of illumination variations, which are common in dermoscopic images.
The dataset consists of two main folders: one for training and another for testing. Each of these folders contains two subfolders corresponding to the two classes: malignant and benign. The training folder includes a total of 2637 images. We applied an 80/20 split to this training folder to create separate training and validation sets; specifically, 80% of the training images were used for training and 20% for validation. The testing folder, which contains 660 images (300 malignant and 360 benign), was kept unchanged and used entirely for evaluating the model’s generalization performance.
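As a concrete illustration of this preprocessing and splitting step, the following minimal sketch uses TensorFlow's dataset utilities; the directory names (data/train, data/test), batch size, and random seed are illustrative assumptions rather than details reported above.

```python
import tensorflow as tf

IMG_SIZE = (128, 128)

# Assumed layout: data/train/{benign,malignant} and data/test/{benign,malignant}
train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    validation_split=0.2,   # 80/20 split of the training folder
    subset="both",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=32,
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "data/test", image_size=IMG_SIZE, batch_size=32, shuffle=False
)

# Rescale pixel values from [0, 255] to [0, 1]
rescale = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (rescale(x), y))
val_ds = val_ds.map(lambda x, y: (rescale(x), y))
test_ds = test_ds.map(lambda x, y: (rescale(x), y))
```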
The central phase of the process comprises feature extraction, which is carried out using two complementary approaches. In the initial phase, traditional methods such as HOG, SIFT, LBP, and Gabor filters are used to capture feature information in the region of interest. In the second phase, deep learning models such as EfficientNetB0, ResNet50-101, NASNetMobile, Inception v3, DenseNet201, and MobileNetV2 are used to extract high-level features from database images. Descriptors from the two families are then fused to exploit their complementarity. A dimensionality reduction process is applied to the fused vector to reduce redundancy and computational complexity. This vector is finally used to train and evaluate classifiers such as SVM for lesion diagnosis. This modular architecture makes it possible not only to separately analyze the effectiveness of each type of method, but also to demonstrate the important contribution of their combination.

3.1. Classical Methods

Prior to deep learning approaches, feature extraction in image classification systems was mainly carried out using traditional methods. The aim of these traditional methods was to acquire discriminating information from visual features such as shape, texture, contours, or gradients in images. Their effectiveness is highly dependent on the quality of preprocessing and the choice of descriptors, which must be correctly adapted to the nature of the data and the classification problem to be solved. In this section, we describe the main aspects of the traditional methods used in our work, presenting their operating principle and their relevance to skin lesion classification.

3.1.1. HOG-Based Method

Histogram of Oriented Gradients is employed to describe the orientation and distribution of edges or local intensity gradients in an image. The idea is to divide the image into small connected areas called cells and compute a histogram of gradient directions for every pixel in each cell. The final HOG descriptor is obtained by concatenating the normalized histograms over all cells. For our implementation, we used a cell size of 8 × 8 pixels, a block size of 2 × 2 cells, nine orientation bins, and L2-Hys normalization. In clinical imaging, HOG assists in discriminative feature extraction from dermoscopy images to enhance skin lesion detection accuracy. Farkhod et al. (2024) [1] improved HOG-based image classification by applying the descriptor to every color channel, thereby enhancing texture feature extraction. HOG offers excellent capability in highlighting structural differences, making it particularly suitable for medical image analysis, as it helps identify irregular borders and morphological patterns that distinguish malignant from benign lesions. The gradients and edge directions extracted using the HOG algorithm are shown in Figure 2.
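A minimal sketch of the HOG extraction described above, using scikit-image with the stated parameters (nine orientations, 8 × 8 cells, 2 × 2 blocks, L2-Hys normalization); reading and resizing the image with OpenCV is an illustrative choice, not necessarily the original implementation.

```python
import cv2
from skimage.feature import hog

def extract_hog(image_path):
    """Compute a HOG descriptor with the parameters used in this work."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, (128, 128))
    return hog(
        img,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        block_norm="L2-Hys",
        feature_vector=True,   # return a 1D vector ready for fusion or classification
    )
```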

3.1.2. Gabor Filter Feature Extraction

The Gabor descriptor is based on sinusoidal filters modulated by a Gaussian function, allowing the texture information of an image to be extracted. Because it is sensitive to specific frequencies and orientations, it is particularly effective at detecting fine and directional patterns, and is thus well suited to the analysis of biomedical or textured images. Its core function is the identification of oriented structures and local textural variations, which facilitates tasks such as image classification or segmentation. We used Gabor filters with four orientations and two frequencies, applying 31 × 31 kernels and a sigma value of 4. Figure 3 illustrates the results obtained using the Gabor filter on our dataset.
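The sketch below builds a small Gabor filter bank with four orientations and two frequencies using 31 × 31 kernels and sigma = 4, as stated above; the wavelength values, the gamma parameter, and the mean/standard-deviation summary of each response are illustrative assumptions, since these details are not specified in the text.

```python
import cv2
import numpy as np

def gabor_features(gray_img, wavelengths=(8.0, 16.0)):
    """Apply a Gabor filter bank (4 orientations x 2 frequencies) and summarize
    each filter response by its mean and standard deviation."""
    feats = []
    for theta in np.arange(0, np.pi, np.pi / 4):   # four orientations
        for lambd in wavelengths:                  # two frequencies (expressed as wavelengths)
            kernel = cv2.getGaborKernel((31, 31), 4.0, theta, lambd, 0.5, 0,
                                        ktype=cv2.CV_32F)
            response = cv2.filter2D(gray_img, cv2.CV_32F, kernel)
            feats.extend([response.mean(), response.std()])
    return np.array(feats)
```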

3.1.3. SIFT-Based Method

The SIFT descriptor is a feature extraction algorithm that finds local keypoints in an image and extracts a robust description that is scale-, rotation-, and, to some extent, illumination-invariant. In this study, it was employed for the analysis of skin lesion images to extract the texture, shape, and internal structure of lesions effectively. In our implementation, we retained up to 150 keypoints per image with a contrast threshold of 0.04 and an edge threshold of 10. According to Alzakari et al. (2024) [2], the SIFT approach is particularly well suited to medical image analysis. Figure 4 illustrates the results obtained using the SIFT method on the dataset.
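A minimal sketch of SIFT keypoint extraction with the stated settings (up to 150 keypoints, contrast threshold 0.04, edge threshold 10); averaging the 128-dimensional keypoint descriptors into a single fixed-length vector is an assumed pooling step, as the aggregation strategy is not detailed in the text.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create(nfeatures=150, contrastThreshold=0.04, edgeThreshold=10)

def sift_features(gray_img):
    """Detect SIFT keypoints and pool their descriptors into one 128-D vector."""
    keypoints, descriptors = sift.detectAndCompute(gray_img, None)
    if descriptors is None:                  # no keypoints found in the image
        return np.zeros(128, dtype=np.float32)
    return descriptors.mean(axis=0)          # simple average pooling (assumed)
```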

3.1.4. LBP-Based Method

The LBP (Local Binary Pattern) is a non-parametric descriptor that combines statistical and structural features. It describes texture using micro-primitives and their statistical placement rules. More precisely, it analyzes the textured image using a particular kernel function, establishing a statistical relationship between a central pixel and its neighbors. This relationship is encoded as a binary pattern that captures local structural information and converts the image into numerical labels (decimal values). Its simplicity, robustness, and computational speed have attracted great interest in the scientific community. LBP has thus been widely used, tested, and validated in several applications, such as facial image analysis, image retrieval, and texture classification. See Figure 5 for a comparison between the original skin lesion images and their Local Binary Pattern (LBP) representations.
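The following sketch computes a uniform LBP code image and summarizes it as a normalized histogram; the neighborhood parameters (P = 8 sampling points, radius R = 1) are illustrative, since the exact values are not reported in the text.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_img, P=8, R=1):
    """Encode each pixel's relation to its neighbors as a uniform LBP code and
    return the normalized histogram of codes as a texture descriptor."""
    codes = local_binary_pattern(gray_img, P, R, method="uniform")
    n_bins = P + 2   # uniform patterns + one "non-uniform" bin
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist
```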

3.2. Deep Learning Methods

After the traditional methods, we introduce the deep learning models used in this work, namely convolutional neural networks. These automatically learn hierarchical feature representations directly from raw images, identifying complex patterns, textures, and semantic information at various levels. Their ability to handle high visual variability makes them particularly valuable in medical imaging, especially for diagnosing skin lesions. In this section, we present the deep methods utilized within our work, namely EfficientNetB0, ResNet50-101, NASNetMobile, Inception v3, DenseNet201, and MobileNetV2, describing their architecture, functionality, and ability to learn visually informative descriptors for lesion classification.

3.2.1. EfficientNetB0-Based Method

EfficientNetB0, proposed by Tan et al. (2019) [23], is a convolutional neural network model that achieves high accuracy efficiently by scaling depth, width, and resolution uniformly. It contains 237 layers consisting of mobile inverted bottleneck convolution (MBConv) blocks and squeeze-and-excitation modules, which enhance feature extraction as well as channel attention. The model takes a 224 × 224 image as input and has approximately 5.3 million parameters. In our implementation of the EfficientNetB0-based method, the model was initialized with pre-trained ImageNet weights and configured with “include_top = False” to exclude the classification layers. Images were resized to 128 × 128 pixels and normalized to the [0, 1] range. Convolutional outputs were flattened to one-dimensional feature vectors using a Flatten() layer. These vectors were then reduced in dimensionality using Principal Component Analysis (PCA) and classified with an SVM classifier optimized with GridSearchCV. EfficientNetB0 is well known for its exceptional performance in feature extraction and image classification tasks, with a relatively low computational cost.
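A minimal sketch of the frozen-backbone feature extraction described above, using the Keras applications API; the same pattern applies to the other backbones (ResNet50/101, NASNetMobile, DenseNet201, MobileNetV2) by swapping the constructor. The input array name is an assumption.

```python
import tensorflow as tf

# Pre-trained backbone used as a fixed feature extractor (no classification head)
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(128, 128, 3)
)
backbone.trainable = False
extractor = tf.keras.Sequential([backbone, tf.keras.layers.Flatten()])

def deep_features(images):
    """images: float array of shape (N, 128, 128, 3), preprocessed as described above."""
    return extractor.predict(images, verbose=0)
```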

3.2.2. ResNet-Based Method

ResNet-50 and ResNet-101 are members of the Residual Network (ResNet) family introduced by He et al. (2015) [3]. They use residual connections to simplify the training of deep networks by avoiding the vanishing gradient problem. ResNet-50, which has 50 layers and approximately 25.6 million parameters, offers a good trade-off between performance and simplicity. ResNet-101, on the other hand, contains 101 layers and 44.6 million parameters, enabling deeper and more abstract features to be extracted. In this technique, the model was used as a fixed feature extractor by loading pre-trained weights and excluding the fully connected classification layers (include_top = False). Input images were resized to 128 × 128 pixels and normalized to the [0, 1] interval. Convolutional outputs were flattened to feature vectors using a Flatten() layer. These features were then reduced dimensionally using PCA and classified using an SVM. These models are highly suited to image classification tasks, especially medical imaging tasks, where the ability to learn multi-scale discriminative representations helps in identifying and classifying skin lesions.

3.2.3. NASNetMobile-Based Method

NASNetMobile is a convolutional neural network architecture proposed by Zoph et al. (2018) [4] within the Neural Architecture Search (NAS) framework. NASNetMobile was created by employing a reinforcement learning process that searched for the best architectural building blocks. The model is 88 layers deep and uses two kinds of modular cells (normal cells and reduction cells) so that the network can learn hierarchical representations efficiently. Despite having only 5.3 million parameters, NASNetMobile is designed for lightweight use on mobile devices while maintaining competitive accuracy. We used the model as a fixed feature extractor with pre-trained weights and without the fully connected classification layers (include_top = False). Dermoscopic images were pre-processed by resizing to 128 × 128 pixels and normalizing pixel values to the range [0, 1]. The convolutional feature maps were reshaped into one-dimensional feature vectors with a Flatten() layer. After that, dimensionality reduction was achieved using Principal Component Analysis (PCA), and the features were classified using a Support Vector Machine (SVM). Its primary objective is to provide a compact but high-performance image classification architecture that is especially well suited to resource-limited settings such as mobile medical diagnostic equipment.

3.2.4. DenseNet201-Based Method

DenseNet201, proposed by Huang et al. [5] in 2017, is a 201-layer deep convolutional neural network with dense connections. In this network, every layer receives the output of all preceding layers as input, which reinforces feature propagation and mitigates the vanishing gradient problem. The model takes 224 × 224-sized images as input and contains about 20 million parameters. DenseNet201's dense connectivity supports the efficient reuse of features across layers, enhancing the efficiency of learning and overall image classification performance. In our study, we used DenseNet201 as a pre-trained fixed feature extractor without the fully connected classification layers. We resized the dermoscopic images to 128 × 128 pixels and normalized the pixel values to the [0, 1] range. The convolutional feature maps were transformed into one-dimensional feature vectors using a Flatten() layer. After that, we applied PCA for dimensionality reduction, followed by SVM for feature classification.

3.2.5. MobileNetV2-Based Method

MobileNetV2, proposed by Sandler et al. [6] in 2018, is a lightweight convolutional neural network for mobile and embedded vision applications. It contains 53 layers formed by bottleneck depthwise separable convolution blocks, which greatly decrease the number of parameters and the computational demand. We used the model as a pre-trained fixed feature extractor without the fully connected classification layers. The input images were resized to 128 × 128 pixels and normalized to the [0, 1] range. Feature maps from the convolutional layers were transformed into one-dimensional feature vectors using a Flatten() layer. Dimensionality reduction was performed using Principal Component Analysis (PCA), and the extracted features were then classified with an SVM classifier.

3.3. Fusion Methods

In skin lesion classification, feature fusion is a critical factor in enhancing the performance of image classification models. The figure below (Figure 6) shows a hybrid approach that combines features from classical methods and deep learning methods. These heterogeneous feature vectors are first concatenated to form a large and diverse representation of the image.
In our implementation, we performed early fusion via feature-level concatenation, where HOG features and deep features extracted using CNN models were concatenated into a single vector for each image. Prior to fusion, each image was processed independently to extract both feature types: HOG features were extracted from the grayscale version of each image, using nine orientations, 8 × 8 pixels per cell, and 2 × 2 cells per block, and normalized with L2-Hys. CNN features were obtained using a pre-trained model followed by global average pooling. Once extracted, the two feature vectors were concatenated directly. After fusion, PCA was applied to the concatenated feature vector, preserving 95% of the total variance. This allowed us to balance the dimensionality of the combined representation and reduce computational complexity for the SVM classifier. Such an early fusion strategy (feature fusion before classification) is especially beneficial in applications where complementary information from different feature types is required for accurate and reliable classification. We selected the best classical method, HOG, and then combined it with deep learning methods by performing fusion through concatenation.
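A minimal sketch of the early fusion and PCA step, assuming the per-image HOG and CNN feature matrices (rows are images) have already been computed with the functions sketched earlier; the variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def fuse_features(hog_feats, cnn_feats):
    """Early fusion: concatenate handcrafted and deep feature vectors per image."""
    return np.concatenate([hog_feats, cnn_feats], axis=1)

# fused_train = fuse_features(hog_train, cnn_train)      # assumed precomputed matrices
# pca = PCA(n_components=0.95).fit(fused_train)          # retain 95% of the total variance
# reduced_train = pca.transform(fused_train)
# reduced_test = pca.transform(fuse_features(hog_test, cnn_test))
```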

4. Experimental Results and Discussion

This section presents key considerations for evaluating the performance of our hybrid feature fusion system for skin lesion classification. Feature extraction using both deep learning and conventional approaches was carried out on the Kaggle platform with GPU P100 acceleration, using TensorFlow (version 2.17.1) to enable efficient processing of large image datasets. The experiments focused on comparing classification accuracy. The SVM classifier was optimized using GridSearchCV, resulting in a robust and well-generalizing model with the best hyperparameters (C = 10, kernel = RBF, gamma = scale). This choice is justified by extensive testing, which confirmed the superior performance of this approach over other classifiers, including Random Forest, K-Nearest Neighbors, Logistic Regression, and Gradient Boosting. The results indicate that feature fusion improves the model’s performance compared to using individual types of features.
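The hyperparameter search can be reproduced along the following lines; the candidate grid shown here is an assumption, while the reported best configuration (C = 10, RBF kernel, gamma = scale) comes from the text above.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": [0.1, 1, 10, 100],        # illustrative candidate values
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

grid = GridSearchCV(SVC(probability=True), param_grid,
                    cv=5, scoring="accuracy", n_jobs=-1)
# grid.fit(reduced_train, y_train)   # PCA-reduced features and labels (assumed precomputed)
# print(grid.best_params_)           # the search reported above selected C=10, kernel="rbf", gamma="scale"
```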

4.1. Dataset and Distribution

The experiments were conducted on a publicly available skin cancer malignant vs benign dataset from Kaggle [24], which contains dermoscopic images that have been divided into two classes: benign and malignant (Table 1). As Table 1 indicates, the dataset includes 3297 dermoscopic images, with 2637 used for training and 660 for testing. The class distribution is approximately balanced, with 1197 malignant and 1440 benign images in the training set and 300 malignant and 360 benign images in the test set. This binary classification dataset provides a diverse set of lesion types, enabling classification models to be evaluated under clinically plausible conditions. Figure 7 provides a visual representation of a sample of the skin lesion database used in this study.
In order to verify the distribution of negative and positive data, we compared the number of images for each class for the test and training data. The diagram of image numbers per class (Figure 8) demonstrated that both classes are balanced. Consequently, we did not need to use a data augmentation method to generate data to balance the database.

4.2. Evaluation Metrics

To evaluate the performance of our fusion method and individual approaches, we use several essential metrics, defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.
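These metrics can be computed directly with scikit-learn; the sketch below assumes malignant is the positive class and that the classifier exposes a probability score, as needed for the ROC-AUC values reported in the tables.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """y_true/y_pred: 0 = benign, 1 = malignant; y_score: P(malignant)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
    }
```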

4.3. Results of Classical Methods

In this section, we present the classification results obtained using traditional feature extraction techniques, namely HOG, Gabor filters, SIFT, and LBP, applied to skin lesion images. These methods were evaluated individually, and their performance was measured in terms of classification accuracy for the two classes: benign and malignant. As shown in Table 2, HOG achieved the highest individual accuracy, at 80%, followed by Gabor filters (79%), SIFT (78%), and LBP (76%). These results demonstrate the varying effectiveness of traditional methods, with HOG proving the most discriminative.
HOG is widely regarded as the most effective traditional feature descriptor compared to LBP and SIFT. This is mainly because HOG captures crucial edge and shape information by encoding the distribution of gradient orientations, which is essential for identifying irregular borders and morphological patterns that distinguish malignant from benign lesions. In contrast, LBP primarily focuses on local texture patterns, while SIFT detects distinctive keypoints invariant to scale and rotation. However, these features are less effective at capturing the global lesion structure, which is critical for accurate clinical diagnosis. This is why HOG consistently outperforms LBP and SIFT in our experiments. In the proposed hybrid fusion method, we intend to combine HOG with CNN-based approaches to enhance classification performance for skin lesion analysis.

4.4. Results of Deep Learning Methods

This section presents the classification results obtained using deep learning-based feature extraction methods on skin lesion images. Table 3 summarizes the evaluation of several pre-trained convolutional neural network architectures, including EfficientNetB0, ResNet50, ResNet101, NASNetMobile, DenseNet201, and MobileNetV2.
The features obtained using CNN were followed by dimensionality reduction using Principal Component Analysis (PCA). We applied PCA with a 95% variance retention threshold to determine the optimal number of components. This choice offered a favorable trade-off between reducing dimensionality and maintaining classification performance. Lower thresholds resulted in reduced accuracy, while higher thresholds led to increased computational cost without significant performance gains. The reduced feature vectors were then classified using a Support Vector Machine (SVM) classifier.
DenseNet201 and MobileNetV2 achieved the highest classification accuracy at 88%, followed by NASNetMobile at 85%. ResNet101 reached 83%, while EfficientNetB0 and ResNet50 both scored 82%. These results demonstrate the strong discriminative ability of deep features for distinguishing between benign and malignant lesions. We note that DenseNet201 and MobileNetV2 stand out as the top-performing individual models in this study.

4.5. Results of Fusion Methods

This section demonstrates the effectiveness of fusing the best conventional method, HOG, with deep learning-based feature extraction methods for skin lesion classification. More specifically, the combination of the HOG descriptor, known for its ability to capture local texture details, with the best deep learning models significantly improves classification accuracy. Table 4 shows the comprehensive results for each fusion model, including overall accuracy, precision, recall, F1-score, and ROC-AUC. The best overall accuracy of 89% was attained by the combinations of DenseNet201 + HOG, MobileNetV2 + HOG, and ResNet50 + HOG. These fusion strategies improved ResNet50 by 7% and MobileNetV2 and DenseNet201 by 1% compared to the individual deep learning models. In terms of class-specific performance, these models also achieved excellent results in both benign and malignant lesion categories, with precision, recall, and F1-scores consistently at or above 0.88. Furthermore, all top-performing models achieved a high ROC-AUC score of 0.95, indicating strong discriminative power. These results confirm that the proposed fusion approach outperforms the individual methods, providing more robust and accurate differentiation between benign and malignant lesions.

4.6. Confusion Matrix and ROC Curves

The confusion matrix, also referred to as an error matrix, is a crucial evaluation tool that provides detailed insight into the performance of a classification model by showing the true positive, false positive, true negative, and false negative counts. The diagonal, highlighted in dark blue, represents the number of true positives and true negatives, and quantitative measures such as precision, recall, and accuracy can be derived from it. The confusion matrices and ROC curves for the HOG-, ResNet50-HOG-, DenseNet201-HOG-, and MobileNetV2-HOG-based models are illustrated below (Figure 9, Figure 10, Figure 11 and Figure 12).
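Figures of this kind can be generated directly from the fitted classifier as sketched below; the estimator and test-set variable names are assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

def plot_confusion_and_roc(clf, X_test, y_test):
    """Plot the confusion matrix (left) and ROC curve (right) for a fitted classifier."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    ConfusionMatrixDisplay.from_estimator(
        clf, X_test, y_test,
        display_labels=["benign", "malignant"], cmap="Blues", ax=axes[0]
    )
    RocCurveDisplay.from_estimator(clf, X_test, y_test, ax=axes[1])
    plt.tight_layout()
    plt.show()
```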

4.7. Performance Analysis

Confusion matrices are presented to highlight the performance of the different methods. The HOG confusion matrix shows decent classification performance. This indicates that, although HOG is good at local texture feature extraction, it lacks the deep semantic understanding needed to distinguish complex lesion patterns. The fusion of HOG and MobileNetV2 significantly reduces misclassifications, in particular false negatives (malignant lesions identified as benign), which are extremely critical in medical diagnosis. This improvement shows the complementarity of deep semantic features (MobileNetV2) and local texture (HOG). Of the three approaches, fusion with DenseNet201 performs best overall, with the maximum number of correctly classified benign and malignant lesions. This model shows high sensitivity and generalization, demonstrating that combining DenseNet201's deep features with HOG's local information enhances the model's ability to classify lesions based on subtle differences between them. The analysis of the confusion matrices validates that hybrid models combining HOG with deep learning architectures, especially DenseNet201 and MobileNetV2, show marked improvements in sensitivity and specificity and are therefore more reliable for clinical decisions.

5. Conclusions

We conclude that deep learning features combined with conventional extraction methods such as HOG, Gabor, SIFT, and LBP significantly improve the performance of skin lesion classification. Conventional methods offer the advantage of providing interpretable and localized information on texture and shape, but they struggle to capture the high-level semantic features required for complex medical imaging. In contrast, deep learning architectures such as EfficientNetB0, ResNet50, ResNet101, NASNetMobile, DenseNet201, and MobileNetV2 demonstrate strong capabilities in automatic feature extraction. The proposed fusion approach followed by PCA has improved classification results in terms of accuracy. These findings underscore the complementary strengths of handcrafted and deep features and suggest promising research directions focused on optimizing fusion strategies and enhancing their generalizability across diverse medical datasets and imaging modalities. In our future work, we propose to extend our approach to address multiclass classification of skin lesions, which better reflects real-world diagnostic challenges, and to evaluate its performance on various publicly available datasets such as PH2 and ISIC. We propose to integrate attention mechanisms to improve skin lesion classification by focusing on relevant regions in the image. We also plan to use GAN-based data augmentation techniques to increase training data and enhance classification performance.

Author Contributions

Conceptualization, M.Z. and M.R.; methodology, M.Z. and M.R.; software, M.Z. and M.R.; validation, M.R. and R.A.; formal analysis, M.Z. and M.R.; investigation, M.Z. and M.R.; resources, M.Z. and M.R.; data curation, M.Z.; writing—original draft preparation, M.Z.; writing—review and editing, M.Z. and M.R.; visualization, M.Z.; supervision, M.R. and R.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The datasets used in this work are publicly available.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Farkhod, F. Investigation of Ways to Improve the HOG Method in the Classification of Histological Images by Machine Learning Methods. Int. J. Multidiscip. Res. 2024, 6, 5. [Google Scholar] [CrossRef]
  2. Alzakari, S.A.; Ojo, S.; Wanliss, J.; Umer, M.; Alsubai, S.; Alasiry, A.; Marzougui, M.; Innab, N. LesionNet: An Automated Approach for Skin Lesion Classification Using SIFT Features with Customized Convolutional Neural Network. Front. Med. 2024, 11, 1487270. [Google Scholar] [CrossRef] [PubMed]
  3. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar] [CrossRef]
  4. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710. [Google Scholar]
  5. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar] [CrossRef]
  6. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
  7. Singh, L.K.; Pooja, N.; Garg, H.; Khanna, M. Histogram of Oriented Gradients (HOG)-Based Artificial Neural Network (ANN) Classifier for Glaucoma Detection. Int. J. Swarm Intell. Res. 2022, 13, 1–32. [Google Scholar] [CrossRef]
  8. Stockmeier, H.G.; Bäcker, H.; Bäumler, W.; Lang, E.W. BSS-Based Feature Extraction for Skin Lesion Image Classification. Lect. Notes Comput. Sci. 2009, 5441, 467–474. [Google Scholar] [CrossRef] [PubMed]
  9. Ajel, A.R.; Al-Dujaili, A.Q.; Hadi, Z.G.; Humaidi, A.J. Skin Cancer Classifier Based on Convolution Residual Neural Network. Int. J. Power Electron. Drive Syst./Int. J. Electr. Comput. Eng. 2023, 13, 6240–6248. [Google Scholar] [CrossRef]
  10. Qureshi, M.; Sethi, N.M.J.; Hussain, N.S.S. Artificial Intelligence-Based Skin Lesion Analysis and Skin Cancer Detection. Pak. J. Eng. Technol. 2025, 7, 183–191. [Google Scholar] [CrossRef]
  11. Iqbal, I.; Younus, M.; Walayat, K.; Kakar, M.U.; Ma, J. Automated Multi-Class Classification of Skin Lesions through Deep Convolutional Neural Network with Dermoscopic Images. Comput. Med. Imaging Graph. 2021, 88, 101843. [Google Scholar] [CrossRef] [PubMed]
  12. Iqbal, I.; Walayat, K.; Kakar, M.U.; Ma, J. Automated Identification of Human Gastrointestinal Tract Abnormalities Based on Deep Convolutional Neural Network with Endoscopic Images. Intell. Syst. Appl. 2022, 16, 200149. [Google Scholar] [CrossRef]
  13. Zhao, C.; Shuai, R.; Ma, L.; Liu, W.; Hu, D.; Wu, M. Dermoscopy Image Classification Based on StyleGAN and DenseNet201. IEEE Access 2021, 9, 8659–8679. [Google Scholar] [CrossRef]
  14. Srinivasu, P.N.; SivaSai, J.G.; Ijaz, M.F.; Bhoi, A.K.; Kim, W.; Kang, J.J. Classification of Skin Disease Using Deep Learning Neural Networks with MobileNet V2 and LSTM. Sensors 2021, 21, 2852. [Google Scholar] [CrossRef] [PubMed]
  15. Zhao, H.; Wu, Y.; Lu, Y. A Study on Skin Lesions Classification Based on Improved EfficientNetB0 Network. In Proceedings of the 2024 International Conference on New Trends in Computational Intelligence (NTCI), Qingdao, China, 18–20 October 2024. [Google Scholar] [CrossRef]
  16. Rehman, M.U.; Nizami, I.F.; Ullah, F.; Hussain, I. IQA Vision Transformed: A Survey of Transformer Architectures in Perceptual Image Quality Assessment. IEEE Access 2024, 12, 183369–183399. [Google Scholar] [CrossRef]
  17. Zaw, K.P.; Mon, A. Enhanced Multi-Class Skin Lesion Classification of Dermoscopic Images Using an Ensemble of Deep Learning Models. J. Comput. Theor. Appl. 2024, 2, 256–267. [Google Scholar] [CrossRef]
  18. Ren, Y.; Yu, L.; Tian, S.; Cheng, J.; Guo, Z.; Zhang, Y. Serial Attention Network for Skin Lesion Segmentation. J. Ambient Intell. Humaniz. Comput. 2021, 13, 799–810. [Google Scholar] [CrossRef]
  19. Aditi, N.; Nagda, M. K.; Eswaran, P. Image Classification Using a Hybrid LSTM-CNN Deep Neural Network. Int. J. Eng. Adv. Technol. 2019, 8, 1342–1348. [Google Scholar] [CrossRef]
  20. Thirumaladevi, S.; Veeraswamy, K.; Sailaja, M.; Shaik, S. Synergistic Feature Fusion for Accurate Skin Cancer Classification. Int. J. Pharm. Res. Technol. 2024, 14, 79–86. [Google Scholar]
  21. Xu, X.; Wang, W.; Liu, Y.; Bäckemo, J.; Heuchel, M.; Wang, W.; Yan, N.; Iqbal, I.; Kratz, K.; Lendlein, A. Substrates Mimicking the Blastocyst Geometry Revert Pluripotent Stem Cell to Naivety. Nat. Mater. 2024, 23, 1748–1758. [Google Scholar] [CrossRef]
  22. Mirmozaffari, M.; Yazdani, R.; Shadkam, E.; Khalili, S.M.; Tavassoli, L.S.; Boskabadi, A. A Novel Hybrid Parametric and Non-Parametric Optimisation Model for Average Technical Efficiency Assessment in Public Hospitals during and Post-COVID-19 Pandemic. Bioengineering 2022, 9, 7. [Google Scholar] [CrossRef] [PubMed]
  23. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar]
  24. Kaggle. Skin Cancer: Malignant vs Benign. Available online: https://www.kaggle.com/datasets/fanconic/skin-cancer-malignant-vs-benign (accessed on 14 April 2023).
Figure 1. Proposed methods flowchart.
Figure 2. Original images and corresponding HOG representations.
Figure 3. Original skin lesion images and their Gabor filters.
Figure 4. SIFT keypoint detection in skin lesion images.
Figure 5. Original skin lesion images and their Local Binary Pattern (LBP) representations.
Figure 6. Fusion of deep learning and classical features.
Figure 7. Sample images from the skin-cancer-malignant-vs-benign dataset.
Figure 8. Class distribution of train and test images in the dataset.
Figure 9. Confusion matrix (a) and ROC curve (b) for the HOG-based method.
Figure 10. Confusion matrix (a) and ROC curve (b) for the ResNet50-HOG based method.
Figure 11. Confusion matrix (a) and ROC curve (b) for the DenseNet201-HOG based method.
Figure 12. Confusion matrix (a) and ROC curve (b) for the MobileNetV2-HOG based method.
Table 1. Distribution of images in the Skin Lesion Dataset (Kaggle).

Class      Train Images  Test Images
Benign     1440          360
Malignant  1197          300
Total      2637          660
Table 2. Performance comparison of classical classification models.

Model   Accuracy  Class      Precision  Recall  F1-Score  ROC-AUC
HOG     80%       Benign     0.82       0.80    0.81      0.87
                  Malignant  0.77       0.79    0.78
Gabor   79%       Benign     0.77       0.86    0.81      0.87
                  Malignant  0.81       0.70    0.75
SIFT    78%       Benign     0.80       0.78    0.79      0.85
                  Malignant  0.75       0.77    0.76
LBP     76%       Benign     0.76       0.81    0.79      0.82
                  Malignant  0.76       0.69    0.72
Table 3. Comparative analysis of classification models for skin lesion detection.

Model           Accuracy  Class      Precision  Recall  F1-Score  ROC-AUC
EfficientNetB0  82%       Benign     0.86       0.80    0.83      0.91
                          Malignant  0.78       0.84    0.81
ResNet50        82%       Benign     0.85       0.82    0.84      0.91
                          Malignant  0.79       0.82    0.81
ResNet101       83%       Benign     0.88       0.81    0.84      0.92
                          Malignant  0.79       0.86    0.82
NASNetMobile    85%       Benign     0.84       0.89    0.86      0.93
                          Malignant  0.86       0.79    0.82
DenseNet201     88%       Benign     0.88       0.90    0.89      0.95
                          Malignant  0.87       0.85    0.86
MobileNetV2     88%       Benign     0.89       0.88    0.89      0.95
                          Malignant  0.86       0.87    0.86
Table 4. Performance of classification models using deep learning architectures with HOG.

Model                 Accuracy  Class      Precision  Recall  F1-Score  ROC-AUC
EfficientNetB0 + HOG  87%       Benign     0.90       0.86    0.88      0.95
                                Malignant  0.84       0.88    0.86
ResNet50 + HOG        89%       Benign     0.90       0.89    0.90      0.95
                                Malignant  0.88       0.89    0.88
ResNet101 + HOG       88%       Benign     0.88       0.90    0.89      0.95
                                Malignant  0.87       0.86    0.87
NASNetMobile + HOG    86%       Benign     0.86       0.89    0.87      0.94
                                Malignant  0.86       0.82    0.84
DenseNet201 + HOG     89%       Benign     0.89       0.90    0.90      0.95
                                Malignant  0.88       0.87    0.87
MobileNetV2 + HOG     89%       Benign     0.89       0.90    0.90      0.95
                                Malignant  0.88       0.87    0.88
