Article

Hybrid Transform-Based Feature Extraction for Skin Lesion Classification Using RGB and Grayscale Analysis

by Luis Felipe López-Ávila and Josué Álvarez-Borrego *
Centro de Investigación Científica y de Educación Superior de Ensenada, B. C. (CICESE), Carretera Ensenada-Tijuana No. 3918, Zona Playitas, Ensenada 22860, BC, Mexico
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(11), 5860; https://doi.org/10.3390/app15115860
Submission received: 25 April 2025 / Revised: 21 May 2025 / Accepted: 22 May 2025 / Published: 23 May 2025
(This article belongs to the Special Issue Recent Advances in Biomedical Data Analysis)

Featured Application

A technique for developing a potential automated system for classifying skin lesions.

Abstract

Automated skin lesion classification using machine learning techniques is crucial for early and accurate skin cancer detection. This study proposes a hybrid method combining the Hermite, Radial Fourier–Mellin, and Hilbert transforms to extract comprehensive features from skin lesion images. By separating the images into red, green, and blue (RGB) channels and grayscale, unique textural and structural information specific to each channel is analyzed. The Hermite transform captures localized spatial features, while the Radial Fourier–Mellin and Hilbert transforms ensure global invariance to scale, translation, and rotation. Texture information for each channel is also obtained with the Local Binary Pattern (LBP) technique. The proposed hybrid transform-based feature extraction was applied to multiple lesion classes using the International Skin Imaging Collaboration (ISIC) 2019 dataset, preprocessed with data augmentation. Experimental results demonstrate that the proposed method improves classification accuracy and robustness, highlighting its potential as a non-invasive AI-based tool for dermatological diagnosis.

1. Introduction

Early detection of skin cancer is crucial for improving patient outcomes, as it remains a common and potentially deadly disease worldwide. To support dermatologists in diagnosis, automated classification of skin lesions has gained significant attention through the integration of machine learning (ML) and advanced digital image processing. By leveraging mathematical transformations and feature extraction techniques, machine learning models have demonstrated promising capabilities in identifying diagnostic patterns from digital image data, making them practical tools for classifying skin lesions [1,2,3,4,5,6,7,8].
Integral transforms, such as Fourier, Mellin, and Hilbert, have proven particularly useful in generating feature sets that are invariant to changes in rotation, scale, and translation, properties that are essential for analyzing skin lesions, which exhibit considerable variability in shape, size, and orientation [9,10,11]. Frequency-based patterns, essential for differentiating lesion textures and structures, have been analyzed using the Fourier transform, which is widely used in image processing [12,13]. The Mellin transform has also been applied to address the issue of varying lesion sizes in images, introducing scale invariance to the extracted features and enhancing their robustness [14,15,16]. The Hilbert transform complements these by ensuring rotational invariance, thus producing unique and repeatable signatures, as demonstrated in skin lesion classification studies [17].
In our previous work, we successfully implemented the Radial Fourier–Mellin and Hilbert transforms to classify skin lesions accurately. This approach created stable image signatures across eight lesion types and yielded strong classification metrics [17]. These results underscore the potential of combining transform-based feature extraction with ML for robust lesion classification, especially when global invariance properties are required.
This study builds on previous work by incorporating the Hermite transform, which is applied independently to the image’s red, green, and blue channels and grayscale. The Hermite transform is well suited for capturing localized spatial features and intricate textures, essential for skin lesion analysis, as different color channels often contain unique structural information relevant to pigmentation, boundary definition, and textural detail [18,19,20]. Although the Hermite transform has primarily been used in areas such as biometric recognition, where fine-grained spatial analysis is essential, recent research suggests that the Hermite transform can be effectively applied to medical imaging for detailed feature extraction [21].
This study aims to enhance the classification of skin lesions by integrating the Hermite transform with the Radial Fourier–Mellin and Hilbert transform across separated RGB channels. It is hypothesized that this hybrid approach will enhance the robustness and accuracy of lesion classification, thereby advancing AI-driven, non-invasive diagnostic tools in dermatology. The following sections outline our methodology, experimental setup, results, and the impact of the Hermite transform on the overall classification performance.
The development of automated skin lesion classification systems has been an area of extensive research in medical image analysis, driven by the critical need for early skin cancer detection. Artificial intelligence (AI) and machine learning (ML) have significantly contributed to dermatology by improving the classification of lesion types using digital images. These approaches enhance diagnostic precision by employing feature extraction methods, multi-channel image analysis, and a range of mathematical transformations.
Several studies have highlighted the effectiveness of Fourier, Mellin, and Hilbert transforms for extracting stable invariant features under changes in rotation, scale, and translation—crucial properties for skin lesion analysis where lesions vary in appearance. The Fourier transform has been extensively utilized in medical imaging to extract global features, capturing frequency-based patterns in lesions for reliable classification [22]. The Mellin transform is often combined with the Fourier transform to improve accuracy by addressing size variations in lesions, introducing scale invariance, and enhancing the overall classification performance [23]. The Hilbert transform complements these by introducing rotational invariance, thus producing unique and repeatable signatures suitable for dermatological classification tasks, as shown in various studies on skin lesions [24].
Recent research has increasingly focused on hybrid approaches that combine multiple transforms to exploit complementary feature extraction strengths. For instance, in dermatology, hybrid techniques integrating texture descriptors, Fourier-based signatures, and ML classifiers have achieved superior classification performance over single-transform approaches by incorporating global and local features [17]. This integration has shown promise for skin lesions, where patterns such as edges, textures, and color gradients contribute significantly to lesion differentiation.
Though less commonly applied in dermatology, the Hermite transform has demonstrated potential in other fields, like facial recognition and fingerprint analysis, for its ability to capture fine-grained spatial and textural details. This transform uses Hermite polynomials to decompose images, effectively characterizing localized structures within complex image regions, such as lesion boundaries [20]. While a few studies have applied the Hermite transform to multi-channel RGB images, evidence suggests that color channel-specific information enhances image analysis, especially for skin lesions where pigmentation and textural details are critical [25,26,27].
This study builds upon these advancements by integrating the Hermite transform with the Radial Fourier–Mellin and Hilbert transform across separated RGB channels to generate comprehensive feature sets for skin lesion classification. We aim to achieve higher accuracy and robust differentiation between lesion types by combining global invariant features and localized structures.

2. Materials and Methods

2.1. Image Dataset

Our dataset comprises digital images of various skin lesion types from the International Skin Imaging Collaboration (ISIC) 2019 dataset. Initial preprocessing involved the removal of images with significant noise (e.g., hair, artifacts) to enhance lesion visibility and ensure quality input for feature extraction. Data augmentation was applied to create a robust and balanced dataset, including rotations (at 45° increments) and scaling (100%, 95%, 90%, 85%, 80%). This preprocessing yielded a dataset of skin lesion images with each lesion type represented equally, reducing class imbalance and improving generalization for classification. Figure 1 illustrates some examples of skin lesion images contained in the dataset.
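For illustration, the augmentation step can be sketched as follows with OpenCV. The function name, interpolation settings, and border handling are our assumptions for a minimal sketch, not the authors' exact pipeline.

```python
import cv2
import numpy as np

ANGLES = range(45, 361, 45)              # eight rotations at 45-degree increments
SCALES = [1.00, 0.95, 0.90, 0.85, 0.80]  # five scaling factors

def augment(image):
    """Return the 40 rotated/scaled variants of one lesion image."""
    h, w = image.shape[:2]
    center = (w / 2, h / 2)
    variants = []
    for scale in SCALES:
        for angle in ANGLES:
            # One affine matrix combines rotation about the center with scaling.
            m = cv2.getRotationMatrix2D(center, angle, scale)
            variants.append(cv2.warpAffine(image, m, (w, h)))
    return variants
```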

2.2. RGB Channel and Grayscale Separation

Each preprocessed image was separated into its red, green, and blue color channels and grayscale. This step allowed us to treat each channel independently, capitalizing on each channel’s unique textural and color information. For each color channel, we computed transform-based features that emphasize distinct lesion characteristics across the RGB and grayscale spectrum. Separating the channels enabled us to retain and enhance color-specific information, which is vital for accurately distinguishing skin lesion types.
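A minimal sketch of this separation, assuming an H × W × 3 RGB array; the grayscale weights are the standard ITU-R BT.601 coefficients quoted later in Section 2.4.

```python
import numpy as np

def split_channels(rgb):
    """Split an RGB image into the four planes analyzed independently here."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Standard luminance weights (0.299R + 0.587G + 0.114B), per Section 2.4.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return r, g, b, gray
```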

2.3. Hermite Transform

The Hermite transform is a specific type of polynomial transform and can be regarded as a model for image representation. It serves as a method for signal decomposition and involves two main steps. First, the input signal $L(x, y)$ is multiplied by a window function

$$v(x - p, y - q) \tag{1}$$

centered at the positions $p$ and $q$. The goal is to achieve a complete representation of the signal, so this process is repeated at multiple positions spaced evenly across the image, forming a sampling grid $S$. At each $(x, y)$, the signal is weighted by the window function, and the original signal can be recovered as

$$L(x, y) = \frac{1}{W(x, y)} \sum_{(p, q) \in S} L(x, y)\, v(x - p, y - q), \tag{2}$$

where

$$W(x, y) = \sum_{(p, q) \in S} v(x - p, y - q) \tag{3}$$

is a weighting function. The only requirement is that Equation (3) be nonzero for all $(x, y)$. Next, the signal within each window is expressed as a weighted sum of polynomials $G_{m, n-m}(x, y)$, of degrees $m$ and $n - m$ in $x$ and $y$, respectively. These polynomials are determined by the window function through the orthogonality condition

$$\iint v^2(x, y)\, G_{m, n-m}(x, y)\, G_{l, k-l}(x, y)\, dx\, dy = \delta_{nl}\, \delta_{mk}. \tag{4}$$

Here, $n, l = 0, 1, 2, \ldots$ and $m, k = 0, 1, 2, \ldots$, while $\delta_{nl}$ and $\delta_{mk}$ denote Kronecker deltas. The conversion of the input signal into a weighted sum of polynomials, referred to as polynomial coefficients, is called the direct polynomial transform. The coefficients $L_{m, n-m}(p, q)$ are obtained by convolving the original image with the analysis filters

$$D_{m, n-m}(x, y) = G_{m, n-m}(x, y)\, v^2(x, y). \tag{5}$$

That is, for every $(p, q) \in S$,

$$L_{m, n-m}(p, q) = \iint L(x, y)\, D_{m, n-m}(x - p, y - q)\, dx\, dy, \tag{6}$$

where

$$G_{m, n-m}(x, y) = \frac{1}{\sqrt{2^n\, (n - m)!\, m!}}\; H_m\!\left(\frac{x}{\sigma}\right) H_{n-m}\!\left(\frac{y}{\sigma}\right), \tag{7}$$

$$v(x, y) = \frac{1}{\sqrt{\pi}\, \sigma}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}, \tag{8}$$

with $v(x, y)$ the Gaussian window function and $\sigma$ its standard deviation, and

$$H_n(x) = (-1)^n\, e^{x^2} \frac{d^n}{dx^n} e^{-x^2}, \qquad n = 0, 1, 2, \ldots, \tag{9}$$

the $n$-th Hermite polynomial. Using the convolution form of Equation (6), the coefficients can be written compactly as

$$L_{m, n-m}(p, q) = L(x, y) * D_{m, n-m}(x, y). \tag{10}$$
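Numerically, Equations (5)-(10) amount to convolving the image with a sampled Gaussian-windowed Hermite filter. The sketch below illustrates this; the window parameter sigma, the filter radius, and the function names are our illustrative assumptions, not values reported by the authors.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval
from scipy.signal import fftconvolve

def hermite_filter(m, n_minus_m, sigma=2.0, radius=8):
    """Sampled analysis filter D_{m,n-m}(x, y) = G_{m,n-m}(x, y) v^2(x, y), Eq. (5)."""
    coords = np.arange(-radius, radius + 1, dtype=float)
    X, Y = np.meshgrid(coords, coords)
    n = m + n_minus_m
    norm = 1.0 / math.sqrt(2.0**n * math.factorial(n_minus_m) * math.factorial(m))
    Hm = hermval(X / sigma, [0.0] * m + [1.0])            # H_m(x / sigma), Eq. (9)
    Hnm = hermval(Y / sigma, [0.0] * n_minus_m + [1.0])   # H_{n-m}(y / sigma)
    G = norm * Hm * Hnm                                   # Eq. (7)
    v = np.exp(-(X**2 + Y**2) / (2.0 * sigma**2)) / (math.sqrt(math.pi) * sigma)  # Eq. (8)
    return G * v**2

def hermite_coefficients(image, m=1, n_minus_m=1):
    """Order-(1,1) polynomial coefficients via the convolution form, Eq. (10)."""
    return fftconvolve(image, hermite_filter(m, n_minus_m), mode="same")
```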

2.4. The Signatures

We generated multiple signature vectors from the Hermite transform of order (1,1) for each RGB channel and the grayscale image, using a dataset of 362,680 samples created through data augmentation. The augmentation process included five scaling percentages (100%, 95%, 90%, 85%, and 80%) and eight rotation angles (45°, 90°, 135°, 180°, 225°, 270°, 315°, and 360°). These descriptors exploit the translation and scale invariance of the Fourier and Mellin transforms, respectively, while the Hilbert transform was applied for rotational invariance. Unique image signatures were computed by summing the pixel values within each ring produced by the Hilbert masks used as filters. The texture descriptors were then incorporated into the previously generated radial Fourier–Mellin signatures.
This process resulted in a one-dimensional representation, or signature, of the skin lesion digital image, as illustrated in Figure 2 and Figure 3. The original image $Im(x, y)$ consists of three RGB matrix channels (red, green, and blue). These were separated into their respective primary color channels for the application of the Radial Fourier–Mellin method and uniform Local Binary Pattern (LBP) feature extraction. Additionally, the grayscale skin lesion image was derived as a weighted sum of the RGB values using the formula 0.299R + 0.587G + 0.114B.

2.5. Radial Fourier–Mellin Signatures Through Hilbert Transform

To create the Radial Fourier–Mellin signatures, each image was first separated into its RGB channels and grayscale (Figure 3a). The Hermite transform of order (1,1) was then obtained from each image component, the RGB channels and grayscale (Figure 3b). Order (1,1) was chosen to balance computational efficiency and spatial detail capture: preliminary experiments with higher orders (e.g., (2,2), (3,3)) did not yield significant improvements in classification performance but increased the feature vector dimensionality and computational cost. Order (1,1) therefore effectively captures the first-order spatial derivatives relevant to lesion boundary and texture features. Next, the magnitude of the Fourier–Mellin (FM) transform of each skin lesion digital image, denoted $Im(x, y)$, was computed using the following equation (Figure 3c).
$$|FM(s, t)| = \int_0^{\infty}\!\!\int_0^{\infty} \left| FT\{Im(x, y)\} \right| x^{s-1} y^{t-1}\, dx\, dy = M\left\{ \left| FT\{Im(x, y)\} \right| \right\}, \tag{11}$$
Here, $|FM(s, t)|$ represents the magnitude of the Mellin transform, which provides scale invariance for objects in the image. This is essential because the skin lesion digital images were captured at varying distances between the lesion and the camera: lesions appear smaller at greater distances and larger at shorter ones. The coordinates $(s, t)$ are the 2D positions of the transformed $(x, y)$ pixel coordinates on the Mellin plane. The original $(x, y)$ coordinates correspond to the magnitude of the Fourier transform of the image, $|FT\{Im(x, y)\}|$, which provides translation invariance. Consequently, at this stage, the object (skin lesion) in the image is invariant to both translation and scale.
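In effect, Equation (11) is a Mellin transform of the Fourier magnitude. A common digital approximation, sketched below, resamples the Fourier magnitude on a log-polar grid so that scalings become shifts, and then takes a second FFT magnitude. This is an illustrative approximation under our assumptions, not necessarily the authors' exact radial implementation; grid sizes are arbitrary.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def fourier_mellin_magnitude(image, n_r=128, n_theta=128):
    """Approximate |FM| of Eq. (11): translation invariance from |FFT|,
    scale invariance from a Mellin-type (log-radial) resampling + second FFT."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(image)))    # translation-invariant
    cy, cx = np.array(F.shape) / 2.0
    r_max = min(cx, cy)
    # Log-polar sampling grid: scaling of the input becomes a shift in log r.
    log_r = np.linspace(0.0, np.log(r_max), n_r)
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    R, T = np.meshgrid(np.exp(log_r), theta)
    coords = np.vstack([(cy + R * np.sin(T)).ravel(),  # row coordinates
                        (cx + R * np.cos(T)).ravel()]) # column coordinates
    log_polar = map_coordinates(F, coords, order=1).reshape(n_theta, n_r)
    return np.abs(np.fft.fft2(log_polar))              # shift-invariant magnitude
```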
By applying the Hilbert transform, rotational invariance of the skin lesion in the image is also achieved. The Hilbert transform of the image is expressed as:
$$FH_r\{Im(x, y)\} = e^{ip\theta}\, FT\{Im(x, y)\} = e^{ip\theta} F(u, v), \tag{12}$$
Here, $p$ represents the order of the radial Hilbert transform, and $\theta$ is the angle in the frequency domain corresponding to the pixel coordinates $(x, y)$ after their transformation into Fourier-plane coordinates $(u, v)$. This angle is calculated as $\theta = \arccos\!\left(u / \sqrt{u^2 + v^2}\right)$.
Using Euler’s formula, binary ring masks were generated for the RGB channels and the grayscale skin lesion digital image. These masks utilized both the real ($H_R$) and the imaginary ($H_I$) components of the radial Hilbert transform of the image (Figure 2).
$$H_R = \mathrm{Re}\{H_r(u, v)\} = \begin{cases} 1, & \text{if } \sin(p\theta) > 0, \\ 0, & \text{otherwise}, \end{cases} \tag{13}$$

$$H_I = \mathrm{Im}\{H_r(u, v)\} = \begin{cases} 1, & \text{if } \cos(p\theta) > 0, \\ 0, & \text{otherwise}. \end{cases} \tag{14}$$
The binary ring masks generated earlier were used to filter the skin lesion digital images that had been processed with the magnitude of the Fourier–Mellin transform (Figure 3c). The process involved summing the pixel values within each ring, resulting in two distinct signatures for each grayscale skin lesion image, $S_{gray}^{H_R}$ and $S_{gray}^{H_I}$, and six for its RGB channels: $S_R^{H_R}$, $S_R^{H_I}$, $S_G^{H_R}$, $S_G^{H_I}$, $S_B^{H_R}$, and $S_B^{H_I}$ (Figure 3d). Finally, each signature is normalized by its maximum value (Figure 3e).
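A sketch of Equations (13) and (14) and the ring-sum step follows. Note that the equations as printed depend only on the angular coordinate, so the grouping of mask pixels into "rings" is approximated here by connected-component labeling, which is our reading of Figure 2 rather than the authors' exact code.

```python
import numpy as np
from scipy import ndimage

def hilbert_masks(shape, p=2):
    """Binary masks H_R and H_I of Eqs. (13) and (14); higher orders p
    split the plane into more regions."""
    h, w = shape
    v, u = np.meshgrid(np.arange(h) - h / 2, np.arange(w) - w / 2, indexing="ij")
    theta = np.arccos(u / np.sqrt(u**2 + v**2 + 1e-12))
    H_R = (np.sin(p * theta) > 0).astype(np.uint8)
    H_I = (np.cos(p * theta) > 0).astype(np.uint8)
    return H_R, H_I

def ring_signature(fm_magnitude, mask):
    """Sum |FM| pixels inside each connected region of the mask, then
    normalize the resulting vector by its maximum (Figure 3e)."""
    labels, n = ndimage.label(mask)
    sums = ndimage.sum_labels(fm_magnitude, labels, index=range(1, n + 1))
    return sums / sums.max()
```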
To incorporate texture descriptors, we applied the uniform Local Binary Pattern (LBP) technique (Figure 3f), a widely used tool in computer vision and image processing for texture analysis.
LBP is a simple yet effective descriptor that captures textures, edges, corners, spots, and flat regions. For each 3 × 3 pixel block, the intensity of the eight surrounding pixels is compared to the intensity of the central pixel, which serves as the threshold. If the intensity of a neighboring pixel is greater than or equal to the central pixel, its position is assigned a value of 1; otherwise, it is assigned 0. After comparing all pixels, a binary sequence is formed.
This binary sequence is then converted into a decimal value by multiplying each bit by its corresponding weight (a power of two) and summing the results. The final LBP value is used to label the central pixel. Figure 4 illustrates the LBP calculation for a pixel with P = 8 neighboring pixels.
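The following worked example applies this rule to a hypothetical 3 × 3 block (the pixel values and the clockwise neighbor ordering are invented for illustration):

```python
import numpy as np

block = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])          # hypothetical 3x3 neighborhood
center = block[1, 1]
# Clockwise neighbors starting at the top-left corner (one common convention).
neighbors = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
             block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
bits = [int(n >= center) for n in neighbors]      # threshold against the center
lbp = sum(b << i for i, b in enumerate(bits))     # weighted binary-to-decimal
print(bits, lbp)  # [1, 0, 0, 0, 1, 1, 1, 1] -> 241
```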
To compute the Local Binary Pattern (LBP) for a grayscale image, the following equation is used:
$$LBP(x_c, y_c) = \sum_{p=0}^{P-1} s(I_p - I_c)\, 2^p, \tag{15}$$

where $(x_c, y_c)$ are the central pixel’s coordinates, $P$ is the number of neighboring pixels, $I_p$ is the intensity of a neighboring pixel, $I_c$ is the intensity of the central pixel, and $s$ is the step function

$$s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & \text{otherwise}. \end{cases} \tag{16}$$
The uniform LBP (LBP-U) is a variant of the standard LBP that reduces the dimensionality of the characteristic vector and provides rotational invariance. An LBP is considered uniform if it has at most two transitions between 0 and 1 in the binary sequence. For example, patterns like 11111111 (0 transitions), 11111000 (1 transition), and 11001111 (2 transitions) are uniform, whereas 11010110 (6 transitions) and 11001001 (4 transitions) are non-uniform.
In an eight-pixel neighborhood, 256 patterns can be generated, of which 58 are uniform. These uniform patterns are assigned unique labels (1–58), while all non-uniform patterns are grouped under a single label (59). The LBP-U technique was employed in this study.
After calculating the uniform LBP for each pixel, a histogram of LBP values is constructed to represent the texture distribution of the RGB and grayscale images: $LBP_R$, $LBP_G$, $LBP_B$, and $LBP_{Gray}$. These histograms are then concatenated with the radial Fourier–Mellin signatures to form 444-component, one-dimensional object signatures (Figure 3g). Figure 5 exemplifies this procedure applied to the R channel.
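A sketch of this texture-descriptor step using scikit-image, whose "nri_uniform" mapping produces exactly the 59 labels described above (58 uniform patterns plus one bin for all non-uniform patterns). How the histograms are concatenated with the radial signatures to reach 444 components is our reading of Figure 3, and the function names are ours.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbpu_histogram(channel, P=8, R=1):
    """59-bin uniform-LBP histogram (58 uniform labels + 1 non-uniform bin)."""
    codes = local_binary_pattern(channel, P, R, method="nri_uniform")
    hist, _ = np.histogram(codes, bins=59, range=(0, 59))
    return hist / hist.sum()                 # normalized texture descriptor

def full_signature(radial_signatures, r, g, b, gray):
    """Concatenate the radial Fourier-Mellin signatures with the four LBP
    histograms into one 1-D object signature (444 components in this work)."""
    lbps = [lbpu_histogram(c) for c in (r, g, b, gray)]
    return np.concatenate(list(radial_signatures) + lbps)
```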

2.6. Signatures Classification

A neural network model was implemented for classification using the Keras Sequential API in Python (Keras via TensorFlow 2.9.0; Python 3.7.4). The model was structured to process the 444-dimensional feature vectors formed by the radial Fourier–Mellin signatures and LBP histograms from each RGB channel and grayscale. The network was designed with multiple dense layers to capture complex relationships within the feature space.

2.7. Model Architecture

The neural network architecture consists of six dense layers. The first layer has 100 units with a ReLU activation function and takes the 444-dimensional feature vector as input. Each of the four hidden layers has 100 units with ReLU activation to introduce non-linearities and model complex patterns. The final layer has eight units with a softmax activation function, corresponding to the classification of the eight skin lesion types in the dataset.

2.8. Model Compilation

The model was compiled using the Adam optimizer, which is well suited for this multi-layer architecture due to its adaptive learning rate capabilities. The loss function was set to sparse categorical cross-entropy, suitable for multi-class classification with integer-labeled target classes. Model accuracy was tracked as a performance metric during training.

2.9. Training Procedure

The model was trained on the feature set using 400 epochs, with x_{train} as the feature input and y_{train} as the target lesion labels. This training process allowed the model to learn representations of the different lesion types based on the extracted features.
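The description in Sections 2.7 through 2.9 corresponds to a Keras model along the following lines. The random placeholder data stand in for the signature matrix and integer lesion labels, and the sketch omits any callbacks, batch size, or validation split the authors may have used.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data with the shapes described in the text; in the study these
# are the 444-component signatures and integer labels for eight lesion classes.
x_train = np.random.rand(1000, 444).astype("float32")
y_train = np.random.randint(0, 8, size=1000)

model = keras.Sequential([
    layers.Dense(100, activation="relu", input_shape=(444,)),  # input layer
    layers.Dense(100, activation="relu"),                      # four hidden layers
    layers.Dense(100, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(8, activation="softmax"),                     # eight lesion classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=400)
```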

2.10. Model Performance

The neural network model’s performance was evaluated using accuracy as the primary metric on both the training and test sets. Additionally, to ensure robustness, we calculated and reported recall, FP rate, specificity, precision, accuracy, and F1 score for each lesion class, providing a comprehensive assessment of the model’s classification capabilities across lesion types.

3. Results

We randomly selected images for classification. The classes in the dataset were balanced, with 1840 images used for each type of skin lesion to prevent classification bias. The deep learning model described in Section 2.6 was then trained using a data split of 70% for training and 30% for testing.
To assess the classification performance of the proposed methodology, a variety of standard metrics were employed. Recall (Equation (17)) quantifies the proportion of actual positives correctly identified by the model, emphasizing its ability to capture all instances of a given class. False Positive Rate (Equation (18)) reflects the proportion of negative instances misclassified as positive, serving as a complement to specificity. Specificity (Equation (19)) measures the model’s capability to correctly identify negative instances, calculated as one minus the FP rate. Precision (Equation (20)) indicates the percentage of correctly predicted positive instances, demonstrating the reliability of positive predictions. Accuracy (Equation (21)) provides an overall evaluation of the model, representing the ratio of correctly classified instances (both positive and negative) to the total number of instances. Lastly, the F1 score (Equation (22)) offers a balanced measure of precision and recall, expressed as their harmonic mean, making it particularly useful in scenarios with imbalanced datasets. Collectively, these metrics provide a comprehensive view of the model’s classification performance.
$$\mathrm{Recall}(C_i) = \frac{TP_i}{TP_i + FN_i}, \tag{17}$$

$$\mathrm{FP\ rate}(C_i) = \frac{FP_i}{FP_i + TN_i}, \tag{18}$$

$$\mathrm{Specificity}(C_i) = 1 - \mathrm{FP\ rate}(C_i), \tag{19}$$

$$\mathrm{Precision}(C_i) = \frac{TP_i}{TP_i + FP_i}, \tag{20}$$

$$\mathrm{Accuracy}(C_i) = \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}, \tag{21}$$

$$\mathrm{F1\ score}(C_i) = \frac{2 \cdot \mathrm{Precision}(C_i) \cdot \mathrm{Recall}(C_i)}{\mathrm{Precision}(C_i) + \mathrm{Recall}(C_i)}, \tag{22}$$
where
  • $TP_i$: true positives for class $i$.
  • $TN_i$: true negatives for class $i$.
  • $FP_i$: false positives for class $i$.
  • $FN_i$: false negatives for class $i$.
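All six per-class quantities in Equations (17)-(22) can be computed directly from a multiclass confusion matrix. The sketch below assumes rows index actual classes and columns predicted classes, matching Figure 6; the function name is ours.

```python
import numpy as np

def per_class_metrics(cm):
    """Compute Eqs. (17)-(22) for each class from a confusion matrix where
    cm[i, j] = number of class-i samples predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # actual i, predicted elsewhere
    fp = cm.sum(axis=0) - tp          # predicted i, actually elsewhere
    tn = cm.sum() - tp - fn - fp
    recall = tp / (tp + fn)
    fp_rate = fp / (fp + tn)
    specificity = 1.0 - fp_rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, fp_rate, specificity, precision, accuracy, f1
```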
The confusion matrix in Figure 6 reveals that the classification model performs well overall, with strong diagonal dominance indicating accurate predictions for most classes. High accuracy is observed for vascular lesions (VASC), dermatofibroma (DF), squamous cell carcinoma (SCC), and actinic keratosis (AK), with minimal misclassifications. However, notable confusion exists between melanocytic nevus (NV) and melanoma (MEL), as well as between MEL and basal cell carcinoma (BCC), suggesting overlapping features among these lesion types. Additionally, benign keratosis (BKL) is occasionally misclassified as NV.
Table 1 presents a selection of correctly and incorrectly classified skin lesion images to provide qualitative insights into the model’s performance. As shown, the model successfully identifies certain lesion types such as nevus (NV), melanoma (MEL), and basal cell carcinoma (BCC), even in visually complex cases. However, it also exhibits misclassifications, notably predicting melanoma in place of vascular lesions (VASC) and benign keratosis (BKL) instead of nevus (NV).
The ROC curves in Figure 7 illustrate the performance of the multiclass classification model across the eight skin lesion classes, showing the relationship between the true positive rate (sensitivity) and the false positive rate. The area under the curve (AUC) values indicate high classification performance for all classes. Actinic keratosis (AK) and dermatofibroma (DF) achieved perfect classification with an AUC of 1.00, indicating the model’s ability to distinguish these classes with no errors. Similarly, vascular lesions (VASC) and squamous cell carcinoma (SCC) demonstrated near-perfect separability with AUC values of 0.99. Benign keratosis (BKL) and basal cell carcinoma (BCC) followed with AUC values of 0.94 and 0.95, respectively, reflecting strong performance despite minor overlaps. Melanoma (MEL) and melanocytic nevus (NV) exhibited slightly lower AUC values of 0.92 and 0.91, suggesting some challenges in differentiation. The micro-average AUC of 0.97 confirms robust overall performance across all classes, underscoring the model’s ability to handle the complexities of multiclass skin lesion classification effectively.
The entire workflow was repeated 30 times to account for the inherent randomness in processes such as balancing class samples, splitting the dataset into training and testing sets, and the stochastic nature of neural network training. This repetition ensures that the results are not biased by any random data partitioning or model initialization instance. By performing the process 30 times, the average of the performance metrics (e.g., accuracy, recall, precision) could be calculated, along with their error intervals at a 95% confidence level. This approach provides robust and statistically reliable metrics, minimizing the impact of random fluctuations and offering a more accurate assessment of the model’s performance.
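A minimal sketch of the interval computation, assuming a Student-t confidence interval over the 30 per-run metric values:

```python
import numpy as np
from scipy import stats

def mean_ci(values, confidence=0.95):
    """Mean and half-width of the t-based confidence interval over runs."""
    values = np.asarray(values, dtype=float)
    half = stats.sem(values) * stats.t.ppf((1 + confidence) / 2, len(values) - 1)
    return values.mean(), half

# e.g., per-run accuracies from the 30 repetitions -> "95.33 +/- 0.45"-style figures
```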
The metrics presented in Table 2 indicate the model’s classification performance. Recall values ranged from 55.85% ± 2.33 for melanocytic nevus (NV) to 98.62% ± 0.58 for actinic keratosis (AK), showing varying sensitivity in correctly identifying different lesion types. Similarly, false positive (FP) rates were lowest for AK (0.98% ± 0.16) and vascular lesions (VASC) (0.97% ± 0.10), reflecting high precision in these classes.
Specificity remained consistently high across all classes, ranging from 94.75% ± 0.44 for melanoma (MEL) to 99.03% ± 0.10 for VASC, indicating the model’s strong ability to identify true negatives. Precision followed a similar trend, with higher values for classes like AK (93.64% ± 0.95) and VASC (93.37% ± 0.60) and slightly lower for NV (63.92% ± 1.43) and MEL (62.07% ± 1.57).
Overall accuracy averaged 95.33% ± 0.45, with AK, SCC, and DF achieving the highest individual accuracies (above 98%). The F1 score, reflecting the harmonic mean of precision and recall, varied from 59.34% ± 1.44 for NV to 96.05% ± 0.65 for AK. The overall F1 score was 80.82% ± 1.90, highlighting the model’s balanced performance across all classes.
The computational performance of the proposed method was evaluated on a standard laptop equipped with a 1.6 GHz Dual-Core Intel Core i5 processor and 8 GB of 1600 MHz DDR3 RAM. On average, the signature generation for a 320 × 320 JPEG image required approximately 0.2988 s. The training of the neural network described in this study took an average of 488.49 s. Once trained, the network demonstrated efficient inference capability, requiring only 0.00001805 s to process a signature and produce a classification result. These results highlight the method’s potential for practical deployment, particularly in scenarios where real-time or near-real-time analysis is beneficial.

4. Discussion

The results of this study confirm the effectiveness of the proposed hybrid approach for skin lesion classification. By integrating the Hermite transform with the Radial Fourier–Mellin and Hilbert transform, we achieved a balanced extraction of local and global features essential for robust classification. The Hermite transform’s capability to decompose images into localized polynomial coefficients allowed us to analyze fine-grained spatial details within skin lesions, particularly in the RGB channels. This analysis is critical for identifying pigmentation, texture variations, and boundary definitions, which are vital in distinguishing benign lesions from malignant ones.
On the other hand, the Radial Fourier–Mellin transform ensured scale and translation invariance, addressing challenges posed by varying lesion sizes and positions across images. The Hilbert transform further complemented these by introducing rotational invariance, thus generating repeatable and stable signatures. Combining these three transforms provided a rich and comprehensive feature set, enhancing the discriminative power of the machine learning models applied.
The multi-channel analysis (RGB and grayscale) was found to be particularly beneficial. Each channel emphasized unique lesion characteristics, with the red channel highlighting pigmentation, the green channel focusing on intermediate structures, and the blue channel enhancing fine textures. This approach is consistent with recent studies showing that multi-channel feature extraction outperforms single-channel methods in medical image analysis.
Table 3 presents a comparative analysis of various state-of-the-art models for skin lesion classification using different datasets. Among the listed models, Inception-v2 achieved the highest recall (0.9015) and F1 score (0.8876), indicating strong sensitivity and overall performance. VGG19 and ResNeXt101 also demonstrated competitive results across precision, accuracy, and F1 score. However, it is important to highlight that our proposed method shows strength in terms of false positive (FP) rate and specificity, with values of 0.0267 and 0.9733, respectively—metrics not reported for the other methods. This low FP rate suggests a significantly reduced rate of incorrect positive classifications, which is crucial in medical applications to avoid unnecessary patient anxiety or treatment. Despite a slightly lower recall (0.8124), the method achieves the highest accuracy (0.9533) among all models compared, indicating excellent overall classification reliability. These results demonstrate that our approach provides a more balanced trade-off between minimizing false alarms and maintaining high classification performance.
While the results are promising, some limitations exist. The computational complexity of the Hermite transform, particularly when applied to multiple channels and high-resolution images, can be a challenge for real-time deployment. Future work will therefore focus on optimizing the computational pipeline, including GPU acceleration and dimensionality reduction techniques such as PCA or autoencoders, to reduce the processing burden introduced by the Hermite transform. Testing the method on other datasets and in clinical settings will also be needed to validate its generalizability and real-world applicability.
This study presents a novel hybrid approach for automated skin lesion classification, combining the Hermite transform, Radial Fourier–Mellin transform, and Hilbert transform. Integrating these transforms across RGB and grayscale channels allows for comprehensive feature extraction, capturing both global invariance properties and localized spatial details. Experimental results demonstrate that this method enhances classification accuracy and robustness, outperforming traditional transform-based approaches.
The findings underscore the potential of hybrid feature extraction techniques for advancing AI-driven dermatological diagnosis tools. Future work will focus on reducing computational costs and expanding the model’s validation to other datasets and clinical applications. Improved precision in automated skin lesion classification suggests that this method can contribute to non-invasive, early detection strategies, ultimately benefiting patient care and clinical decision-making.

Author Contributions

Methodology, L.F.L.-Á. and J.Á.-B.; Software, L.F.L.-Á. and J.Á.-B.; Validation, L.F.L.-Á. and J.Á.-B.; Visualization, L.F.L.-Á. and J.Á.-B.; Supervision, J.Á.-B.; Project administration, J.Á.-B.; Funding acquisition, J.Á.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Centro de Investigación Científica y de Educación Superior de Ensenada, B. C. (CICESE), funding number is F0F181.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Available online: https://challenge.isic-archive.com/data/#2019, accessed on 10 January 2019.

Acknowledgments

Luis Felipe López-Ávila in Centro de Investigación Científica y de Educación Superior de Ensenada, B. C. (CICESE) was supported by SECIHTI, with postdoc application number 4553917, CVU 693156, Clave: BP-PA-20230502163027674-4553917.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shetty, B.; Fernandes, R.; Rodrigues, A.P.; Chengoden, R.; Bhattacharya, S.; Lakshmanna, K. Skin lesion classification of dermoscopic images using machine learning and convolutional neural network. Sci. Rep. 2022, 12, 18134.
  2. Wu, Y.; Chen, B.; Zeng, A.; Pan, D.; Wang, R.; Zhao, S. Skin cancer classification with deep learning: A systematic review. Front. Oncol. 2022, 12, 893972.
  3. Dubal, P.; Bhatt, S.; Joglekar, C.; Patil, S. Skin Cancer Detection and Classification. In Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics (ICEEI), Langkawi, Malaysia, 25–27 November 2017; pp. 1–6.
  4. Cullell-Dalmau, M.; Noé, S.; Otero-Viñas, M.; Meić, I.; Manzo, C. Convolutional neural network for skin lesion classification: Understanding the fundamentals through hands-on learning. Front. Med. 2021, 8, 644327.
  5. Debelee, T.G. Skin lesion classification and detection using machine learning techniques: A systematic review. Diagnostics 2023, 13, 3147.
  6. Sulthana, R.; Chamola, V.; Hussain, Z.; Albalwy, F.; Hussain, A. A novel end-to-end deep convolutional neural network based skin lesion classification framework. Expert Syst. Appl. 2024, 246, 123056.
  7. Nugroho, A.K.; Wardoyo, R.; Wibowo, M.E.; Soebono, H. Image dermoscopy skin lesion classification using deep learning method: Systematic literature review. Bull. Electr. Eng. Inform. 2024, 13, 1042–1049.
  8. Hu, Z.; Mei, W.; Chen, H.; Hou, W. Multi-scale feature fusion and class weight loss for skin lesion classification. Comput. Biol. Med. 2024, 176, 108594.
  9. Benjamin, J.R.; Jayasree, T. Improved medical image fusion based on cascaded PCA and shift invariant wavelet transforms. Int. J. Comput. Assist. Radiol. Surg. 2018, 13, 229–240.
  10. Barajas-Garcia, C.; Solorza-Calderon, S.; Gutierrez-Lopez, E. Scale, translation and rotation invariant wavelet local feature descriptor. Appl. Math. Comput. 2019, 363, 124594.
  11. Kim, H.K.; Kim, J.D. Region-based shape descriptor invariant to rotation, scale and translation. Signal Process. Image Commun. 2000, 16, 87–93.
  12. Popa, C.A.; Cernăzanu-Glăvan, C. Fourier Transform-Based Image Classification Using Complex-Valued Convolutional Neural Networks. In Advances in Neural Networks–ISNN 2018, Proceedings of the 15th International Symposium on Neural Networks, Minsk, Belarus, 25–28 June 2018; Springer International Publishing: Cham, Switzerland, 2018; Volume 15, pp. 300–309.
  13. Uzun, I.S.; Amira, A.; Bouridane, A. FPGA implementations of fast Fourier transforms for real-time signal and image processing. IEE Proc. Vis. Image Signal Process. 2005, 152, 283–296.
  14. Braccini, C.; Gambardella, G.; Grattarola, A. Digital Image Processing by Means of Generalized Scale-Invariant Filters. In Issues in Acoustic Signal—Image Processing and Recognition; Springer: Berlin/Heidelberg, Germany, 1983; pp. 315–329.
  15. De Sena, A.; Rocchesso, D. A fast Mellin and scale transform. EURASIP J. Adv. Signal Process. 2007, 2007, 089170.
  16. Bigot, J.; Gamboa, F.; Vimond, M. Estimation of translation, rotation, and scaling between noisy images using the Fourier–Mellin transform. SIAM J. Imaging Sci. 2009, 2, 614–645.
  17. Guerra-Rosas, E.; López-Ávila, L.F.; Garza-Flores, E.; Vidales-Basurto, C.A.; Álvarez-Borrego, J. Classification of skin lesion images using artificial intelligence methodologies through radial Fourier–Mellin and Hilbert transform signatures. Appl. Sci. 2023, 13, 11425.
  18. Garza-Flores, E.; Guerra-Rosas, E.; Álvarez-Borrego, J. Spectral indexes obtained by implementation of the fractional Fourier and Hermite transform for the diagnosis of malignant melanoma. Biomed. Opt. Express 2019, 10, 6043–6056.
  19. Martins, A.S.; Neves, L.A.; de Faria, P.R.; Tosta, T.A.A.; Longo, L.C.; Silva, A.B.; Roberto, G.F.; Nascimento, M.Z.D. A Hermite polynomial algorithm for detection of lesions in lymphoma images. Pattern Anal. Appl. 2021, 24, 523–535.
  20. Barba-J, L.; Vargas-Quintero, L.; Calderón-Agudelo, J.A. Bone SPECT/CT image fusion based on the discrete Hermite transform and sparse representation. Biomed. Signal Process. Control 2022, 71, 103096.
  21. Agrafioti, F.; Gao, J.; Hatzinakos, D.; Yang, J. Heart biometrics: Theory, methods and applications. Biometrics 2011, 3, 199–216.
  22. Damian, F.A.; Moldovanu, S.; Dey, N.; Ashour, A.S.; Moraru, L. Feature selection of non-dermoscopic skin lesion images for nevus and melanoma classification. Computation 2020, 8, 41.
  23. McGuire, M. An image registration technique for recovering rotation, scale and translation parameters. NEC Res. Inst. Tech. Rep. 1998, TR-98-018, 1–29.
  24. Ngo, L.H.; Luong, M.; Sirakov, N.M.; Viennet, E.; Le-Tien, T.T. Skin lesion image classification using sparse representation in quaternion wavelet domain. Signal Image Video Process. 2022, 16, 1721–1729.
  25. Was, L.; Milczarski, P.; Stawska, Z.; Wyczechowski, M.; Kot, M.; Wiak, S.; Wozniacka, A.; Pietrzak, L. Analysis of Dermatoses Using Segmentation and Color Hue in Reference to Skin Lesions. In Artificial Intelligence and Soft Computing, Proceedings of the 16th International Conference, ICAISC 2017, Zakopane, Poland, 11–15 June 2017; Springer International Publishing: Cham, Switzerland, 2017; Volume 16, pp. 677–689.
  26. Gerstenblith, M.R.; Shi, J.; Landi, M.T. Genome-wide association studies of pigmentation and skin cancer: A review and meta-analysis. Pigment. Cell Melanoma Res. 2010, 23, 587–606.
  27. Gallagher, R.P.; Hill, G.B.; Bajdik, C.D.; Coldman, A.J.; Fincham, S.; McLean, D.I.; Threlfall, W.J. Sunlight exposure, pigmentation factors, and risk of nonmelanocytic skin cancer: II. Squamous cell carcinoma. Arch. Dermatol. 1995, 131, 164–169.
  28. Peng, S.; Jiang, H.; Wang, H.; Alwageed, H.; Zhou, Y.; Sebdani, M.M.; Yao, Y.D. Modulation classification based on signal constellation diagrams and deep learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 718–727.
  29. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
  30. Malik, H.; Farooq, M.S.; Khelifi, A.; Abid, A.; Qureshi, J.N.; Hussain, M. A comparison of transfer learning performance versus health experts in disease diagnosis from medical imaging. IEEE Access 2020, 8, 139367–139386.
  31. Mijwil, M.M. Skin cancer disease images classification using deep learning solutions. Multimed. Tools Appl. 2021, 80, 26255–26271.
  32. Indraswari, R.; Rokhana, R.; Herulambang, W. Melanoma image classification based on MobileNetV2 network. Procedia Comput. Sci. 2022, 197, 198–207.
  33. Zhi, S.; Li, Z.; Yang, X.; Sun, K.; Wang, J. A multiclassification model for skin diseases using dermatoscopy images with Inception-v2. Appl. Sci. 2024, 14, 10197.
Figure 1. Representative examples of skin lesion images from the ISIC 2019 dataset used in this study. These images illustrate the visual variability across lesion types, including differences in size, shape, color, and texture, which the proposed hybrid feature extraction method is designed to handle.
Figure 2. Filtering masks used in the radial Hilbert transform step. (a) A binary disk mask applied in the frequency domain to localize the transform. (b) Real component ($H_R$) of the radial Hilbert mask. (c) Imaginary component ($H_I$) of the mask.
Figure 3. Workflow of the proposed hybrid transform-based feature extraction method. (a) RGB and grayscale channel separation of skin lesion images. (b) The Hermite transform of order (1,1) is applied to capture localized spatial features. (c) Computation of the Radial Fourier–Mellin transform for translation and scale invariance. (d) Application of the radial Hilbert transform for rotational invariance. (e) Normalization of transform-based signatures. (f) Extraction of Local Binary Pattern (LBP) features for texture representation. (g) Final signature generation through concatenation of transform and LBP descriptors.
Figure 4. Illustration of the Local Binary Pattern (LBP) computation. The surrounding 3 × 3 neighborhood is compared for each central pixel to determine binary threshold values. The resulting binary sequence is converted into a decimal code to label the central pixel, capturing local texture patterns.
Figure 5. End-to-end feature extraction steps for a skin lesion image’s red (R) channel. (a) Original R-channel input. (b) The result of the Hermite transform of order (1,1) highlights localized image features. (c) Modulus of the Fourier–Mellin transform for scale and translation invariance. (d) Real part and (e) imaginary part of the radial Hilbert transform used to ensure rotational invariance. (f) LBP feature extraction from the R-channel. (g) The final one-dimensional signature vector obtained by concatenating all feature descriptors.
Figure 6. Confusion matrix of the classification model on the test set. Rows represent the actual lesion classes, while columns represent the predicted classes. The intensity of each cell in the heatmap corresponds to the number of predicted instances; lighter shades indicate higher values and thus stronger model confidence in those predictions. The strong diagonal dominance indicates high classification accuracy for most skin lesion types, with some misclassifications occurring between visually similar lesions.
Figure 7. Multiclass receiver operating characteristic (ROC) curves for each of the eight skin lesion classes. The micro-average ROC is also included to summarize overall model effectiveness.
Table 1. Representative examples of correct and incorrect predictions made by the proposed classification methodology. Each row shows a skin lesion image, its actual class, the predicted class, and whether the classification was correct.
Image        Actual Class   Predicted Class   Correctly Classified
Image i001   NV             NV                True
Image i002   MEL            MEL               True
Image i003   VASC           MEL               False
Image i004   BCC            BCC               True
Image i005   NV             BKL               False
Table 2. Performance evaluation metrics (multiplied by 100) for the proposed classification model across eight skin lesion classes.
Class             Recall         FP Rate       Specificity    Precision      Accuracy       F1 Score
NV                55.85 ± 2.33   4.50 ± 0.33   95.50 ± 0.33   63.92 ± 1.43   90.60 ± 0.22   59.34 ± 1.44
MEL               59.31 ± 2.11   5.25 ± 0.44   94.75 ± 0.44   62.07 ± 1.57   90.34 ± 0.32   60.39 ± 1.26
BCC               75.16 ± 1.18   2.93 ± 0.29   97.07 ± 0.29   78.54 ± 1.76   94.35 ± 0.29   76.73 ± 1.16
BKL               71.68 ± 1.71   3.94 ± 0.31   96.06 ± 0.31   72.61 ± 1.32   92.99 ± 0.28   72.01 ± 1.06
VASC              94.77 ± 0.99   0.97 ± 0.10   99.03 ± 0.10   93.37 ± 0.60   98.50 ± 0.12   94.03 ± 0.50
DF                97.54 ± 0.72   1.17 ± 0.13   98.83 ± 0.13   92.29 ± 0.79   98.66 ± 0.17   94.83 ± 0.65
SCC               97.01 ± 0.83   1.59 ± 0.17   98.41 ± 0.17   89.71 ± 1.00   98.23 ± 0.22   93.20 ± 0.82
AK                98.62 ± 0.58   0.98 ± 0.16   99.02 ± 0.16   93.64 ± 0.95   98.97 ± 0.18   96.05 ± 0.65
Overall Average   81.24 ± 2.19   2.67 ± 0.22   97.33 ± 0.22   80.77 ± 1.64   95.33 ± 0.45   80.82 ± 1.90
Table 3. Comparison of the proposed method with state-of-the-art models for skin lesion classification using the ISIC-2019 and other benchmark datasets.
Model                        Dataset              Recall   FP Rate   Specificity   Precision   Accuracy   F1 Score
2D superpixels + RCNN [28]   HAM-10000            0.8450   -         -             0.8349      0.8550     0.8530
ResNeXt101 [29]              ISIC-2019            0.8810   -         -             0.8740      0.8850     0.8830
MobileNetV2 [30]             ISIC-2019            0.8633   -         -             0.7890      0.8530     -
VGG19 [31]                   ISIC-2019, Derm-IS   0.8666   -         -             0.9070      0.8857     0.8765
ConvNet [32]                 ISIC-2019, Derm-IS   0.8747   -         -             0.8614      0.8690     -
Inception-v2 [33]            ISIC-2019            0.9015   -         -             0.8737      0.8904     0.8876
This work                    ISIC-2019            0.8124   0.0267    0.9733        0.8077      0.9533     0.8082