1. Introduction
Skin cancer poses a significant global health challenge, as evidenced by its increasing prevalence over the years [1]. The early and accurate diagnosis of skin lesions is critical for effective treatment. Conventional imaging methods, such as dermoscopy [2] and RGB imaging [3], often face limitations in providing detailed information about skin lesions. In response to this challenge, hyperspectral imaging (HSI) has emerged as a promising non-invasive and contactless technique [4,5,6], offering a comprehensive view of skin lesions by capturing a broad range of wavelengths. HSI capabilities exceed those of traditional methods, allowing for the extraction of detailed spectral information from the observed tissues. This technique combines imaging and spectroscopy, producing data in the form of a 3D hyperspectral cube in which the first two dimensions represent the spatial location of the sample and the third dimension represents the light spectrum (see Figure 1).
In medical applications, HSI provides images rich in information about the tissue examined, including details about its structure, chemical composition, and other physical properties. This wealth of information assists in diagnosing diseases, monitoring treatment progress, identifying changes in tissue, and providing guidance during surgical procedures. The advanced capabilities of HSI make it a valuable tool for characterizing skin lesions with high precision [4,5,6].
In the past few years, numerous studies have explored the use of HSI in detecting skin lesions [7,8,9,10,11]. Various approaches have combined artificial intelligence (AI) and HSI for precise lesion segmentation and classification [12,13,14,15,16,17,18]. Huang et al. [12] used AI and HSI to distinguish mycosis fungoides with high sensitivity and specificity. Nagaoka et al. [13] developed a melanoma screening system using a melanoma discrimination index based on pigment characteristics. Lindholm et al. [14] applied convolutional neural network (CNN) analysis with a hand-held spectral imager to differentiate both pigmented and non-pigmented lesions, achieving notable accuracy. Additionally, Leon et al. [15] differentiated pigmented lesions through a combination of algorithms, while Hosking et al. [16] achieved high sensitivity but low specificity in melanoma detection. Parasca et al. [17] assessed carcinoma margins with spectral angle mapping (SAM), and Courtenay et al. [18] developed classification tools using near-infrared hyperspectral imaging to differentiate between non-melanoma skin cancers and actinic keratosis. A summary of these studies, including their methodology, results, and limitations, is provided in Table 1.
In our study, we leveraged the power of HSI to analyze skin carcinomas of the head and neck regions. We explored the extraction of radiomic features from tissue index images derived from hyperspectral data, providing a unique perspective for analyzing skin lesions. By integrating radiomic features derived from these images with machine learning (ML) techniques, our aim was to uncover the radiomic signatures associated with tissue characteristics, contributing to a deeper understanding of skin lesion pathology. However, the lack of standardized radiomic pipelines presents challenges to ensuring the reproducibility and clinical applicability of findings in the field [19]. Currently, we are at IDEAL stages 2a to 2b, focusing on advancing the application and assessing the feasibility, safety, and potential benefits of our approach with a small number of patients, before moving on to more rigorous evaluations and comparisons with other methods [20]. The aim of this pilot study was to identify the most relevant radiomic features derived from tissue index images to determine which of these indices possessed higher discriminative power, ultimately contributing to higher classification accuracy in differentiating between tumor, peritumor, and healthy skin regions.
Specifically, we conducted a classification analysis in three scenarios: when only RGB images were used (Scenario I), when tissue index images were used (Scenario II), and when their combination was used (Scenario III). This exploration allowed us to discern if the information from tissue index images enhanced the accuracy of distinguishing between tumors, peritumor areas, and healthy skin regions in the head and neck.
In contrast to existing methods, which relied predominantly on full hyperspectral images, the approach presented in this paper uses biologically significant tissue index images. These provide dimensionality reduction while remaining indicative of the morphological and physiological changes associated with skin carcinoma, allowing for a more comprehensive analysis of tissue characteristics.
Through the extraction and identification of the most relevant radiomic features that are characteristic of the tumor, the presented analysis provides a new layer of interpretability that has not been sufficiently addressed in previous studies. By focusing on the features extracted from tissue index images, which are estimations of blood content, oxygenation, melanin, water content, and other relevant properties, our approach identifies meaningful and interpretable biomarkers that contribute to the overall interpretability of the classification process.
The developed pipeline that integrates radiomic feature extraction, feature selection, and classification allows for the identification and selection of the most relevant radiomic features for more reliable classification.
The evaluation of classification performance across three different scenarios allowed us to assess the effectiveness of the proposed pipeline and the resulting models in different contexts and improved our understanding of how each image type contributes to lesion classification.
2. Data and Methodology
2.1. Imaging System
A handheld hyperspectral camera, Specim IQ (Specim, Oulu, Finland), with a 400–1000 nm spectral range, 7 nm spectral resolution, and 512 × 512 spatial pixels, was used to perform the imaging. The camera was mounted on a photographic tripod, ensuring stability. The field of view covered approximately 10 × 10 cm². Illumination was provided by a 150 W halogen lamp coupled to a ring illuminator through an optical fiber bundle (MI-150DG1, Vision Light Tech, Uden, The Netherlands), ensuring adequate lighting across the entire spectral range of the camera. In order to minimize direct reflections from the skin, a set of crossed polarizers was placed in front of both the camera and the ring illuminator.
2.2. Data Acquisition
The captured skin images were normalized using a pre-recorded image of a white standard (Spectralon, Labsphere Inc., North Sutton, NH, USA). Using a white reference designed specifically for this purpose eliminated the need to verify the standard itself. The raw hyperspectral radiance was converted into reflectance using the following equation [4]:

R(λ) = (I_raw(λ) − I_dark(λ)) / (I_white(λ) − I_dark(λ)),

where I_raw(λ) represents the raw hyperspectral intensity, I_white(λ) denotes the intensity of the white standard reference, and I_dark(λ) is the dark current measured when the camera shutter is closed. After normalization, the images were denoised using the minimum noise fraction (MNF) technique [21]. In brief, MNF first whitens the noise by estimating the noise covariance matrix from neighboring pixels and applying it to the data, thus removing correlations and effectively making the noise white. Following this step, MNF performs principal component analysis to determine the components containing the majority of the signal. Using only these components, the image can then be back-projected into a noise-filtered version.
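The white-reference normalization can be sketched as follows (a minimal numpy sketch; the array names and synthetic values are illustrative, not the acquisition code used in the study):

```python
import numpy as np

def to_reflectance(raw, white, dark):
    """Convert a raw hyperspectral cube to reflectance using a white
    reference and a dark-current frame (all arrays: rows x cols x bands)."""
    # Guard against division by zero where white equals dark.
    denom = np.clip(white - dark, 1e-12, None)
    return (raw - dark) / denom

# Tiny synthetic example: 2 x 2 pixels, 3 spectral bands.
raw = np.full((2, 2, 3), 60.0)
white = np.full((2, 2, 3), 110.0)
dark = np.full((2, 2, 3), 10.0)
refl = to_reflectance(raw, white, dark)
print(refl[0, 0])  # each band: (60 - 10) / (110 - 10) = 0.5
```

In practice, the white reference and dark frame are acquired with the same integration time as the scene, which is why the study matched exposure settings between the reference and the patient image.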
The imaging itself was performed in a few distinct steps. Before starting this work, the camera was prepared by fully charging the batteries and verifying the calibration per the manufacturer's instructions; this included imaging an orange sample with known reflectance values, supplied with the camera, and comparing the results to the on-camera reference measurement, a process automated in the camera control software. Before the acquisition itself, the illumination light source was warmed up to reduce the effects of heating. The position of the crossed polarizers in front of the objective and on the ring light used for illumination was verified with a metallic mirror and adjusted as necessary to achieve minimal mirror reflection. Following this step, the procedure room was blacked out by turning off the ambient lights and closing the window shades, if applicable. Just before imaging the patient, a white reference was acquired using the standard protocol for the camera by placing the reference standard at a predefined distance using a height standard. The camera-to-sample distance was set so that the white reference was imaged at the same distance as the lesion itself, which was important where the imaged surfaces were highly curved. The patients themselves were imaged by first covering their eyes and then positioning them at the appropriate distance from the camera, defined by the height standard.
Figure 2 shows the camera setup used during the imaging process, along with the Fiber-Lite MI-150 high-intensity illuminator, which ensured adequate lighting conditions.
To minimize any potential movement during the measurements, the patients were lying down on an examination bed while the images were captured. To achieve good positioning, motorized height control of the exam table was used. During the acquisition, the same integration time was used for the white reference and the image itself, both to prevent oversaturation and to eliminate possible uncertainties introduced by the on-board normalization of the data to the exposure time. Typical values for the integration time were 50 ms per line, resulting in a total acquisition time of about 30 s for the whole image. Following the image acquisition, the operator performed preliminary quality assurance on the data: the image was checked for saturation and for appropriate spectral shapes of the normalized reflectance in the tumor area and in the surrounding tissue by validating the general trends (hemoglobin absorption peaks) and absolute values (which must be lower than 1, preferably between 0.3 and 0.7). Finally, the light source was turned off, and the normal clinical workflow was resumed. Before using the equipment, the protocol was tested, and new operators were thoroughly trained by an experienced user of the imaging system.
2.3. Tumor Dataset Overview
The dataset in this study contained 24 head and neck skin carcinomas from 16 patients, including basal cell carcinoma (BCC) and squamous cell carcinoma (SCC) types, which were clinically selected for electrochemotherapy (ECT) treatment. Among these patients, there were 6 women and 10 men, with an age range of 60 to 95 years. These cases were recruited from the Department of Otorhinolaryngology and Cervicofacial Surgery, Ljubljana. Clinical and histopathological characteristics, including patient age, sex, tumor location, and histopathological type (SCC or BCC), were recorded for each participant. A summary table of these data is provided in Appendix A.
Patients were selected for ECT based on several inclusion criteria: histologically or cytologically confirmed squamous cell or basal cell carcinoma in the head and neck area, the unsuitability of tumors for standard treatment (due to factors like tumor location or patient refusal), being over 18 years old, and maintaining a Karnofsky [22] performance status of 70 or higher. Key exclusion criteria included life-threatening systemic diseases, severe coagulation disorders, and significantly impaired lung function. For a detailed list of inclusion and exclusion criteria, please refer to Appendix B.
The dataset consisted of RGB images and 12 tissue index images derived from the raw hyperspectral data, which were opened and processed using MATLAB (Mathworks, Natick, MA, USA). To generate the RGB images, the normalized spectra were projected onto the CIE XYZ colorspace by multiplying with the tristimulus values and then converted from the XYZ to the RGB colorspace using the xyz2rgb function in MATLAB with the standard D65 illuminant option [23,24]. Tissue indices are image metrics or mathematical representations used to evaluate tissue properties based on the reflected light spectra captured by the hyperspectral imaging system [25,26,27,28,29]. These indices were derived from specific spectral bands chosen for their sensitivity to particular tissue characteristics or physiological parameters. The tissue indices used for this analysis are given in Table 2, and the equations for calculating these indices are provided in Appendix C.
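As a generic illustration of how a tissue index image is formed from selected spectral bands, the sketch below extracts the bands nearest two target wavelengths and forms a log-ratio image. The band pair and the log-ratio form are hypothetical; the actual index equations used in the study are those given in Appendix C.

```python
import numpy as np

def band(cube, wavelengths, target_nm):
    """Return the reflectance image of the band nearest target_nm."""
    idx = int(np.argmin(np.abs(np.asarray(wavelengths) - target_nm)))
    return cube[:, :, idx]

def log_ratio_index(cube, wavelengths, nm_a, nm_b, eps=1e-6):
    """Generic log-ratio index between two bands (illustrative only;
    see Appendix C for the study's actual index equations)."""
    a = band(cube, wavelengths, nm_a)
    b = band(cube, wavelengths, nm_b)
    return np.log10((a + eps) / (b + eps))

# Synthetic 4 x 4 reflectance cube sampled every 10 nm from 400 to 990 nm.
wl = np.arange(400, 1000, 10)
cube = np.random.default_rng(0).uniform(0.3, 0.7, size=(4, 4, wl.size))
index_img = log_ratio_index(cube, wl, 560, 650)  # hypothetical band pair
print(index_img.shape)
```

Each index image collapses the spectral dimension into a single 2D map, which is what makes the subsequent radiomic texture analysis tractable.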
This study’s protocol aligned with the guidelines outlined in the Declaration of Helsinki and was approved by the Slovenian National Medical Ethics Committee (application number 0120-135/2021/3). Prior to imaging procedures, informed consent was obtained from all human patients participating in this study.
2.4. Ground Truth Mask Generation
In addition to the images of the skin carcinomas, the dataset included tumor and peritumor region masks. The peritumor region refers to the area exhibiting visual deviations from normal skin morphology, as assessed through clinical examination, while remaining distinct from the macroscopically defined tumor tissue, which we treat with ECT electrodes. This area is clinically relevant because it allows the effective targeting of adjacent tissue that may be affected by the tumor, ensuring comprehensive treatment.
The masks were segmented collaboratively and simultaneously by two medical experts. The initial assessment was conducted through clinical examination, after which the experts delineated tumor and peritumor borders on the RGB photos. Both experts worked together throughout the entire segmentation process, discussing and refining the tumor boundaries in real time to ensure clinical accuracy. A single mask was generated for each image, which was then used for all subsequent analyses.
Furthermore, after delineating the tumor and peritumoral regions, the entire skin region was separated from the background using image processing techniques like the spectral angle mapper (SAM) [30]. The SAM evaluates the spectral resemblance between the spectra in each pixel of a hyperspectral image and a designated reference spectrum to distinguish the skin from the background. The reference spectrum was derived by manually selecting a pixel in the skin region from an RGB image to identify its coordinates, which were then used to extract the corresponding values from all the other tissue indices derived from the hyperspectral images. This process provided us with a reference spectrum for the skin region and was repeated for each image in the dataset. This technique allowed us to distinguish the skin region from the surrounding elements, effectively eliminating unwanted background.
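The SAM computation can be sketched as follows (a minimal numpy sketch; the angle threshold is an illustrative assumption, as the study does not report the exact value used):

```python
import numpy as np

def spectral_angle(pixel, reference):
    """Spectral angle (radians) between a pixel spectrum and a reference."""
    cos = np.dot(pixel, reference) / (
        np.linalg.norm(pixel) * np.linalg.norm(reference))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def sam_mask(cube, reference, threshold=0.1):
    """Label as skin every pixel whose spectral angle to the skin
    reference falls below the threshold (threshold is an assumption)."""
    flat = cube.reshape(-1, cube.shape[-1])
    angles = np.array([spectral_angle(p, reference) for p in flat])
    return (angles < threshold).reshape(cube.shape[:2])

# Synthetic check: pixels proportional to the reference have angle ~0.
ref = np.linspace(0.3, 0.7, 20)
cube = np.stack([np.stack([ref * 0.9, ref * 1.1]),
                 np.stack([ref[::-1], ref[::-1]])])  # 2 x 2 x 20 cube
mask = sam_mask(cube, ref)
print(mask)  # first row skin-like (True), second row not (False)
```

Because the angle depends only on spectral shape, not magnitude, SAM is insensitive to the uneven illumination that curved facial surfaces produce, which is what makes it suitable for background removal here.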
These regions formed the foundation for our classification task, representing the three main classes in our study. In total, we had 72 regions of interest, with 24 representing tumors (class I), 24 representing peritumor regions (class II), and another 24 representing the healthy regions of the skin (class III), each providing insights into the diverse tissue types under investigation.
Figure 3 shows a volunteer with a BCC tumor on the nose. The quantitative comparison of the tissue indices in the tumor, peritumor, and healthy tissues is shown in the grayscale subfigures presented in Figure 3, each accompanied by a color bar indicating the corresponding index values. Figure 3a shows the RGB image of a volunteer's head, with the corresponding masked overlay presented in Figure 3b, highlighting three regions of interest: the tumor region in red, the peritumor region in blue, and the healthy skin in green. The erythema index (Figure 3c) exhibited an average value of 98.97 ± 21.43 for the tumor, 70.58 ± 13.32 in the peritumor region, and 67.87 ± 20.70 for the healthy skin. These values suggested increased blood concentration in the tumor region and decreased levels in healthy regions. Deeper skin oxygenation, as indicated by Ishimaru's oxygenation index (Figure 3d), exhibited higher values in tumor regions, with values of 70.16 ± 2.12, compared to 68.41 ± 2.24 in the peritumor and 65.34 ± 4.02 in the healthy skin. Similarly, superficial oxygenation, represented by Huang's oxygenation index (Figure 3e), also demonstrated elevated values in tumor regions compared to the peritumor area and the healthy skin. Additionally, the melanin index (Figure 3f) revealed differences between the tumor and healthy skin regions, with values of 5.92 ± 0.94 in the tumor and 4.46 ± 0.91 in the healthy skin, further emphasizing the variations observed in tissue characteristics across different regions.
2.5. Feature Extraction and Selection
The PyRadiomics Python library (version 3.1.0) [31] was used to extract first- and second-order radiomic features from both the RGB images and the 12 tissue index images for each region. PyRadiomics provides a wide range of features, generally classified as shape-based and first- and second-order statistics. For a comprehensive overview of these features, please refer to Scapicchio et al. [32], which includes detailed descriptions of the classes of features, such as first-order and second-order statistics, that were used in this analysis.
Shape-based features describe the 2D or 3D size and shape of the region of interest, independent of its gray-level intensity distribution. This feature class was intentionally excluded during the initial extraction process, considering the unique morphological attributes of the tumor, peritumor, and healthy skin regions. Given the distinct shapes (tumors being round, peritumors being round with a central void, and healthy skin regions being square with a central void), incorporating shape features was deemed unnecessary. This enabled us to concentrate on extracting features that provided greater insight into the tissue composition and pathology, thereby enhancing the precision of our classification.
First-order statistics features, such as energy, standard deviation, and entropy, quantify the distribution of the pixel intensity values within a given region of interest. These features assess pixel values independent of their spatial arrangement, providing insights into the overall distribution of intensity values within the image. For more detailed definitions, including the equations for the calculation of these features as well as the second-order features, please refer to the PyRadiomics documentation [31], which served as a reference standard for the radiomic analysis.
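As an illustration of the first-order definitions, energy and entropy can be computed directly from a region of interest (a minimal numpy sketch following the PyRadiomics definitions; the bin count and toy values are arbitrary):

```python
import numpy as np

def first_order_energy(roi):
    """Energy: sum of squared intensity values in the region of interest."""
    return float(np.sum(roi.astype(float) ** 2))

def first_order_entropy(roi, bins=16):
    """Entropy of the discretized intensity histogram (base-2 log)."""
    hist, _ = np.histogram(roi, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # log of empty bins is undefined; drop them
    return float(-np.sum(p * np.log2(p)))

roi = np.array([[1, 2, 2], [3, 3, 3]], dtype=float)
print(first_order_energy(roi))   # 1 + 4 + 4 + 9 + 9 + 9 = 36.0
print(first_order_entropy(roi))
```

Both metrics discard spatial arrangement entirely, which is exactly why the second-order (texture) features described next are needed as a complement.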
Second-order statistics features focus on how pairs of pixels are distributed spatially, revealing patterns not easily seen by the human eye. In our analysis, we used these features as the input for our classifiers to improve the differentiation between tumoral and healthy tissue by leveraging the textural information they provided.
The analysis of second-order statistics features involved two steps: first, creating a matrix that captured the spatial distribution of pixel values; second, computing metrics that summarize the information contained within that matrix. Each matrix produced a single value per feature, reflecting the spatial relationships and distributions of the pixel intensities between pairs of pixels.
Common matrices used include the Gray-Level Co-occurrence Matrix (GLCM), Gray-Level Run Length Matrix (GLRLM), Gray-Level Size Zone Matrix (GLSZM), Neighboring Gray-Tone Difference Matrix (NGTDM), and Gray-Level Dependence Matrix (GLDM). The GLCM contains statistical information that represents the distribution of pixel pairs throughout the image. The length of successive pixels in an image with the same intensity in a predetermined direction is expressed by the GLRLM. Gray-level zones, or the number of connected pixels with the same gray-level intensity, are quantified in an image by the GLSZM. The NGTDM measures the difference between the gray value of a pixel and the average gray value of its neighboring pixels within a specified distance. Gray-level dependencies, or the number of connected pixels within a certain distance that are dependent on the center pixel, are quantified with the GLDM in images.
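The two-step construction can be illustrated for the GLCM (a minimal sketch using a single horizontal offset and no symmetrization; PyRadiomics aggregates over multiple directions):

```python
import numpy as np

def glcm(img, levels, offset=(0, 1)):
    """Gray-Level Co-occurrence Matrix for one pixel offset
    (here: horizontal neighbors), unnormalized."""
    dr, dc = offset
    m = np.zeros((levels, levels), dtype=int)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r, c], img[r2, c2]] += 1
    return m

img = np.array([[0, 0, 1],
                [1, 2, 2],
                [0, 1, 1]])
m = glcm(img, levels=3)
# Step 2: a metric from the matrix, e.g. GLCM contrast,
# sum_{i,j} P(i,j) * (i - j)^2 over the normalized matrix.
p = m / m.sum()
contrast = float(sum(p[i, j] * (i - j) ** 2
                     for i in range(3) for j in range(3)))
print(m)
print(contrast)  # (2*1 + 1*1) / 6 = 0.5
```

The other matrices (GLRLM, GLSZM, NGTDM, GLDM) follow the same pattern: a counting structure over pixel neighborhoods, then scalar metrics summarizing it.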
The features of all classes, except shape-based ones, were extracted, resulting in a set of 93 features; however, only those showing a high correlation with the target label were used for the analysis. The selection of the most relevant features was conducted using the minimum redundancy–maximum relevance (mRMR) algorithm [33]. This algorithm employs two criteria, as implied by its name: maximum relevance and minimum redundancy. Maximum relevance ensures the selected features have a strong correlation with the target variable, while minimum redundancy ensures that these features are unique and dissimilar. Initially, we applied the mRMR algorithm to each of the 12 tissue index images, selecting the 7 most relevant features. The same process was repeated for the RGB images, picking out the 7 most relevant features.
Our approach ensured the identification of the key features from each of the 12 tissue index images and RGB images, resulting in an initial set of features. We again utilized the mRMR algorithm to select the 7 most crucial features from the remaining set. This selection process occurred in two scenarios: when using only tissue indices for the analysis and when combining them with RGB images. We chose to test the combination of features from the RGB and tissue index images to evaluate whether the combined approach would yield improved classification results and to determine which type of images provided more relevant information in this case. Our analysis revealed that only one feature from the RGB images was significant in this case, while the majority of the discriminative power came from the features extracted from the tissue index images, underscoring their predominant role in enhancing the overall classification of skin lesions. During our experiments, we explored both lower and higher numbers of features. We found that using fewer features still gave good performance with the tissue index images, achieving AUC values above 0.90, while the results were less favorable for the RGB images. Similarly, increasing the number of features did not lead to any significant improvement in classification accuracy. Ultimately, the combination of 7 features proved to be the best choice. Therefore, we opted to maintain a concise set of features to enhance computational efficiency and avoid overfitting.
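The greedy mRMR selection can be sketched as follows. This is a simplified illustration that uses Pearson correlation for both relevance and redundancy; practical mRMR implementations typically use the F-statistic or mutual information for relevance, and the data here is synthetic.

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy mRMR sketch: relevance = |corr(feature, target)|,
    redundancy = mean |corr| with already-selected features."""
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

rng = np.random.default_rng(1)
y = rng.normal(size=500)
f0 = y + rng.normal(size=500)   # moderately informative feature
f1 = f0.copy()                  # exact duplicate: relevant but redundant
f2 = rng.normal(size=500)       # uninformative but non-redundant
X = np.column_stack([f0, f1, f2])
print(mrmr_select(X, y, 2))  # [0, 2]: the duplicate column is skipped
```

The example shows the core behavior exploited in the study: a feature that merely duplicates an already-selected one is penalized, so the final set of 7 features stays diverse.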
While the mRMR algorithm primarily prioritized features that distinguished between tumor and healthy skin due to their more significant differences in feature values, further analysis revealed that certain features were particularly effective in distinguishing between the tumor and peritumor classes. This observation was especially evident in binary classification scenarios where only tumor and peritumor classes were involved. Therefore, in each scenario—Scenario I (using only RGB images), Scenario II (using only tissue index images), and Scenario III (using their combination)—we employed the mRMR algorithm to identify at least two features that demonstrated high discriminatory power between the tumor and peritumor classes. This approach ensured that our final set of features captured important distinctions between tumor and peritumor regions, despite the bias of the mRMR algorithm towards features distinguishing between tumor and healthy skin. This refined selection, leveraging the mRMR algorithm once again, played a key role in optimizing our analysis.
The selected radiomic features for each scenario are summarized in Table 3. While the features from the RGB images in Scenario I are listed only by their names, in Scenarios II and III, the selected features are accompanied by symbols. These symbols (given in Table 2) indicate their origin, specifying from which tissue indices the features were derived. In Scenario III, the symbol “RGB” denotes the feature derived from RGB images.

As shown in Table 3, among the tissue indices, the erythema, oxygenation, and water content indices demonstrated higher discriminative power for the classification analysis, as the features derived from these indices were identified as the most relevant for our analysis.
2.6. Automated Model Selection and Hyperparameter Optimization
We utilized the Tree-Based Pipeline Optimization Tool (TPOT) [34,35,36] for model selection and hyperparameter optimization. In the context of ML, hyperparameters are configuration settings that are external to the model and are not learned from the data. They play a crucial role in determining the behavior and performance of the model [37].
The TPOT, an automated ML tool based on genetic programming, systematically explores a range of hyperparameter settings and considers a diverse set of classifiers, including decision trees, support vector machines, and gradient boosting classifiers, during its optimization process. This process enabled us to identify the most suitable model for our dataset with fine-tuned hyperparameters. Hyperparameter optimization is significant, as it can considerably impact model performance. By fine-tuning these parameters, we enhanced accuracy, reduced overfitting, and improved the model's ability to generalize to unseen data. In each of the three scenarios, the TPOT generated a pipeline tailored to the specific data. In Scenario I, the selected 7 radiomic features from RGB images were used as the input data for the classification task, which aimed to distinguish between tumor, peritumor, and healthy regions. In this case, the TPOT selected the Gradient Boosting Classifier as the optimal classifier. In Scenario II, the selected 7 features from the tissue indices, such as Dawson's erythema index and Huang's oxygenation index, were used for the classification task. In this scenario, the Extra Trees Classifier was identified by the TPOT as the best-performing model. In Scenario III, which combined features from tissue indices and RGB images, the TPOT generated a pipeline with a unique configuration. This pipeline included a Stacking Estimator using the SGD Classifier as its foundation, followed by a feature constructor, Zero Count, which counted the occurrences of zeros/non-zeros per sample, and concluded with a Linear SVC. The specific hyperparameters for each model are detailed in Table 4. For a detailed description of these hyperparameters, refer to the scikit-learn documentation [38], which provides comprehensive explanations for each parameter used in the classifiers we applied.
Maintaining consistency in our approach, specifically by setting the number of generations to 100 and the population size to 50 in the TPOT, ensured a thorough exploration of the hyperparameter options, aiming for optimal model performance. This approach with the TPOT for model selection and hyperparameter optimization ensured a comprehensive and optimized modeling process. Moreover, for all scenarios, whether using only RGB images, tissue indices, or their combinations, we split our data into 75% for training the classifiers and 25% for testing.
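TPOT exposes a scikit-learn-style estimator interface, but its genetic search is expensive to run. The sketch below therefore illustrates the underlying model-selection idea with a plain scikit-learn grid search over two of the classifier families mentioned above; it is a simplified stand-in, not the TPOT run itself, and the data, parameter grids, and split sizes are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 7 selected radiomic features, 3 classes.
X, y = make_classification(n_samples=216, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
# 75% / 25% train-test split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

candidates = {
    GradientBoostingClassifier(random_state=0): {
        "n_estimators": [100, 200], "learning_rate": [0.1, 0.5]},
    ExtraTreesClassifier(random_state=0): {
        "n_estimators": [100, 200], "max_features": ["sqrt", None]},
}
best_model, best_score = None, -np.inf
for est, grid in candidates.items():
    search = GridSearchCV(est, grid, cv=4).fit(X_tr, y_tr)
    if search.best_score_ > best_score:
        best_model, best_score = search.best_estimator_, search.best_score_

print(type(best_model).__name__, round(best_model.score(X_te, y_te), 3))
```

TPOT replaces this exhaustive loop with a genetic search over whole pipelines (preprocessors, feature constructors, and classifiers), which is how composite results such as the Scenario III stacking pipeline arise.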
Following this, the best pipelines generated by the TPOT in each of the three scenarios underwent evaluation through repeated stratified k-fold cross-validation. This validation method involved 8 splits and 30 repetitions, ensuring a robust assessment of the model's generalization performance. To comprehensively evaluate the performance of the generated pipelines, we utilized a set of diverse metrics, including accuracy, precision, recall, balanced accuracy, and F1 score. These metrics were selected to capture different aspects of classification performance. The computation of the standard deviations for each metric provided insights into the stability and consistency of the model's performance across different folds and repetitions.
As a further step in our analysis, we also calculated the Receiver Operating Characteristic (ROC) curves and the AUC for each of the best-performing models using stratified k-fold cross-validation, with a one-vs.-rest approach to handle the three-class classification problem; the optimal value of k was determined to be 4, and the AUC was reported as the mean over the four folds. For Scenario I, the probability thresholds that yielded optimal performance were found to be between 0.10 and 1.00. In Scenario II, the optimal probability thresholds ranged from 0.10 to 0.90; however, changing the threshold to 0.80 or even 1.00 did not affect the model's performance. In Scenario III, because the pipeline configuration did not provide probability outputs, performance metrics such as accuracy, precision, recall, and F1 score, which had been evaluated across all scenarios, were deemed more suitable for evaluating classifier performance. These metrics offered insights into the classification capabilities of the model given its non-probabilistic output nature.
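The evaluation protocol can be sketched with scikit-learn utilities (synthetic data stands in for the selected radiomic features, and the repeat count is reduced from 30 to keep the example fast):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (RepeatedStratifiedKFold,
                                     StratifiedKFold, cross_val_predict,
                                     cross_validate)

X, y = make_classification(n_samples=216, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
clf = ExtraTreesClassifier(random_state=0)

# Repeated stratified k-fold: 8 splits (3 repeats here; 30 in the study).
cv = RepeatedStratifiedKFold(n_splits=8, n_repeats=3, random_state=0)
scores = cross_validate(clf, X, y, cv=cv,
                        scoring=["accuracy", "balanced_accuracy",
                                 "f1_macro"])
print({k: (v.mean().round(3), v.std().round(3))
       for k, v in scores.items() if k.startswith("test_")})

# One-vs.-rest AUC from out-of-fold probabilities with k = 4.
proba = cross_val_predict(clf, X, y, cv=StratifiedKFold(4),
                          method="predict_proba")
print(round(roc_auc_score(y, proba, multi_class="ovr"), 3))
```

Reporting the per-metric standard deviations, as in the dictionary above, is what reveals the stability of a pipeline across folds and repetitions.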
After generating pipelines using the TPOT, we assessed the importance scores of the features in each scenario. This post-TPOT analysis offered insights into the contribution of individual features to the optimized models. In each case study, whether focusing on RGB images, tissue index images, or their combination, we employed the Extra Trees Classifier to calculate the importance scores. In Figure 4, we present a schematic overview of the workflow.
4. Discussion
The integration of HSI with ML techniques offers advantages in the analysis of skin lesions. By capturing both spatial and spectral information, HSI provides rich data that can reveal the subtle variations in tissue properties associated with different lesion types. In this study, we investigated the potential of integrating HSI-derived tissue indices with ML classifiers to enhance the classification of skin lesions, particularly focusing on skin carcinomas in the head and neck region. Our findings demonstrate the promising utility of this approach in improving the accuracy and efficacy of lesion classification.
Moreover, this study highlights the importance of feature selection in maximizing the discriminative power of RGB and tissue index images, where a subset of the most relevant features was identified using the mRMR algorithm. The selected features provided insights into characteristics and patterns associated with skin carcinomas in the head and neck region. Features related to texture, non-uniformity, and energy emerged as significant contributors to the classification task, reflecting the complex nature of skin lesion morphology and composition.
In Scenario I, key features such as GLRLM Gray-Level Non-Uniformity, GLRLM Run Length Non-Uniformity, GLDM Gray-Level Non-Uniformity, and GLDM Dependence Non-Uniformity had higher importance scores. These features capture the non-uniformity in gray-level intensity values and run lengths throughout the image. We found that these values were consistently higher in the healthy skin region than in both peritumor and tumor regions, aligning with our expectation that healthy tissue displays a broader range of grayscale intensities. In addition to these features, the analysis also included features from the NGTDM class, such as busyness, coarseness, and first-order energy, which contributed to the classification but had lower importance scores.
In Scenario II, a feature from the GLCM class, IMC1, stood out as the most significant, with the highest importance score. Extracted from the erythema index image, IMC1 provided insights into texture complexity. Higher negative IMC1 values in tumor regions suggest decreased texture complexity, reflecting the homogeneous nature of tumor tissue. In contrast, lower negative IMC1 values in peritumor and healthy skin regions indicate more complex patterns, consistent with the expected diversity of healthy skin. The analysis also highlighted the significance of first-order energy features extracted from the superficial and deeper oxygenation indices, which were anticipated to be relevant for classification due to the observed higher levels of oxygenation in tumor regions compared to the other areas.
Generally, tumor tissues exhibit lower oxygenation levels due to hypoxia, a common characteristic of cancerous tissues [39]. Our results instead show increased oxygenation, which at first appears to contradict the expected hypoxic state. However, HSI provides no depth information, so superficial changes and changes at the tumor edge also contribute to the measured data. In these regions, oxygenation can in fact increase due to the angiogenesis driven by the tumor's higher metabolic requirements [40].
Furthermore, several other texture features extracted from the tissue water index and three different erythema indices proved relevant in Scenario II. Across these features, higher values were consistently observed in the healthy skin region compared to the tumor regions, indicating greater non-uniformity in healthy skin texture.
In Scenario III, only one feature from the RGB images emerged as the most relevant, highlighting the predominant contribution of tissue index images to our analysis. The consistent significance of several features from Scenario II reinforces their relevance across different contexts. Overall, our findings emphasize the pivotal role of tissue-index-derived features in biomedical image analysis, suggesting their potential as robust biomarkers for tissue classification.
Expanding on the role of tissue-index-derived features in classification, it is important to consider their broader implications for biomedical imaging technology. By identifying the most relevant tissue indices, we gain insight into the spectral bands that contribute most to tissue classification, potentially reducing the cost of HSI devices by focusing on key spectral ranges. Thus, understanding the significance of these indices improves classification accuracy, informs decisions about spectral band selection, and optimizes the efficiency and cost of imaging devices.
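As an illustration of the band-selection point, a tissue index image needs only a few narrow bands from the hyperspectral cube. The sketch below computes a generic log-ratio index; the 650/560 nm pair and the 10 nm band spacing are placeholders for illustration, not the study's actual index definitions or acquisition parameters.

```python
import numpy as np

def band_image(cube, wavelengths, target_nm):
    """Extract the single band closest to target_nm from an (H, W, B) cube."""
    idx = int(np.argmin(np.abs(np.asarray(wavelengths) - target_nm)))
    return cube[:, :, idx]

def ratio_index(cube, wavelengths, num_nm, den_nm, eps=1e-6):
    """Generic log-ratio tissue index from two narrow bands (illustrative only;
    actual erythema/oxygenation indices use study-specific band definitions)."""
    num = band_image(cube, wavelengths, num_nm)
    den = band_image(cube, wavelengths, den_nm)
    return np.log((num + eps) / (den + eps))

# synthetic 400-1000 nm reflectance cube with 10 nm band spacing (hypothetical setup)
wavelengths = np.arange(400, 1001, 10)
rng = np.random.default_rng(2)
cube = rng.uniform(0.1, 0.9, size=(32, 32, wavelengths.size))
index_img = ratio_index(cube, wavelengths, num_nm=650, den_nm=560)
print(index_img.shape)
```

Because only the referenced bands are ever read, a device built around a validated index set needs to acquire only those spectral ranges.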
The performance of different classification models further validates the efficacy of our approach.
Table 5 provides a detailed comparison of the performance metrics across the three distinct scenarios. Each scenario was evaluated using TPOT, which generated distinct pipelines for optimal model performance. In the scenario using only RGB images, the classifiers achieved a mean accuracy of 0.8773 ± 0.0186, with a mean precision of 0.8784, indicating the classifiers’ ability to avoid false positives. However, precision was slightly lower for Class I than for Class II and Class III.
When using only tissue index images, both accuracy and precision improved, with mean values of 0.9175 and 0.9189, respectively. This suggests that tissue index images provide valuable information for distinguishing between different classes, resulting in higher classification accuracy.
The mean recall values across all scenarios aligned with the mean accuracy, reflecting the classifiers’ ability to correctly identify positive instances. In the scenarios using RGB images and tissue index images individually, Class III achieved a perfect recall of 1.0000. However, this perfect recall may have been influenced by bias from the extracted features related to region size. Similarly, in Scenario III, where RGB and tissue index images were combined, Class III also achieved a perfect precision of 1.0000, an outcome that could be attributed to the same region-size-related bias in the extracted features.
Additionally, considering that the total pixel count for the healthy skin regions in our analysis was higher than that for the other regions, we anticipated better classification performance for this class compared to the tumor and peritumor regions. Specifically, the ratio of healthy skin pixels to tumor pixels was approximately 40:1, while the ratio to peritumor pixels was about 11:1.
Class III achieved the highest F1 score, indicating a strong agreement between precision and recall for healthy skin regions. Balanced accuracy, which considers sensitivity and specificity, exhibited trends similar to the overall accuracy across all scenarios. To ensure a fair evaluation of classifier performance, class balance was maintained throughout the training and testing phases.
In the scenario utilizing both RGB and tissue index images, the classifier demonstrated high recall for Class III, indicating the accurate identification of healthy skin regions. This performance aligns with the distinctive features captured by the tissue index images, particularly the greater non-uniformity seen in healthy skin compared to tumor and peritumor regions, which enhances the classifier’s effectiveness for this class.
Distinguishing between Class I and Class II lesions proved more challenging, especially when only RGB images were used. Nevertheless, the classifiers maintained high accuracy, precision, recall, F1 score, and balanced accuracy across all classes.
The differing performance across scenarios emphasizes the importance of the input data in classification tasks. As shown in
Figure 6, Scenario I displayed substantial variation with RGB images, indicating potential reliability issues. In contrast, the stability observed in Scenario II, with tissue index images, suggests that leveraging these indices can enhance classifier performance. This improvement results from the comprehensive spectral information captured by tissue index images, which provide valuable physiological insights—reflecting properties like oxygenation levels, water content, and melanin distribution—that help distinguish healthy tissues from diseased ones. Moreover, tissue index images capture texture and morphology features specific to various tissue types, contributing to the classifier’s stability and providing reliable indicators across multiple assessments.
In comparison to the recent study by Parasca et al. [17], who utilized HSI primarily through SAM for assessing skin carcinoma margins, our research adopted a fundamentally different approach. While the authors of that study segmented hyperspectral images based on similar spectral properties, achieving notable AUC values above 0.89 for distinguishing various tissue types, we focused on utilizing biologically significant tissue indices that reflect morphological and physiological changes related to malignancy.
Methodologically, Parasca et al. [17] employed a single classifier in their analysis, whereas our approach involves an integrated pipeline that combines feature extraction, feature selection, and classification. This framework allows for flexibility and adaptability in classifying different types of skin lesions, enabling the selection of optimal classifiers for better performance. While our current study focused on malignant lesions, this approach has the potential to be extended to classify benign and other types of skin lesions, indicating its broader applicability.
In our study, the mean AUC for tumor versus the rest and peritumor versus the rest reached 0.96 and 0.94, respectively, when tissue indices were incorporated into the analysis. While these findings demonstrate the strong discriminative ability of our approach, they are preliminary, and further validation on a larger dataset is required. Overall, both studies highlight the potential of HSI for skin lesion classification; however, our work emphasizes the importance of integrating features extracted from tissue index images to improve classification accuracy and interpretability.
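One-vs-rest AUCs of this kind can be computed directly from class scores; the sketch below uses synthetic labels and scores purely to show the mechanics, with class 0 standing in for "tumor":

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
# hypothetical labels: 0 = tumor, 1 = peritumor, 2 = healthy skin
y_true = rng.integers(0, 3, size=300)
# synthetic class scores that loosely favor the true class
y_score = rng.random((300, 3))
y_score[np.arange(300), y_true] += 1.0
y_score /= y_score.sum(axis=1, keepdims=True)   # rows must sum to 1 for 'ovr'

# binary AUC for one class against the rest
auc_tumor_vs_rest = roc_auc_score((y_true == 0).astype(int), y_score[:, 0])
# macro-averaged one-vs-rest AUC over all classes
mean_ovr_auc = roc_auc_score(y_true, y_score, multi_class="ovr")
print(round(auc_tumor_vs_rest, 3), round(mean_ovr_auc, 3))
```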
In comparison to the study by Lindholm et al. [14], who faced challenges in accurately classifying BCC due to ill-defined tumor borders and the misclassification of surrounding healthy skin as lesions, our research incorporated the peritumor region into the classification. While Lindholm et al. [14] achieved promising sensitivity and specificity rates for differentiating lesions, our approach emphasizes the importance of considering this surrounding tissue in the overall analysis of skin lesions.
While Courtenay et al. [18] utilized near-infrared hyperspectral imaging in the range of 900.6 to 1454.8 nm and achieved AUPRC values of 0.78 for differentiating healthy skin from BCC and 0.75 for healthy skin versus SCC, our research employed a different spectral range of 400 to 1000 nm. From this range, tissue index images were calculated using specific spectral bands. The AUC values in our analysis indicated that the tissue indices demonstrated higher discriminative power in distinguishing healthy tissues from BCC and SCC, ultimately leading to improved classification accuracy.
It is essential to acknowledge certain limitations of our study. The relatively small dataset of 24 skin carcinoma lesions in the head and neck region may have influenced our results, potentially limiting the generalizability of our findings. Factors contributing to this small dataset included strict inclusion/exclusion criteria, limited consent from eligible patients, and challenges with data acquisition during the post-COVID-19 period.
We recognize the potential for overfitting when selecting seven radiomic features out of a large pool of features from a dataset consisting of 72 data points. For each of these data points, we analyzed 12 tissue index images and their corresponding RGB images. The selected textural features, along with a few first-order statistics, align with the features reported in previous studies [19], even though the imaging modalities differ. As noted in [19], despite the methodological variability and different feature sets utilized in prior studies, there remains consistency in the type of information provided, particularly in terms of the textural characteristics of lesions. This similarity in feature types indicates their potential prognostic value; however, we remain cautious about the bias introduced during the selection process.
Through experiments with varying numbers of features, we aimed to achieve a balance between maintaining discriminative power and recognizing the risk of overfitting, ultimately opting for seven features that ensured both statistical relevance and interpretability without losing critical information from the tissue-based features.
To evaluate model performance, we applied stratified k-fold cross-validation and used a separate test set, emphasizing that there was no data leakage at any stage of the testing phase. Additionally, the use of tissue index images derived from hyperspectral data contributed to initial dimensionality reduction and noise minimization, improving analytical robustness and reducing bias. However, despite the steps taken, there remained the possibility that the performance metrics could have been affected by biases associated with the selected features and the limited sample size. This will be addressed in future research by expanding the dataset to include new measurements of both malignant and benign skin lesions.
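The evaluation protocol described above, a stratified held-out test set plus stratified k-fold cross-validation restricted to the training portion, can be sketched as follows (synthetic data and a stand-in random-forest classifier, not the study's TPOT-derived pipelines):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# synthetic stand-in for the 72 samples x 7 selected radiomic features, 3 classes
X, y = make_classification(n_samples=72, n_features=7, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# held-out test set, stratified so class proportions are preserved
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(clf, X_tr, y_tr, cv=cv)   # uses training data only

clf.fit(X_tr, y_tr)                                   # final fit on training data
test_acc = clf.score(X_te, y_te)                      # test set touched exactly once
print(cv_scores.mean(), test_acc)
```

Keeping the test split outside every cross-validation fold is what prevents the leakage mentioned above: no test sample influences feature selection, hyperparameters, or model fitting.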
Moreover, due to dataset constraints, we opted not to employ deep learning methods, which often require larger datasets to train complex models effectively. While deep learning approaches have shown promise in various medical imaging tasks, including skin lesion classification, their application in this study was deemed impractical given the dataset constraints. We also investigated the potential of pretrained neural networks, including U-Net, for our classification task. However, our attempts with these models did not yield satisfactory results. This highlights the importance of dataset-specific considerations when selecting and adapting ML models. Nevertheless, future studies with larger datasets could explore the feasibility and potential benefits of incorporating deep learning methodologies to improve classification accuracy and model interpretability further.
In summary, our preliminary results, as evidenced by the AUC values achieved in our analyses, indicate the potential of the approach and lay the groundwork for future research aimed at integrating classification-based segmentation techniques for skin lesions. The ultimate goal is a fast, non-invasive method that discriminates between different skin lesions based on easily interpretable biological features and can be applied in clinical settings. While the current study focused on the classification of skin lesions, by incorporating classification-based segmentation using radiomic features extracted from tissue index images, we aim to achieve more precise identification and delineation of lesion boundaries in the future.
Additionally, subsequent research will involve comparing the performance of this classification-based segmentation approach with segmentation performed by deep learning networks. This comparative analysis will provide valuable insights into the strengths and limitations of different segmentation methodologies, further informing the development of robust and accurate skin lesion analysis techniques.