Abstract

Raman Spectroscopy Diagnosis of Melanoma †

1 CNR-ISTI, National Research Council, via Moruzzi 1, 56124 Pisa, Italy
2 CNR-IBF, Institute of Biophysics, National Research Council, via Moruzzi 1, 56124 Pisa, Italy
3 Section of Pathological Anatomy, Department of Health Sciences, University of Florence, 50134 Florence, Italy
4 Unit of Dermatology, Specialist Surgery Area, Department of General Surgery, Livorno Hospital, Azienda Usl Toscana Nord Ovest, 57124 Livorno, Italy
5 CNR-IFC, Institute of Clinical Physiology, National Research Council, 56124 Pisa, Italy
* Author to whom correspondence should be addressed.
† Presented at the 18th International Workshop on Advanced Infrared Technology and Applications (AITA 2025), Kobe, Japan, 15–19 September 2025.
Proceedings 2025, 129(1), 10; https://doi.org/10.3390/proceedings2025129010
Published: 12 September 2025

Abstract

Cutaneous melanoma is an aggressive form of skin cancer and a leading cause of cancer-related mortality. Raman Spectroscopy (RS) could offer a fast and effective method for melanoma diagnosis. We therefore introduce a new RS-based method to distinguish Compound Naevi (CN) from Primary Cutaneous Melanoma (PCM) in ex vivo solid biopsies. To this aim, we integrated Confocal Raman Micro-Spectroscopy (CRM) with four Machine Learning (ML) algorithms: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Support Vector Machine (SVM), and Random Forest Classifier (RFC). We focused on comparing traditional pre-processing operations with the Continuous Wavelet Transform (CWT). CWT led to the maximum classification accuracy, ∼89%, which makes the method promising for future implementation in devices for everyday use.

1. Introduction

Among skin cancers, Cutaneous Melanoma (CM) is the most aggressive and deadliest form [1]. Today, the most widely accepted method for CM diagnosis is a dermoscopy-assisted clinical examination followed by a histopathological assessment [2]. This procedure has several drawbacks, including the high percentage of false positives after the initial examination and the strong similarities between histotypes of different nature. Raman Spectroscopy (RS), which measures the so-called Raman effect [3], has emerged as a highly promising technique to address these issues. RS can potentially distinguish complex samples that appear macroscopically identical. In contrast to the conventional histopathological examination, RS can identify melanoma within minutes [4]. Finally, because it is label-free, RS is suitable for both ex vivo and in vivo measurements. However, the large amount of information within a single Raman spectrum hinders the qualitative interpretation of such experimental data. Machine Learning (ML) is therefore a complementary tool to rapidly read and process Raman data and bring out the information of interest. In this paper, we built an innovative approach that couples Confocal Raman Microscopy (CRM) with ML, i.e., Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Support Vector Machine (SVM), and Random Forest Classifier (RFC), as a diagnostic tool to distinguish solid biopsies of Compound Naevus (CN) and Primary Cutaneous Melanoma (PCM). We explored two approaches to spectral pre-processing: the conventional techniques, based on fluorescence (baseline) removal, and the Continuous Wavelet Transform (CWT), which allowed us to deconvolve the baseline from the Raman component. The latter led to a significant increase in classification accuracy, which reached a maximum of ∼89% for RFC.

2. Materials and Methods

The study involved 5 μm-thick formalin-fixed paraffin-embedded (FFPE) tissue sections of CN (12 biopsies) and PCM (18 biopsies), retrospectively retrieved from the Section of Pathology at the Department of Health Sciences, University of Florence, and from the Azienda USL Toscana Nord Ovest, Livorno, Italy. We collected spectra on grids of resolution 25 μm × 25 μm within the cutaneous lesions. Each spectrum was the arithmetic mean of 85 accumulations, with an acquisition time of 0.2 s per accumulation. We restricted our analysis to the spectral interval between 400 and 1800 cm⁻¹. We retrieved between 150 and 200 spectra per biopsy, depending on the amount of tissue available. To suppress effects related to spatial non-uniformities (voids or cracks), we normalized each raw spectrum by its integral. We compared two pre-processing approaches. In the first, we removed the baseline, attributed to sample fluorescence, through an Asymmetrically Reweighted Penalized Least Squares (ARPLS) algorithm [5], and then suppressed the high-frequency noise with a Savitzky–Golay (SG) filter (window: 17 points; polynomial order: 3). In the second, we applied the Continuous Wavelet Transform (CWT) [6]. This operation can be conceived as the convolution of the original signal $x(t)$ with a series of wavelets $\{\Psi((t-b)/a)\}$, where $b$ is a translational parameter and $a > 0$ is called the scale. These wavelets are generated from the so-called mother wavelet $\Psi(t)$. In this work, we adopted the so-called “Mexican hat” as the mother wavelet [7], with an array of $N_s = 100$ evenly spaced scales between 0.1 and 100. When applied to a single Raman spectrum, CWT led to a vector of $N_s \times N_p = 69{,}300$ components, where $N_p$ is the number of components of the original spectrum. As detailed below, we focused on the problem of distinguishing CN from PCM.
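The pre-processing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectrum is synthetic, N_P = 693 is inferred from 69,300 / 100, the scales are taken in sample units, the 1/sqrt(a) factor is the usual CWT normalization (the text does not specify one), and the helper names are hypothetical. The ARPLS baseline removal is omitted, so the SG filter is applied here directly to the normalized spectrum.

```python
import numpy as np
from scipy.signal import savgol_filter

N_P = 693                               # points per spectrum (assumed: 69,300 / 100)
SCALES = np.linspace(0.1, 100.0, 100)   # N_s = 100 evenly spaced scales


def integral(x, y):
    """Trapezoidal integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet with unit L2 norm."""
    return 2.0 / (np.sqrt(3.0) * np.pi**0.25) * (1.0 - t**2) * np.exp(-0.5 * t**2)


def cwt_mexican_hat(signal, scales):
    """Return an array of shape (len(scales), len(signal)).

    Row i approximates W(a_i, b) = a_i**-0.5 * sum_t x(t) * psi((t - b) / a_i),
    with t, b and a_i measured in sample units (an assumption of this sketch).
    """
    n = signal.size
    out = np.empty((len(scales), n))
    for i, a in enumerate(scales):
        half = min(int(np.ceil(8.0 * a)), (n - 1) // 2)  # truncate the kernel
        t = np.arange(-half, half + 1)
        kernel = mexican_hat(t / a) / np.sqrt(a)
        # The wavelet is symmetric, so convolution equals correlation here.
        out[i] = np.convolve(signal, kernel, mode="same")
    return out


# Synthetic spectrum: three Gaussian bands on a sloping baseline (not real data).
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400.0, 1800.0, N_P)
peaks = sum(h * np.exp(-0.5 * ((wavenumbers - c) / 12.0) ** 2)
            for c, h in [(1004.0, 1.0), (1450.0, 0.7), (1660.0, 0.9)])
baseline = 2.0 + 1e-3 * (wavenumbers - 400.0)
raw = peaks + baseline + rng.normal(0.0, 0.02, N_P)

# Normalization by the integral of the raw spectrum.
normalized = raw / integral(wavenumbers, raw)

# Savitzky-Golay smoothing with the window and order quoted in the text.
smoothed = savgol_filter(normalized, window_length=17, polyorder=3)

# CWT route: flatten the (scale, position) map into one feature vector.
features = cwt_mexican_hat(normalized, SCALES).ravel()
print(features.size)  # 100 * 693 = 69300
```

The flattened CWT map keeps the slowly varying baseline at large scales and the narrow Raman bands at small scales in the same feature vector, which is why no explicit baseline subtraction appears in this route.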
Since the number of biopsies and, consequently, the number of spectra differed between CN and PCM, after the pre-processing operations we applied the Synthetic Minority Oversampling TEchnique (SMOTE) to obtain a balanced dataset of $N_{spectra} = 5100$ spectra [8]. We then applied PCA to the resulting spectra to reduce the dimensionality of the system, and used the first $N_{PCA} = 10$ Principal Components (PCs) to feed the ML models. We tested the classification performances of LDA, QDA, SVM (kernel: radial basis function, RBF), and RFC. In SVM, we fixed the $\gamma$ parameter of the RBF to $N_{PCA}^{-1}$ and optimized the regularization parameter $C$ by maximizing the classification accuracy through a grid-like procedure over an array of evenly spaced values of $C$ between $10^{-1}$ and 5. In RFC, the trees of the forest were grown until reaching 100% training accuracy; each tree node was obtained by randomly choosing $N_p^{1/2}$ of the features available in the dataset and selecting the feature leading to the maximum gain in terms of the Gini index. To build the whole forest, we optimized the number $N_t$ of trees through a grid-like procedure over an array of evenly spaced integers between 50 and 150. We quantified the classification performances with a 10-fold cross-validation, in terms of Accuracy (A), Area Under ROC curve (AUROC), Recall (R), and Precision (P).
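The model-selection procedure described above might look roughly as follows with scikit-learn. This is a hedged sketch rather than the authors' code: the data are synthetic and already balanced (in practice an oversampler such as SMOTE from imbalanced-learn would come first), the number of grid points for C and N_t is an assumption, and so is the support-weighted averaging of recall and precision. Note also that `max_features="sqrt"` takes the square root of the number of features actually passed to the forest, here the 10 PCs.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

N_PCA = 10
SCORING = ["accuracy", "roc_auc", "recall_weighted", "precision_weighted"]

# Synthetic, already balanced stand-in for the pre-processed spectra (not real data).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 60))
X[y == 1, :10] += 1.5  # shift ten features so the two classes are separable

# PCA on the whole dataset, then the first N_PCA components feed every model.
X_pca = PCA(n_components=N_PCA).fit_transform(X)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# One-point grids for LDA and QDA (their default settings) keep the loop uniform.
candidates = {
    "LDA": (LinearDiscriminantAnalysis(), {"solver": ["svd"]}),
    "QDA": (QuadraticDiscriminantAnalysis(), {"reg_param": [0.0]}),
    # gamma fixed to 1 / N_PCA; C searched on an evenly spaced grid (8 points assumed).
    "SVM": (SVC(kernel="rbf", gamma=1.0 / N_PCA), {"C": np.linspace(0.1, 5.0, 8)}),
    # Fully grown trees (default max_depth=None), Gini criterion, sqrt feature subsets.
    "RFC": (
        RandomForestClassifier(criterion="gini", max_features="sqrt", random_state=0),
        {"n_estimators": [50, 100, 150]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Select hyperparameters by 10-fold cross-validated accuracy.
    search = GridSearchCV(model, grid, scoring=SCORING, refit="accuracy", cv=cv)
    search.fit(X_pca, y)
    best = search.best_index_
    results[name] = {
        m: (search.cv_results_[f"mean_test_{m}"][best],
            search.cv_results_[f"std_test_{m}"][best])
        for m in SCORING
    }

for name, scores in results.items():
    print(name, {m: f"{100 * mu:.1f} ± {100 * sd:.1f}" for m, (mu, sd) in scores.items()})
```

Reporting the mean and standard deviation over the folds at the selected grid point mirrors the "value ± error" layout of Table 1.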

3. Results

In Figure 1, we report the averaged Raman signals of PCM and CN. The most relevant feature of this graph is the strong overlap between the two signals, visible in both the averaged signals and their standard deviations, shown as shaded areas. This overlap further confirms that a qualitative approach is inadequate for distinguishing the two classes. Because no distinct spectral bands show significant intensity differences between CN and PCM, the classification power of an ML model must depend on the combined statistical information from multiple spectral components. This suggests that highly non-linear ML models are the best candidates for high classification accuracy, a conclusion supported by the performance of the models in this study. LDA, QDA, and SVM reached accuracies between about 50% and 63%, too low for reliable use. In contrast, the highly non-linear Random Forest Classifier (RFC) achieved the best results, with accuracies of approximately 80–89%, as reported in Table 1.
The second interesting outcome of this analysis comes from the comparison between the two pre-processing conditions. The conventional condition, ARPLS+SG, based on baseline removal, led to the worst classification performances, which probably indicates that the baseline component, attributable to sample fluorescence, contains valuable information for correct classification. CWT, which does not involve signal removal, preserves the baseline contribution and results in better performance. Its drawback is a dramatic increase in the number of features fed to the ML models. Although PCA attenuates this effect, the CWT route remains time-consuming, with potential negative consequences for practical use and a risk of overfitting. In our case, ARPLS+SG required ∼10 s, whereas CWT required ∼59 s. This aspect must be taken into account in future applications of this technology in engineered devices, where high diagnostic speed and reliability are mandatory.

4. Conclusions

In this preliminary investigation, we explored the possibility of using Confocal Raman Microscopy coupled with Machine Learning to diagnose Cutaneous Melanoma from solid biopsies, i.e., to distinguish Primary Cutaneous Melanoma from Compound Naevus. To this aim, we used the measured Raman spectra as examples to build ML models based on different principles. Furthermore, we compared the traditional pre-processing operations, based on baseline removal and filtering, with an innovative approach, i.e., the application of the Continuous Wavelet Transform (CWT). The results show that the Random Forest Classifier led to the maximum classification accuracy, with values reaching ∼89%, making it a good candidate for future use in engineered devices. Among the most promising routes, we mention ex vivo use to assist histopathologists during the diagnostic process, and the realization of probes for non-invasive in vivo diagnosis. Finally, the classification performances were maximized with CWT, indicating that the signal baseline, usually considered an undesired contribution to the measured signal, carries valuable information for the diagnostic task.

Author Contributions

M.D., D.M. (Daniela Massi), P.V. and M.L. conceived the experiment, G.L. and M.D. conducted the experiment, D.M. (Daniela Massi) and P.V. made available the biological tissues, D.M. (Davide Moroni), O.S. and G.L. managed the data. All the authors analyzed the data and wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Regione Toscana through the TELEMO Project under Grant Ricerca Salute 2018.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Azienda USL Toscana Nord Ovest U.O.C. Dermatologia—Livorno Mod. PROTOCOLLO STUDIO PILOTA OSSERVAZIONALE B11 Vers_20160118.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Requests for the dataset, both raw and processed data, can be addressed directly to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Saginala, K.; Barsouk, A.; Aluru, J.S.; Rawla, P.; Barsouk, A. Epidemiology of melanoma. Med. Sci. 2021, 9, 63.
  2. Senan, E.M.; Jadhav, M.E. Analysis of dermoscopy images by using ABCD rule for early detection of skin cancer. Glob. Transit. Proc. 2021, 2, 1–7.
  3. Raman, C.V.; Krishnan, K.S. A new type of secondary radiation. Nature 1928, 121, 501–502.
  4. Fox, S.A.; Shanblatt, A.A.; Beckman, H.; Strasswimmer, J.; Terentis, A.C. Raman spectroscopy differentiates squamous cell carcinoma (SCC) from normal skin following treatment with a high-powered CO2 laser. Lasers Surg. Med. 2014, 46, 757–772.
  5. Baek, S.J.; Park, A.; Ahn, Y.J.; Choo, J. Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst 2015, 140, 250–257.
  6. Kandjani, A.E.; Griffin, M.J.; Ramanathan, R.; Ippolito, S.J.; Bhargava, S.K.; Bansal, V. A new paradigm for signal processing of Raman spectra using a smoothing free algorithm: Coupling continuous wavelet transform with signal removal method. J. Raman Spect. 2013, 44, 608–621.
  7. Ramakrishnan, S. Introductory Chapter: Wavelet Theory and Modern Applications. In Modern Applications of Wavelet Transform; IntechOpen: London, UK, 2024.
  8. Bellantuono, L. Artificial Intelligence-assisted thyroid cancer diagnosis from Raman spectra of histological samples. IL NUOVO CIMENTO 2024, 100, 47.
Figure 1. Averaged RS of PCM (red) and CN (black). Shaded areas represent the standard deviation.
Table 1. Classification performances of the ML classifiers for the two pre-processing conditions examined. Errors are determined as the standard deviation over the folds of the cross-validation.
Classifier  Pre-Processing  A (%)       AUROC (%)   R (%)       P (%)
LDA         CWT             63.2 ± 2.6  67.3 ± 3.6  63.2 ± 2.6  63.2 ± 2.5
LDA         ARPLS+SG        63.0 ± 2.6  65.1 ± 2.7  63.0 ± 2.6  63.4 ± 2.7
QDA         CWT             55.0 ± 3.2  55.3 ± 4.9  55.0 ± 3.2  58.2 ± 5.1
QDA         ARPLS+SG        49.9 ± 0.1  50.0 ± 0.1  49.9 ± 0.1  24.9 ± 0.1
SVM         CWT             55.8 ± 3.5  66.3 ± 4.1  55.8 ± 3.4  61.3 ± 6.1
SVM         ARPLS+SG        49.9 ± 0.1  60.9 ± 3.9  49.9 ± 0.1  24.9 ± 0.1
RFC         CWT             89.1 ± 1.7  96.0 ± 1.1  89.1 ± 1.7  89.1 ± 1.7
RFC         ARPLS+SG        79.6 ± 1.9  87.9 ± 1.3  79.6 ± 1.9  79.7 ± 1.9
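A note on reading the ARPLS+SG rows for QDA and SVM in Table 1: the combination A ≈ R ≈ 50% with P ≈ 25% is what one would expect from a classifier that assigns every spectrum to the same class on a balanced dataset, provided that R and P are support-weighted averages over the two classes. The source does not state the averaging convention, so this is an assumption; the short sketch below only works through the arithmetic on made-up labels, with a hypothetical helper `weighted_scores`.

```python
import numpy as np

# Balanced toy labels (as after oversampling) and a degenerate model
# that always predicts class 0. These are illustrative, not study data.
y_true = np.array([0] * 50 + [1] * 50)
y_pred = np.zeros_like(y_true)


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted recall and precision.

    Precision of a class that is never predicted is set to 0.
    """
    classes, support = np.unique(y_true, return_counts=True)
    weights = support / support.sum()
    recall, precision = [], []
    for c in classes:
        true_pos = np.sum((y_pred == c) & (y_true == c))
        n_predicted = np.sum(y_pred == c)
        recall.append(true_pos / np.sum(y_true == c))
        precision.append(true_pos / n_predicted if n_predicted > 0 else 0.0)
    accuracy = float(np.mean(y_pred == y_true))
    return accuracy, float(np.dot(weights, recall)), float(np.dot(weights, precision))


accuracy, recall, precision = weighted_scores(y_true, y_pred)
print(accuracy, recall, precision)  # 0.5 0.5 0.25
```

Class 0 contributes recall 1 and precision 0.5, class 1 contributes 0 to both, and the equal weights give 0.5 and 0.25 respectively.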
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lazzini, G.; Massi, D.; Moroni, D.; Salvetti, O.; Viacava, P.; Laurino, M.; D’Acunto, M. Raman Spectroscopy Diagnosis of Melanoma. Proceedings 2025, 129, 10. https://doi.org/10.3390/proceedings2025129010
