Curve-Based Classification Approach for Hyperspectral Dermatologic Data Processing

This paper shows new contributions in the detection of skin cancer, where we present the use of a customized hyperspectral system that captures images in the spectral range from 450 to 950 nm. By choosing a 7 × 7 sub-image of each channel in the hyperspectral image (HSI) and then taking the mean and standard deviation of these sub-images, we were able to make fits of the resulting curves. These fitted curves had certain characteristics, which then served as a basis of classification. The most distinct fit was for the melanoma pigmented skin lesions (PSLs), which is also the most aggressive malignant cancer. Furthermore, we were able to classify the other PSLs in malignant and benign classes. This gives us a rather complete classification method for PSLs with a novel perspective of the classification procedure by exploiting the variability of each channel in the HSI.


Introduction
Hyperspectral (HS) imaging (HSI) combines conventional imaging and spectroscopy methods in a single imaging technique providing both spatial and spectral information of the captured display [1,2]. It is also a fitting method for medical applications due to its non-invasive, non-ionizing, and label-free nature [3]. Dermatology is one of the medical fields where HSI have shown its potential as an appropriate imaging technique [4].
Currently, the most common form of cancer, with more than 1.3 million new cases worldwide in 2018, is skin cancer [5]. There are several types of skin lesions. Pigmented skin lesions (PSLs) contain a wide variety of types, including cancerous and non-cancerous PSLs [6]. There are two types of PSLs depending on the type of growth of the tissue: benign and malignant. Nevus, which is benign, has a slow growth rate and is noncancerous, while e.g., melanomas, which is malignant, are invasive and potentially metastatic tumors [6]. Other types of skin cancer produced by different types of cells include squamous cell carcinoma and basal cell carcinoma. However, melanomas are much more dangerous than other types of skin cancer, and an early detection of this skin lesion can be extremely important in improving the patient survival [7]. The diagnosis of PSLs is performed by dermatologists employing their naked eye or dermascopic cameras, which enhance the morphological visualization of the PSL. After the inspection, an analysis of the lesion following the ABCDE (Asymmetry, Border irregularity, Color, Diameter and Evolving size, shape, or color) rule is applied in order to establish a preliminary diagnosis [7]. A suspicious lesion will accordingly get a histopathological analysis carried out to determine the definitive diagnosis. These diagnostic tools based on conventional imaging have been used to help dermatologists in preliminary diagnosis. However, conventional imaging has limitations that could be surpassed by the enriched spectral information provided by the HSIs.
HSI technology has already been explored as a target technology to aid in PSLs diagnosis. A recent study by Leon et al. showed promising results in discriminating PSLs by random forests and artificial neural networks [8]. Another recent study by Hosking et al. used digital dermascopy HSI with machine learning methods in order to detect melanomas [9]. Kazianka et al. used HSIs to differentiate between normal skin, melanomas, and moles [10]. They used unsupervised approaches to segment the images and moreover evaluated the performance of several supervised classifiers aimed to retrieve the diagnosis of each pixel. The results indicated that it was possible to differentiate melanomas from moles with high specificity and sensitivity. In another study, Nagaoka et al. [11] collected a dataset for discriminating between melanomas and other PSLs, evaluating the statistical significance of an index defined to this end. Tomatis et al. used a classifier based on a neural network model [12]. There have also been developed some commercial systems as SIAscope/SIAscopy [13] and MelaFind [14][15][16]. MelaFind is being used in several studies. In [15], Elbaum et al. conduct a leave-one-out cross-validation procedure using a database composed of 183 melanocytic nevus and 63 melanomas. Monheit et al. [16] applied it in a multicenter study. At last, Song et al. performed a paired comparison between MelaFind and a reflectance confocal microscopy system to differentiate between melanoma and non-melanoma PSLs by comparing MelaFind with a confocal microscopy system. In the study by Stamnes et al., they discriminate between malignant and benign PSLs, i.e., no specific melanoma detection [17].
The current default method of data analysis (DA) is deep learning (DL). DL emerged as a competing DA method around 2009 and solves a wide range of problems exemplified by the versatile TensorFlow-package [18]. DL is also applied in the HSI field. However, DL often needs a huge amount of data in order to train the parameters enough in the neural network [19]. Thus, in the proposed method, this presents a difficult problem due to the lack of data. We are pursuing this problem by an ongoing data collection project, aiming for future applications of DL systems.
Most of the research in the use of HSI for skin analysis is focused on the automatic diagnosis of PSLs. Clearly, a successful methodology of this type would be an extremely useful decision support tool for general practitioners (GPs) and dermatologists. In our contribution in this direction, we were able to find clear patterns for the melanoma PSLs, the other malignant PSLs and the benign PSLs. In the novel exploitation of the variability of each channel in the HSI cubes, we made curves of the mean and standard deviation of sub-images in these channels. Furthermore, using these curves and their inherent patterns, we were able to discriminate between malignant and benign PSLs, but also between the aggressive melanoma PSL and the others.
The outline of the paper is as follows. First, we describe the developed dermatological HSI acquisition system, and thereafter, we describe the datasets we are using in this curvebased classification method. Next, we point out how the data are pre-processed and labeled before they are used as input in our new classification method, which consists of a training, validation, and testing phase.

Hyperspectral In Vivo Dermatologic Data
The system used to acquire the HS dermatologic images is described in [20], using the same database as in Leon et al. [8]. The database consists of 76 images of PSLs from 61 patients: 36 cancerous and 40 noncancerous. The dataset is divided into a training set, a test set, and a validation set. The training set is subsequently divided into melanoma PSLs, malignant PSLs, and benign PSLs. Each mentioned set is mutually exclusive. The acquisition system is composed of an HS snapshot camera able to capture HS images in the very near infrared (VNIR) range, between 450 and 950 nm, with a spatial resolution of 50 × 50 pixels and 125 spectral bands. This acquisition system employs a customized dermascopic contact structure and a halogen source light (150 W) coupled to a fiber optic ring light guide for cold light emission. The effective capturing area of the system is 12 × 12 mm, and the acquisition time is lower than 1 s. The system was applied to create an HS database composed of 76 images of PSLs from 61 patients. The data acquisition was done at the Hospital Universitario de Gran Canaria Doctor Negrín and Complejo Hospitalario Universitario Insular-Materno Infantil (Spain). The research protocol and consent procedures were approved by the Comité Ético de Investigación Clínica-Comité de Ética en la Investigación (CEIC/CEI) at the University Hospital Doctor Negrin.

Curve-Based Classification Approach
In the pre-processing step, a calibration is done of the HSI using the white and dark reference images following Equation (1), where CI is the calibrated image, RI is the raw image obtained from the HS camera, and DI and WI are the dark and white reference images, respectively. The white reference image is obtained capturing an image of a white reference tile that reflects 99% of the incident light. The dark reference image is captured by keeping the tap in the lens of the camera.
Finally, the data is normalized in the range [0,1] to avoid differences in the spectral signature intensities caused by possible different illumination conditions.

Region of Interest Curves
The region of interest (ROI) was obtained through the extraction of 7 × 7 sub-images for all the channels. Subsequently, the mean (x-coordinate of the curves) and standard deviation (y-coordinate of the curves) of these ROIs from each HS channel were plotted against each other, thus creating ROI curves for each HSI cube. These curves were rather different, some almost linear, other more convoluted. Thus, we applied a morphological dilation filter to the more complex curves to ease the interpretation and subsequently use these filtered curves in function fits. Morphological dilation is a common way to enhance or change digital images in some way. Thus, often used in bridging gaps, our morphological filter can be seen as an inverse dilation, a pruning.
There were some issues of this method that needed to be addressed. In particular, how the ROIs are chosen. So, the first issue is where to collect this ROI in the HSI, i.e., which coordinates to choose? To shed some light on this, we established some criteria for each of the classes of PSLs and further picked the ROIs best suited for creating the curves with adequate classification properties. In each channel, we started from (x,y) = (16,16) and went to (x,y) = (35, 35). This gave us 20 × 20 = 400 HS ROI curves for each PSL HSI cube collected from different locations. However, with some overlap and since each ROI area is 7 × 7 pixels, we cover the HSI from the start at (x,y) = (13,13) to the end at (x,y) = (38, 38): thus, an area of 26 × 26 = 676 pixels. We tried a wider area, but the results did not improve, so we kept this size due to computational cost. Thus, we tested how the ROI extraction affected the mean square error (MSE) of the curves chosen by suitable criteria for the melanoma as well as the other malignant and benign PSLs. The CPU used to run the code was an Intel Xeon 2.8 GHz running on a 16 GB RAM computer.

Curve-Fitting
By trying to classify the PSLs through fits of the ROI curves, all the HSI ROI curves were fitted by a quartic (fourth degree) polynomial function, where the coefficients are found through the nlinfit function of MATLAB ® , which minimizes the weighted least squares equation, where the weights were chosen through a weight function, w(y) = 1/(0.011 + 0.011y) 2 and n = 125, the number of data points. Further is the quartic regression function, with parameters b j , j = 0, . . . ,4. Then, the nlinfit-function uses the iteratively reweighted least-square algorithm, which is designed to deal with outliers [21]. Furthermore, the use of a quartic polynomial with a weight function proved to be the best option with regard to flexibility and robustness to outliers. We also tested higher degree polynomial functions; however, they were prone to overfitting. Figure 1 gives an overview of the algorithm. The constants a, b, c, d, e, f, and g are found though the training phase. The features are as follows: df (first-order derivative), ddf (second-order derivative), totm = a·df + b·ddf, mabdf = mean + abs(df) and mean.

Curve-Fitting
By trying to classify the PSLs through fits of the ROI curves, all the HSI ROI curves were fitted by a quartic (fourth degree) polynomial function, where the coefficients are found through the nlinfit function of MATLAB ® , which minimizes the weighted least squares equation,

,
( where the weights were chosen through a weight function, w(y) = 1/(0.011 + 0.011y) 2 and n = 125, the number of data points. Further is f(x, b) = b0 x 4 + b1 x 3 + b2 x 2 + b3 x + b4, the quartic regression function, with parameters bj, j = 0,…,4. Then, the nlinfit-function uses the iteratively reweighted least-square algorithm, which is designed to deal with outliers [21]. Furthermore, the use of a quartic polynomial with a weight function proved to be the best option with regard to flexibility and robustness to outliers. We also tested higher degree polynomial functions; however, they were prone to overfitting. Figure 1 gives an overview of the algorithm. The constants a, b, c, d, e, f, and g are found though the training phase. The features are as follows: df (first-order derivative), ddf (second-order derivative), totm = a•df + b•ddf, mabdf = mean + abs(df) and mean.

Results
Here, we will represent the obtained experimental results. We start the representation of the training phase.

The Training Phase of the Method
As mentioned above, we acquired an ROI from each channel of the HSIs, where the curves were formed through the calculation of the mean and standard deviation of the ROIs from each channel. First, we investigated the melanoma PSLs. The features coming most natural from earlier experiments are a high mean first-order derivative (Max df) and a high mean curvature given by the mean of the second derivative (Max ddf). However, during the training phase, we found that a high sum of mean df (multiplied by 1, a in

Results
Here, we will represent the obtained experimental results. We start the representation of the training phase.

The Training Phase of the Method
As mentioned above, we acquired an ROI from each channel of the HSIs, where the curves were formed through the calculation of the mean and standard deviation of the ROIs from each channel. First, we investigated the melanoma PSLs. The features coming most natural from earlier experiments are a high mean first-order derivative (Max df) and a high mean curvature given by the mean of the second derivative (Max ddf). However, during the training phase, we found that a high sum of mean df (multiplied by 1, a in Figure 1), mean ddf (multiplied by 0.1, b in Figure 1), and the derivative at the maximum value gave the most successful feature (Max totm > 2.86, c in Figure 1). In Table 1 and Figure 2a, we can observe that this feature produced rather good fits according to the MSEs: the MSE ddf following this measure and the MSE df lower for two melanomas. Thus, all the plots of the HSI ROI curves from the melanoma PSLs are from the fits of the Max totm curves. Moreover, the recurring theme in Figure 2c-f is that all the HSI ROIs came from the interior and/or near the edges of the melanomas. In Table 1, we can also observe the most distinct outlier, the P82C1000 PSL, which did not have the same characteristics as the other melanomas, behaving more similar to the malignant PSLs. In Figure 2a, we can observe that this PSL is the one with an MSE ddf diverging most clearly from the MSE totm.
2a, we can observe that this PSL is the one with an MSE ddf diverging most clearly from the MSE totm.  Regarding the malignant PSLs, we inferred based on earlier experiments that the HSI ROI curves with a high mean (Max Mean > 0.05, d in Figure 1) or a high product of max x and max y (Max Prod > 0.395, e in Figure 1) from the fitted curve are plausibly originating from this type of PSL. This proved successful for some of the HSI ROI curve fits from the malignant PSLs. As seen in Table 2 and Figure 3a, this was the most difficult type PSL to fit, giving rise to rather high MSEs. The Max Mean was the best feature accordingly. Some of the curves with least MSE Mean are plotted in Figure 3b, where all have means lying  Regarding the malignant PSLs, we inferred based on earlier experiments that the HSI ROI curves with a high mean (Max Mean > 0.05, d in Figure 1) or a high product of max x and max y (Max Prod > 0.395, e in Figure 1) from the fitted curve are plausibly originating from this type of PSL. This proved successful for some of the HSI ROI curve fits from the malignant PSLs. As seen in Table 2 and Figure 3a, this was the most difficult type PSL to fit, giving rise to rather high MSEs. The Max Mean was the best feature accordingly. Some of the curves with least MSE Mean are plotted in Figure 3b, where all have means lying above the standard deviation (y-value) of 0.05 (d in Figure 1). From these HSI ROI mean standard deviation curves, the most complex data occurred-thus, as mentioned above, giving rise to rather high MSEs. The corresponding ROIs of the plotted malignant curves are also here lying on the inside or at the edge of PSLs, as seen in Figure 3c-f.  The benign PSLs HSI ROI curves proved easier to fit than the malignant ones. From earlier experiments, two useful features came to be low mean (Min Mean < 0.05, g in Figure 1), low absolute value first order derivative (Min abs df), and the sum of these (Min mabdf < 0.109, f in Figure 1). This proved rather successful with a few exceptions, as can be observed in Table 3 and Figure 4a, with Min abs df and Min mabdf as similar rather successful features. In Figure 4b, four of the curves with lowest MSE abs df are plotted with their corresponding placements of the ROIs given in Figure 4c-f. We can here see that for these PSLs, the ROIs are somewhat different placed than for the melanomas and malignant ones-not all inside the PSLs, but at some sort of feature, nevertheless.  The corresponding ROI in red drawn on the HSI for one of the channels used in the creation of the HSI ROI curve from the P29C2000 PSL. (d) The corresponding ROI in red drawn on the HSI for one of the channels used in the creation of the HSI ROI curve from the P13C2000 PSL. (e) The corresponding ROI in red drawn on the HSI for one of the channels used in the creation of the HSI ROI curve from the P18C1000 PSL. (f) The corresponding ROI in red drawn on the HSI for one of the channels used in the creation of the HSI ROI curve from the P30C1000 PSL.

Classification
From the training phase, we inferred that the classification procedure can consist of two consecutive steps. The corresponding ROI in red drawn on the HSI for one of the channels used in the creation of the HSI ROI curve from the P29C2000 PSL. (d) The corresponding ROI in red drawn on the HSI for one of the channels used in the creation of the HSI ROI curve from the P13C2000 PSL. (e) The corresponding ROI in red drawn on the HSI for one of the channels used in the creation of the HSI ROI curve from the P18C1000 PSL. (f) The corresponding ROI in red drawn on the HSI for one of the channels used in the creation of the HSI ROI curve from the P30C1000 PSL.

Classification
From the training phase, we inferred that the classification procedure can consist of two consecutive steps.
First, we identify the curves where the first derivative increases, possibly converging at maximum value of the HSI ROI curves mean. Additionally, the curves with high mean positive curvature and less double inflexion points were also identified as melanomas. In addition, the sum df + 0.1ddf proved successful (totm).
Then, we find the two clusters of the rest of the curves, such that they can be classified as benign or malignant. This is mainly done by choosing what curves have mean above y = 0.05 (malignant) and below y = 0.05 (benign). For the benign curves, an additional useful feature was the sum of the mean of the fitted curve and the mean absolute value of the first-order derivative in each point (mabdf).

Validation and Test Experiment
We also performed an experiment in order to validate the training phase with the same type of ROI curves as in the training part but different PSLs. This resulted in the usual 125 points mean and standard deviation curve; then, a quartic polynomial function was fitted, as mentioned earlier. Based on the features developed in the training phase, then, we were able to first discriminate between melanomas and the rest of the PSLs, where a typical melanoma fit is given in Figure 5a, and then discriminate between malignant and benign PSLs. Examples of their respective fits given in Figure 5b,c. First, we identify the curves where the first derivative increases, possibly converging at maximum value of the HSI ROI curves mean. Additionally, the curves with high mean positive curvature and less double inflexion points were also identified as melanomas. In addition, the sum df + 0.1ddf proved successful (totm).
Then, we find the two clusters of the rest of the curves, such that they can be classified as benign or malignant. This is mainly done by choosing what curves have mean above y = 0.05 (malignant) and below y = 0.05 (benign). For the benign curves, an additional useful feature was the sum of the mean of the fitted curve and the mean absolute value of the first-order derivative in each point (mabdf).

Validation and Test Experiment
We also performed an experiment in order to validate the training phase with the same type of ROI curves as in the training part but different PSLs. This resulted in the usual 125 points mean and standard deviation curve; then, a quartic polynomial function was fitted, as mentioned earlier. Based on the features developed in the training phase, then, we were able to first discriminate between melanomas and the rest of the PSLs, where a typical melanoma fit is given in Figure 5a, and then discriminate between malignant and benign PSLs. Examples of their respective fits given in Figure 5b,c.  When looking at the results, we denote correctly classifying a malignant PSL as a true positive (tp) and correctly classifying a benign PSL as a true negative (tn). Furthermore, falsely classifying a malignant PSL is denoted as a false positive (fp) and at last, falsely classifying a benign PSL is denoted as a false negative (fn). By using the validated classification criteria, we were able to test the method with a test set. The results are given in Table 4, where the P23C1001 gave a false positive due to some regression difficulties regarding outliers. The accuracy of this proposed method is 90%. The sensitivity is 100% and the specificity is 80%. Considering the melanoma discrimination, both sensitivity and specificity are 100%. In Table 5 results from other researchers are provided.  Table 5, we can discern that most research studies have been discrimination of the aggressive melanoma PSLs, where Nagaoka et al. [11] and Leon et al. [8] had the highest sensitivity and specificity. Stamnes et al. [17] also had high such measures; however, these were only discriminating between benign and malignant PSLs. Thus, the sensitivity and specificity in the method developed here is comparable to the others.
The works listed in Table 5 mostly use an array of machine learning methods, while the proposed method only use a regression method, making it a simpler approach than the others.
The computation time was on average 0.16 s for each PSL. However, including the best ROI search, this jumps to 70 s. Thus, it is a rather slow algorithm due to rather complex calculations. Compared to the previous work in [8], in which six of the 10 patients were correctly identified with an average execution time of 0.5 s, the proposed methodology identifies nine out of 10 patients with an average of 160 s. However, this is only the core algorithm; an efficient segmentation algorithm has to be included also, raising the execution time.

Limitations and Future Directions
Further studies must be conducted in order to improve the limitations in this research. The first limitation is the rather low number of samples in particular for the melanoma class. In addition, in the benign and malignant class, it would have been desirable of an increase of samples. However, the number of samples was enough to find and validate the patterns for each class of PSLs considered here. Another limitation is the computing time. However, this will be decreased by the automatic choosing of ROIs by segmentation of the PSLs, which have several known potential solutions. We will also try to improve the regression of the curves by further experimentation of the weight function or other types of regression. The developed system in this paper is at an experimental level; however, it has the potential to assist in the diagnosis of skin cancer and reduce the number of biopsies relating to this.

Conclusions
The work presented here shows that HSIs can be used to discriminate between melanoma, malignant, and benign PSLs as a potential future diagnostic tool for dermatologists or GPs. The proposed curve-based method is able to do this discrimination by using the inherent patterns of these curves. The presented method was able to correctly classify nine out of 10 in the test set, identifying all cancerous cases accurately (100% of sensitivity). Thus, it is a rather good result. In addition, in the test phase, it had a sensitivity of 80% overall and 100% for the melanomas. Since this method is relatively experimental, we will improve it further in order to increase its accuracy and lower its computation time. The limitation of the proposed method is mainly the lack of data, which is remedied in an ongoing project of data collection. The merits of the proposed method is the utilization of the richer structure provided by the HSI technology in a novel way. Funding: This work has been supported by the Canary Islands Government through the ACIISI (Canarian Agency for Research, Innovation and the Information Society), ITHACA project "Hyperspectral Identification of Brain Tumors" under Grant Agreement ProID2017010164 and it has been partially supported also by the Spanish Government and European Union (FEDER funds) as part of support program in the context of Distributed HW/SW Platform for Intelligent Processing of Heterogeneous Sensor Data in Large Open Areas Surveillance Applications (PLATINO) project, under contract TEC2017-86722-C4-1-R.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to patient confidentiality.

Conflicts of Interest:
The authors declare no conflict of interest.