Melanoma and Nevus Skin Lesion Classification Using Handcraft and Deep Learning Feature Fusion via Mutual Information Measures

In this paper, a new Computer-Aided Detection (CAD) system for the detection and classification of dangerous skin lesions (melanoma type) is presented, through a fusion of handcraft features related to the medical algorithm ABCD rule (Asymmetry Borders-Colors-Dermatoscopic Structures) and deep learning features employing Mutual Information (MI) measurements. The steps of a CAD system can be summarized as preprocessing, feature extraction, feature fusion, and classification. During the preprocessing step, a lesion image is enhanced, filtered, and segmented, with the aim to obtain the Region of Interest (ROI); in the next step, the feature extraction is performed. Handcraft features such as shape, color, and texture are used as the representation of the ABCD rule, and deep learning features are extracted using a Convolutional Neural Network (CNN) architecture, which is pre-trained on Imagenet (an ILSVRC Imagenet task). MI measurement is used as a fusion rule, gathering the most important information from both types of features. Finally, at the Classification step, several methods are employed such as Linear Regression (LR), Support Vector Machines (SVMs), and Relevant Vector Machines (RVMs). The designed framework was tested using the ISIC 2018 public dataset. The proposed framework appears to demonstrate an improved performance in comparison with other state-of-the-art methods in terms of the accuracy, specificity, and sensibility obtained in the training and test stages. Additionally, we propose and justify a novel procedure that should be used in adjusting the evaluation metrics for imbalanced datasets that are common for different kinds of skin lesions.


Introduction
Skin cancer has become one of the deadliest diseases for human beings. Globally, each year, between two and three million non-melanoma (less aggressive) cases occur, and over 130,000 melanoma (aggressive) types are diagnosed [1].
Melanoma is the deadliest type of skin cancer. Australia has the highest rates of skin cancer in the world. In 2018, melanoma accounted for about 22% of skin cancer diagnoses, and non-melanoma tumors accounted for about 78% [2]. Studies have shown that this disease is caused most of the time by exposure to UV radiation in daylight, tanning on sunbeds, and skin color, among others. Physicians have suggested that the best way to detect a malignant skin lesion of any kind is early detection. The rate of survival increases to almost 99% over five years if the disease is spotted in the early stages.
Dermoscopy or Epiluminicence Microscopy (ELM) is a medical method that helps a physician to recognize if a skin lesion belongs to a benign or malignant type of the disease. This method uses a dermatoscope, a tool that consists of a light source and amplification lens to enhance the view of medical patterns such as ramifications, globs, pigmented networks, veils, and colors, among others.
Since the image processing techniques were developed, Computer-Aided Detection (CAD) systems and approaches in the classification [3][4][5][6][7] and segmentation [8] of a Pigmented Skin Lesion (PSL) have been improved , benefiting patient diagnoses in early stages of the disease without shocking or painful medical procedures.
In this work, we propose a novel approach in the detection of a skin lesion among melanoma or nevus types, using handcraft features that depend on shape, color, and texture, which represent the ABCD rule (Asymmetry Borders-Colors-Dermatoscopic Structures), and combining them with deep learning features; these latter features were extracted using the transfer learning method as a generic feature extractor. At the next step, the most important features according to the Mutual Information (MI) metric features should be selected using the fusion technique, aiming at the best performance by taking into account the influences of both sets of features on the binary classification result.
This paper is organized as follows: Section 2 presents a brief review of the methods used in CAD developments with fused features and their nature, Section 3 explains in detail the proposed method, the materials used, and the evaluation metrics employed, and Section 4 describes the experimental results and presents a brief discussion. The conclusions are detailed in Section 5.

Literature Survey
Medical detection algorithms are one of the first tools used for determining whether a skin lesion is malignant or benign [9][10][11][12][13]. Nachbar et al. [9] developed a subjective method based on the visual perception of the lesion. ABCD Rule is based on color, shape, and particular structures that appear on skin lesions. Due to the simplicity of the algorithm, it is one of the most practiced for evaluating a lesion by a naked eye exam or using a dermatoscope.
The ABCD medical algorithm is composed of the following parts: • Asymmetry A: The lesion is bisected into two perpendicular axes at 90 • of each other, so as to yield the lowest possible asymmetry score. In other words, whether the lesion is symmetrical or not is determined. For each axis, where asymmetry is found, one point is added. • Borders B: The lesion is divided into slices by eight axes determining whether a lesion has abrupt borders. If one segment presents an abrupt border, one point is added.

•
Colors C: The lesion can contain one or more of the following colors: white, brownish, dark brown, black, blue, and red. They are generated by vessels and melanin concentrations, so for each one color founded, one point per color is added. • Dermatoscopic Structures D: The lesion has the appearance of the following structures: dots, blobs, pigmented networks, and non-structured areas. A point is added for each structure spotted on the lesion.
The described features are weighted as follows: where TDS is the Total Dermatoscopic Score; if it is less than 4.75, it is concluded that the lesion is benign; if the score is between 4.75 and 5.45, the lesion is considered suspicious; if it is more than 5.45, then it is considered malignant. This algorithm has subversions, where some elements such as dermatoscopic structures are changed by diameter or darkness [14,15]. Additionally, in [16,17], the addition of features known as EFG properties has been suggested; E stands for elevation or evolution, F for firmness, and G for growth. These features work as complementary information obtained from a PSL. Modifications in the ABCD rule are based on simplifying the evaluation of a skin lesion such that anyone can evaluate themselves and record any change in the injury. If the lesion is not identified by these methods, physicians are obliged to initiate invasive methods such as a biopsy to determine their type.
Adjed et al. [18] proposed a method where the aim is the fusion of structural features using Curvelet and Wavelet transform employing the Fast Digital Curvelet Transform (FDCT) wrapping method, and statistical metrics and texture features such as local binary pattern are then computed. They fused around 200 features via concatenation using the PH2 dataset [19].
Hagerty et al. [20] developed a fusion method where deep features are extracted from images using transfer learning method based on the ResNET-50 Convolutional Neural Network (CNN) architecture. However, the question as to which handcraft features are used in their method is not clear. Moreover, they used a feature selection algorithm, in this case, the χ 2 method, for performance revision, employing two datasets: the private set and the second set (a modified version of the ISIC 2018 dataset) [21].
Li et al. [22] used a deep learning approach with the fusion of clinical criteria representations, where as a classifier and fusion method, a boosting tree-learning algorithm called LightGBM is used [23]. This method is applied for color properties (RGB and HSL features), texture properties (SIFT and LBP), and shape properties (solidity and circularity, image ratio, and area ratio). The deep learning features were obtained using the transfer learning method based on the ResNET-50 and DenseNET-201 CNN architectures. Data pertaining to 566 features were processed using the ISIC 2018 dataset [21].
Abbas and Celebi [24] proposed a CAD system where the lesion is processed by a Stack-Based Auto-Encoder (SAE), extracting the deep features from the pixels of a lesion while minimizing the information loss. The handcraft features are extracted for color (the Hill climbing algorithm (HCA)) and for texture (the speed-up robust features (SURF)). Applying a feature fusion approach, they used Principal Component Analysis (PCA), and in the concluding stage, Recurrent Neural Networks (RNN) and A Softmax Linear Classifier were employed.
Among the reviewed methods, most of them use handcraft features and deep learning features with the help of the transfer learning method based on well-known CNN architectures [25][26][27][28][29][30][31]. As one can see, the revised schemes tried to fuse the information extracted from the lesion images, gathering data via the concatenation of the feature vectors, classifiers, and feature selection. The main drawback of such methods is the unawareness of the medical information, which is relevant to physicians besides the extracted data from image processing algorithms. The analyzed methods above employ several possibilities in the fusion of features, but most of them do not consider the importance of each extracted feature according to their nature, which is relevant for a pertinence class. Moreover, some of them lack features based on medical algorithms due to their assumption of the weakness of the perceptual handcraft features based on the subjective visual human system. Modern image processing techniques and machine learning approaches are able to learn the patterns and implement those features, as in a vision scoring system. Finally, some of the reviewed methods attempt to perform a multiclass classification, where the system classifies a lesion image to a specific lesion category. Nevertheless, here a problem endures in the multiclass classification, because the data available for each class are limited, and some of the public databases are not well balanced to perform this classification. As a result, a designed system can perform incorrect classifications. Summarizing, we consider the importance of developing an intelligent system that is able to perform the correct classification of melanoma disease employing both types of features, where medical features have relevance on the classification with the aid of deep learning features, aiming for the best performance.
The novel method considers relevant information obtained from handcraft and deep learning features, improving performance quality presented by commonly used criteria: accuracy, specificity, and sensibility. Different from other schemes, our novel framework encourages the use of ABCD rule features, also known as perceptual features, with a set of features equivalent to or based on a similar medical nature.

Principal Contributions
The principal contributions of our novel approach in the classification of dermoscopic lesions are summarized as follows: • A brief survey of computer-aided detection methods that employ fusion between handcraft and deep learning features is presented.

•
Despite the new tendencies of avoiding the ABCD medical algorithm or any of its variations, we utilized descriptors based on them, such as shape, color, and texture, as a new aggregation, and the extraction of deep learning features were used afterwards.

•
A balanced method was employed due to the inconsistency of the ISIC database with respect to classes. A SMOTE oversampling technique was applied, which in this work demonstrates an improvement in performance at the differentiation of melanoma and benign lesion images.

•
A fusing method that employs relevant mutual information obtained from handcraft and deep learning features was designed, and it appears to demonstrate better performance in comparison with state-of-the-art CAD systems.

Materials and Methods
In this section, the proposed system is described. A brief conceptual block diagram of the system is illustrated in Figure 1. As an initial step, the pigmented skin lesion image is segmented from surrounding normal skin tissue and artefacts such as veils, hairs, and air bubbles, among others, by color space transformation, mean thresholding, and extraction of the Region of Interest (ROI). Subsequently, using the binary mask image, the ROI image and a set of handcraft features based on shape, color, and texture are extracted. Thereafter, deep learning features are obtained using a selected CNN architecture, which is pre-trained in an Imagenet classification task; this CNN is employed as a feature extractor. All extracted features are concatenated in one vector, which later is fused according to the MI criterion. The selected classifier is trained on the ISIC dataset comprised of both malignant and other benign skin lesion images. Finally, the trained classifier model is used to predict each unseen pigmented skin lesion image as a benign or malignant lesion. The details of each stage of the proposed method are described in the remainder of this section.

Preprocessing
An image I(x, y) that is analyzed can contain a lesion with some artefacts such as veils, hairs, stamps, among others. In the first step, we apply a preprocessing stage, where an image is enhanced [8,32,33].
A Gaussian filter is applied, and this filter is used to blur the artefacts contained on the image as primary targets: hair, marks, and spots, among others. This maintains the geometric/shape form of the lesion.
The Gaussian filter is denoted as follows: where σ 2 is the variance of the spatial kernel, this step is shown in Figure 2. CIEL*a*b* space is characterized to more closely approximate the human perception system, where there are channels: L stands for lightness; a* and b* stand for chroma channels, where a* is a parametric measure between magenta and green, and b* is a parametric measure between blue and yellow. The L channel presents values between [0, 100], and a* and b* chroma channels have values between [−30, 30]. This transformation is used to avoid the correlation between channels, while keeping the perceptual data intact, such as a pigmented skin lesion that is darker than healthy skin tissue, as one of the sub-variants of the ABCD algorithm states [14].  In each channel of CIEL*a*b* for images I L , I a , I b , the mean thresholding procedure is applied. Such thresholding allows one to differentiate skin tissue from lesion tissue. In Figure 3, one can see how the CIEL*a*b* space is able to visually separate this information. These mean values are calculated as follows: where (x, y) are the spatial coordinates, m, n are the sizes of an image, and I L , I a , I b denote the mean values. The thresholding operation is applied in each channel of the CIEL*a*b* space, forming the thresholded channel images as follows: Afterwards, the following logic operation is applied on each binarized image, I Th L , I Th a , and I Th b , to form a binary mask I bin (x, y) of the image.
Example of extracted Binary Masks images are given in Figure 4. Finally, a median filter with a kernel 5 × 5 is applied to the I bin (x, y), removing the remaining artefacts, which resists thresholding. Next, a bounding box algorithm is performed. The bounding box [34] is a method that is used to compute an imaginary rectangle that completely encloses the given object. This rectangle can be determined by the x-and y-axis coordinates in the upper-left and lower-right corners of a shape. This method is commonly used in object detection tasks because it estimates the coordinates of the ROI in an image. Bissotto et al. [35] have shown the effect of the bias between different types of image segmentation, where those biases can negatively affect the performance of classification models. They consider that the usage of the bounding box algorithm to segment a lesion is appropriate because a CNN architecture can extract all the relevant features of a lesion and distinguish it from the surrounding healthy skin. Therefore, we consider this solution to reduce the bias of the classification model before processing.

Handcraft Features
The ABCD rule represents a set of perceptual features stated by the findings of patterns in PSLs. The ABCD method employs features that are based mostly on shapes, color, and texture. The selected features in this study are the representation of medical attributes using image processing algorithms.
Sirakov et al. [36] proposed a method to estimate the asymmetry of a lesion based on the binary mask I bin , which is obtained from the previous thresholding step. Then, by rotating it through 180 • , the symmetry mask S I bin is formed, and the synthetic image A is calculated as follows : where A is the generated image, which contains the non-overlapping regions of the lesion called false symmetry FS, therefore applying and this technique is applied on the 0 • axis of the binary image. In this study, a variation of the previous method is proposed, where from the A generated image is rotated from the major axis and the minor axis, by applying the same procedure again and finally, computing the average symmetry value between the two axes, as follows: The symmetry values belong to the interval [0,1], where, if this index approaches the highest value (1), a lesion is more symmetric. The Figure 5 shown the extracted ROI images.

Shape Features
Shape features or geometric features [34] can describe an object or a form in numerical values to represent human perception.
For shape features, the following equations are employed: where m, n are the sizes of the image, and x, y are the spatial coordinates; therefore, the area consists of the amount of pixels contained in the ROI of a lesion.
where (x, y) are the spatial coordinates of the i − th pixel, which constructs the contour of the region, and the perimeter contains the amount of pixels around the ROI of a lesion.
where the circularity shows the similarity between a shape and a circle.
where the diameter is formed by obtaining the length of the major axis and the minor axis of the shape, computed from the 2nd central moment. This measure connects two pairs of points on the perimeter of the shape.
which measures the aspect ratio of the length of the major axis to the length of the minor axis.

Colour Features
Medical algorithms, in particular the ABCD rule, try to present, as features, a set of colors contained on a PSL. Therefore, these features can be replaced by statistical characteristics obtained from color spaces. In this study, the following characteristics are used: where I ch (x, y) is the image of a chosen channel for the PSL image in RGB and CIEL*a*b* color spaces.

Texture Features
Haralick et al. [37] proposed the Gray Level Co-occurrence Matrix (GLCM). This method analyzes the statistical texture features of an image. The texture features provide information about how the gray intensities of the PSL of the image are distributed. GLCM shows how often a gray level occurs at a pixel located in a fixed position, using P d (i, j) as the (i, j) element of the normalized GLCM; N g is the number of gray levels; σ x , σ y and µ x , µ y are the standard derivations and the mean values among the i and j axes of the GLCM, and are expressed as follows: The 13 features used in this study are as follows: The angular second moment measures consistency of the gray local values.
This is the second moment. This characteristic measures the variations between pixels.
This is the linear dependency of the gray level values.
This is the second moment. It shows the spread around the mean in the surrounding neighborhood.
This is the Inverse Difference Moment (IDM). It shows how close the elements of the GLCM are in their distribution.
This is the measure of randomness of the gray values in an image. Additional texture features that are used in this study are based on the difference statistics using the probability P x−y (k) that can be written as follows: where P d (i, j) is the (i, j)th element contained in the GLCM, and N g is the number of gray levels.

Deep Learning Features
Based on the discrete convolution operation, the CNN [38,39] represents one type of method, namely, deep learning strategies [40], the basis of which is to obtain the information of an image I(i, j), using filters K(m, n), which are trained on a neural network as feed-forward and back-propagation, according to where W are the computed values for the filters, z is the activation function, X is the input, and b is the bias [40,41]. The design of an architecture of a CNN is a rather complex task due to the statement of parameters such as the number and size of the filters, and the depth, even those that are task-related.

Transfer Learning
The main problem of using a deep learning approach is that a large amount of data is needed to train the network from scratch. Usually, to overcome this problem, the transfer learning method [42][43][44] is applied.
Transfer learning is a technique that can be defined as the generalization of a target task based on applied knowledge extracted from one or more source tasks [43]. The idea originates from human thinking: we do not learn exactly how to recognize a chair, a plane, or a book; we start by recognizing colors, shapes, and textures, and someone else then tells us how to differentiate a chair from an apple. This sharing of knowledge between beings helps us to understand the world as infants. Another idea is handling information collected for a task to solve related ones. Therefore, transfer learning can be defined as follows: Assume a domain D, which consists of two components: where χ is a feature space, and there is a marginal distribution: P(X), X = {x 1 , ..., x n }, x i ∈ χ. Given a task T with two components, where γ is a label space with a predictive function; η is trained from (x i , y i ), x i ∈ χ, y i ∈ γ, for each feature vector in the D domain; η predicts the corresponding label η(x i ) = y i . The paper [43]  Transfer learning is defined in [44]. Given a source domain D S with a corresponding source task T s and a target domain D T with a corresponding task T T , transfer learning is the process of enhancement for a target predictive function f T (·) using the related information from the domain source D S and the task-related T S . CNN architectures are overlay filters that sample the data contained in an image. Therefore, these filters are hierarchical representations called feature maps when all filters learn all the features based on image data. These are connected to the last layer of the CNN architecture, which is a neural network classifier referred to in the literature as a fully connected layer [40].
Moreover, CNN architectures belong to a class of inductive learning algorithms, where the objective of these algorithms is to map input features between classes seeking the generalization of the data.
Therefore, inductive learning can be transferred from an architecture trained on the source task to the target class, and this is done by adjustments of the model space, correcting the inductive bias. Commonly, this can be performed by replacing the last layer of the model, which is the classifier, from the original one to a lightweight classifier, which should be trained on the generalized features.
In this study, we employed the transfer learning method on architectures that are pre-trained on a similar task. In this research, the Imagenet classification task [45] was employed, by the assumption that D S = D T and the CNN Architecture should perform as a generic feature extractor.

Feature Extractor
In our case, the CNN Architecture was used as a deep feature extractor, where features are extracted by the following: Consider an image I(x, y) of the domain D T that is mapped or transformed by where W is the weight computed by the feature extractor, where M, N, and h are the proposed size of the CNN architecture. As a result, we can obtain P = W(I(x, y)) = {w 1 (I(x, y)), ..., w n (I(x, y))}; ∈ R M×N×h , and the pooling transformations are based on Q = f (0, P) = { f (0, w 1 (I(x, y))), ..., f (0, w n (I(x, y)))}, where f (·) is a mapping function.
In Table 1 (51) where the size of vector F is equal to 43 for the handcraft features, plus the number of features extracted from the CNN architecture used.

Algorithm Summary
Let the sum of all proposed and explained procedures be described in the form of an algorithm for extraction features for PSL Images. The proposed CAD system consists of four principal stages: (a) preprocessing, (b) handcraft features, (c) deep learning features, and (d) the fusion stage. In the first stage, artifacts are removed by using a Gaussian filter after a CIELab color transformation is employed. Mean thresholding per channel is employed afterwards. We then extract the ROI by using the boundary box algorithm. In the second stage, shape features are extracted using Equations (12)-(17), Equations (18)- (21) are computed for statistical color features, and texture features are then calculated from Equations (22)- (38). The ROI image is finally processed by the chosen CNN Architecture, whose features are concatenated for the following steps.
Algorithm 1 presents the details of the feature extraction process for PSL images.

Feature Selection
After extracting the deep learning and handcraft features, the number of features is reduced, which is a typical problem in machine learning algorithms, because high-dimensional data increases the computing time for a prediction.
Feature selection is one method to resolve this problem. In [46], filtering methods are applied, and the features are selected based on various statistical tests, such as χ 2 , ANOVA, Linear discriminant analysis (LDA) among others [47].
The extracted data can be represented in the form of a high-dimensional matrix defined by where X is the extracted data, n represents the instances or elements, and p represents the features extracted for each element.
The idea is to reduce the data as much as possible, such that the features are extracted by selecting a subset of each element that is relevant to the pertinence category or label y. This subset is defined as where X S represents the reduced data, n is the same instance or element of the original data matrix, and k is the selected or reduced features based on k << p.

Mutual Information
In this work, we propose the MI metric to reduce the data of the extracted features. MI is a measure based on the entropy measure: where X = x 1 , ..., x n and Y = y 1 , ..., y n in the multi-variable case, H(X|Y) is the conditional entropy between two random variables, and H(X) is the entropy of a random variable [48][49][50]. if I L (x, y) ≥ I L then 4: Assign 1 to I Th L (x, y) 5: else 6: Assign 0 to I Th L (x, y) 7: end if 8: end for 9: for all (i, j) ∈ I a do 10: if I a (x, y) ≥ I a then 11: Assign 1 to I Th a (x, y) 12: else 13: Assign 0 to I Th a (x, y) 14: end if 15: end for 16: for all (i, j) ∈ I b do 17: if I b (x, y) ≥ I b then 18: Assign 1 to I Th b (x, y) Load the weights W i from selected CNN architecture 29: Apply the weights W i to I roi Ross, in [51], proposed an MI estimator for continuous and discrete data aiming at the relationship between datasets.
Based on the nearest neighborhood rule, the idea is to find a k-nearest neighbor between a point i among all the data points N xi using the Chebyshev distance metric: and the MI measure is then computed as follows: where ψ(·) is the digamma function, N x is the average of the data points, k is the number of k-closest neighbors to the point i, and m is the average counted neighbors among the full dataset. If our dataset contains continuous data points, these are discretized using a binning methodgrouping the data into bins-leading to a binned approximation of MI as follows: After applying the MI method, a new vector is delivered, and it shows the value of MI per concrete feature among all obtained features. Next, the mean value is calculated for the MI among all features. We proposed to use this value as a threshold to discard the features with the lowest MI values and to keep the features with the highest MI values. Tables 2 and 3 The found subset contains the features with the highest mutual information values that we consider as the fused data of both sets of extracted features from a PSL. Below, as an example, we present the behavior of the features and their MI values. Tables 2 and 3 expose, for illustration, several MI values between the extracted features for the dataset of images. Some of the deep learning features appear to demonstrate negligible MI values for the binary classification problem, according to the complete set of features extracted from the database, as one can see in Table 2.  Table 3 exposes several features with significant values of MI that are merged with additional significant handcraft features, forming the final set of features for the proposed system. Therefore, the proposed fusion method based on MI measurements demonstrates the fusion for both types of features in accordance with their influence on the binary classification problem. Therefore, in contrast with state-of-the-art techniques that use Concat, PCA-based, and χ 2 -based methods, among others, for the selection of significant features, the proposed approach employs the information measures, justifying the informative weight of each feature that is used in the classification stage.

Results and Discussion
The classifiers: logistic regression, support vector machine with linear and rbf kernel [52], and finally, relevant vector machine [53,54] algorithms were employed in this work. The rationale for the use of different classifiers lies in the fundamental idea of transfer learning, whereby after extracting generic features, a shallow classifier must be applied to test the proposed method.

Evaluation Metrics
In this study, we used commonly applied performance metrics: accuracy, sensitivity, specificity, precision, f-score, and the Matthews correlation coefficient: The accuracy value measures the appropriate classifications over the total elements evaluated.
The sensibility value, also known as recall, measures the number of positive elements that are correctly classified. Speci f icity = tn tn + f p .
The specificity value measures the number of negative elements that are correctly classified.
The precision value measures the number of elements that are correctly classified among all the positive elements to evaluate.
The F-score value measures the harmonic mean between precision and recall. These criteria are described in terms of tp, tn, fp, and fn, which denote true positive, true negative, false positive, and false negative, respectively. Additionally, to characterize the classifier performance, we used the Matthews correlation coefficient [60]: where the MCC value measures the performance of the classification model as a coefficient between the predicted and the observed elements of the binary classification. It returns a value between [−1, 1], where a value of 1 represents a perfect prediction, a value of 0 is no better than a random prediction, and −1 indicates total disagreement between prediction and observation.

Dataset
This study uses the public ISIC 2018: Skin Lesion Analysis Towards Melanoma Detection grand challenge datasets [21] task 3, also known as HAM10000 [61], which contains 10,015 separate images, as shown in Figure 6, where AKIEC corresponds to Actinic Keratosis, BCC is Basal Cell Carcinoma, DF is a Dermatofibroma, MEL is Melanoma, NV is Nevus, BKL is Pigmented Benign Keratosis, and VASC is Vascular. This distribution was obtained from the Ground Truth file, and each image is on RGB space with a size of 450 × 600. As one can see in Figure 6, the ISIC dataset contains images that belong to different types of skin lesions that do not correspond to the melanoma-type lesion. Therefore, we decided to modify the dataset to develop a binary class classification by excluding all the classes, except for Melanoma and Nevus.
We split the ISIC dataset: 75% for the training set and 25% for test set. The features extracted were processed by Z-score normalization:

Balance Data
The adjustment of the ISIC dataset mentioned above means that the data are unbalanced, where one class contains more data than others. In most machine learning techniques, when they employ unbalanced data, this can result in a lower performance of the minority class, which can cause misclassification of the data. SMOTE [62] is a data augmentation method that can oversample the data of the minority class, compensating with the majority class. This method is based on K-NN clustering and Euclidean distance, selecting two points of the minority class and computing a new one based on them. This method is iterative until it reaches an equivalent amount of information of the majority class.
The SMOTE technique has been employed in several studies [63,64], where extracted features that belong to an unbalanced dataset are oversampled to compensate for the number of instances between classes. In this work, we apply this method to compensate the data of the melanoma class against nevus ones to the selected features by the MI criterion, and as result, a balanced dataset with the fused features can be obtained.
The study [65] introduces new metrics to overcome this problem, where geometric mean attempts to maximize the accuracy of each of the two classes similarly, a performance metric that correlates both objectives: Dominance is aimed at quantifying the prevalence relation between the majority and minority classes and is used to analyze the behavior of a binary classifier: The Index of Balanced Accuracy (IBA) is a performance metric in classification that aims to make it more sensitive for imbalanced domains. This metric is defined as follows: The objective of this procedure is to moderately favor the classification models with a higher prediction rate of the minority class, without underestimating the relevance of the majority class.

Experimental Results
In the following tables, the experimental results of binary classification for balance data are presented. The experimental results in Table 4 show that the designed system appears to demonstrate sufficiently good performance when different CNN architectures are fused with handcraft features in accordance with the MI metric that can seek relevant information among features against concatenating or discriminative analysis.  Table 4 provides the experimental results obtained using the selected CNN architectures, where the Mobilenet v2 appears to demonstrate the best performance in comparison to the before-mentioned architectures. The proposed method shows notable evaluation metrics such as accuracy, Area Under Curve (AUC), and IBA metrics. The selected features contain the fused features that are the most relevant for the classification of a lesion with the MI metric.
Presented in Table 5, experimental results for different criteria show that the designed system outperforms several state-of-the-art methods. The experimentally justified performance, in our view, is due to the fusion technique employed, where a mutual information metric seeks relevant information among features upon concatenating or discriminative analysis. Moreover, the IBA metric is employed, achieving a value of 0.80, which confirms the stability and robustness of the system when balanced data are used. The proposed method achieves an accuracy of 92.40%, sensitivity of 86.41%, AUC of 89.64%, and an IBA of 0.80. In [24], the authors proposed the usage of the complete PSL image, which contains healthy skin and artefacts. Whereby, there is a probability of generating misclassification of the recognizing patterns that belong to these objects. By contrast, the proposed system extracts the proposed features from the region of interest of an image only for the entire classification process. This guarantees that the feature extraction is performed exactly in the lesion image.
Moreover, our novel CAD employed the ISIC 2018 database [21] that contains more than 10,000 dermoscopy images that are authenticated by experts. Additionally, because of the unbalanced data that are present in this database, we applied the data augmentation procedure given in Section 4.1.4. This guarantees the robustness of the obtained classifications results. In contrast, the CAD DermoDeep system [24] obtained the experimental results using a synthetic database constructed by four different databases (private and public). In this case, an equal number of melanoma and benign skin lesions were subjectively selected from each of the four databases. This system showed slightly improved performance results as those reported in this study. In our opinion, such an approach does not guarantee that the same high performance can be repeated using data that are not previously preselected.
The proposed system used a data augmentation technique and presented the performance analysis for all images contained in the database and not only those that have been preselected according to a subjective criterion that has no statistical justification.
Finally, our proposed system was developed with medical-based and deep learning features, whereby the system employed data from both sets of features and merged them, applying the MI criterion. As a result, the system enhances the recognition of melanoma and nevus lesions compared to the use of a fully deep learning approach that is extremely computationally expensive to train, requires substantial amounts of labeled data and does not recognize dermoscopic features established in the ABCD algorithm.

Conclusions and Future Work
In this study, a novel competitive CAD system was designed to differentiate melanoma from nevus lesions. Different from commonly proposed CAD systems, the novel method employs handcraft features based on the medical algorithm ABCD rule and deep learning features and applies a transfer learning method as a feature extractor. Additionally, in the proposed system, the set features are fused using an MI metric that, in contrast with state-of-the-art systems, can select the most significant features in accordance with their influence on binary classification decisions.
The performance of the proposed system has been evaluated; the system achieved an accuracy of 92.4%, an IBA of 0.80, and an MCC of 0.7953 using a balanced dataset. The system is competitive against the performance of other, state-of-the-art systems. The proposed CAD system can help inexperienced physicians to visually distinguish the medical features to be applied. Furthermore, it could be used to provide a second opinion to a dermatologist. Our future work will consist of designing a method for multiclass classification using both sets of features, thus permitting the diagnosis of several diseases found in the ISIC challenge dataset.