Staging Melanocytic Skin Neoplasms Using High-Level Pixel-Based Features

The formation of a malignant neoplasm can be seen as the deterioration of a pre-malignant skin neoplasm in its functionality and structure. Distinguishing melanocytic skin neoplasms is a challenging task because of their high visual similarity to other types of lesions and the intra-structural variants of melanocytic neoplasms. In addition, different lesion types with inhomogeneous features and fuzzy boundaries closely resemble one another. The abnormal growth of melanocytic neoplasms takes various forms, from a uniform typical pigment network to an irregular atypical shape, which can be described by the border irregularity of the melanocytic lesion image. This work proposes analytical reasoning for this human-observable phenomenon as a high-level feature to determine the neoplasm growth phase using a novel pixel-based feature space. The pixel-based feature space, which comprises high-level features together with color and texture features, is fed into a classifier to classify the different melanocytic neoplasm phases. The proposed system was evaluated on the PH2 dermoscopic images benchmark dataset. It achieved an average accuracy of 95.1% using a support vector machine (SVM) classifier with the radial basis function (RBF) kernel. Furthermore, it reached an average Dice similarity coefficient (DSC) of 95.1%, an area under the curve (AUC) of 96.9%, and a sensitivity of 99%. The results of the proposed system outperform those of other state-of-the-art multiclass techniques.


Introduction
Pigmented skin lesions represent about 20% of all skin cancer cases [1]. They are generally divided into melanocytic and non-melanocytic lesions [2]. Melanocytic lesions refer to lesions that have different colors, most often due to melanin, such as melanocytic nevi, solar lentigo, dermatofibromas (DFs), and vascular lesions (VASC) [3]. Non-melanocytic lesions lack melanin pigment. Their color is due to other factors, i.e., hemoglobin or keratin, as in keratinocytic, VASC, and reactive lesions [4]. Exposure to ultraviolet radiation from the sun increases the risk that these lesions become malignant pigmented lesions or skin cancer. The pigment network can be typical or atypical [7]. The typical pigment network is a regular network with uniform brown lines and rete ridges. These ridges are relatively similar in width and equidistant, as in melanocytic nevi (Melcyt NV) and in non-melanocytic lesions such as lentigo and dermatofibromas [9]. The pigment network comprises intersecting brown lines in the form of a grid pattern, as shown in Figure 2 [10]. These brown lines refer to disorders of pigmentation either within the keratinocytes or the melanocytes.
The atypical pigment network has irregular lines varying in size, color, thickness, or melanin distribution and is often found in dysplastic nevi (Dysp NV) [10], as shown in Figure 3. Globules have a round to oval shape, are located along the borders of a melanocytic lesion, and are commonly found in a growing nevus [11], as shown in Figure 4.
Streaks are linear pigmented structures within the borders of a melanocytic lesion and include radial streaming (linear streaks) and rounded projections (pseudopods). A symmetrical distribution of streaks along the margin of the melanocytic lesion favors the diagnosis of a nevus, whereas an asymmetrical distribution can be expected to lead to the spread of melanoma [12], as shown in Figure 5. Peripheral globules and pseudopods look very similar; the distinguishing sign is that peripheral globules are separated from the primary tumor mass by a small clear space. They correspond to melanocytic cells and are usually associated with growing nevi. Radial lines extend from the periphery of the lesion in a process known as radial streaming.
In histopathology, radial streaming also corresponds to pigmented melanocytes [12]. In contrast, pseudopods are directly connected to the primary tumor mass through a stem, which favors the diagnosis of melanoma. Thus, discrimination between radial lines and pseudopods can guide the diagnosis of malignant melanoma from melanocytic nevi [13].
Circumferentially distributed radial lines therefore characterize the presence of nevi, whereas segmental pseudopods most likely indicate melanoma. According to the clinical description, malignant lesions are characterized by distinct high borders compared to benign cases [5]. The malignant lesion has a different structure than the benign lesion. Thus, the irregularity of lesion borders can provide an intuitive, rational feature space for the judgment of melanoma.
Earlier dermatoscopic techniques, i.e., the asymmetry, border, color, and diameter (ABCD) rule, the seven-point checklist, the Menzies method, and the CASH algorithm [14], focused on determining the lesion type by measuring the lesion's area and recognizing its various features. Recent skin imaging techniques enable the visualization of skin structures, which improves diagnostic capability and enhances results [11]. The distributions of color and texture features also provide good discrimination of pigmented skin lesions from unaffected skin regions in the image.
Color features are essential in the assessment of dermoscopic lesions. Most skin colors are caused by an increase in a given chromophore, i.e., melanin is the most critical chromophore in pigmented lesions. These colors may be brown, black, gray, or blue for pigment; yellow for lipids or keratin; white for collagen; or red for blood [4].
Texture analysis attempts to identify, measure, and detect the differences between different regions. The texture can be measured for a region but cannot be detected for a single point. The textural features analysis methods can be divided into statistical and structural methods. Statistical methods look for pixel value relationships and statistical moments, while structural methods are concerned more with regions such as shapes and edges [15].
Statistical methods describe point pixel properties based on gray-level statistical moments, using a co-occurrence matrix or extended spatial dependency matrices (SDMs) [16]. Statistical methods use local gray-level statistics to define texture, which is constant or varies slowly over a textured region [17].
Feature descriptors are generally used for capturing unique metrics from whole images or image regions, including textural, statistical, model-based, and basis space methods [14]. Feature descriptors vary in capturing technique and their attributes. Feature detectors determine interesting features in the image, such as interest point, keypoint, or landmark. The key point detector determines the vector orientation of the neighborhood feature descriptor, which provides some amount of invariance and robustness [18].
The shape descriptor concerns the measuring of several pixel regions among the shapes to be used for descriptor computations [19]. The morphological boundary shape descriptor is a method for defining polygon and boundary shape. Morphological shapes are generally described as blobs.
Thresholding is often the first step in defining object boundaries [18]. Morphological reshape operators clean up the shape boundary by growing or shrinking, using erode and dilate techniques [5].
High-level features are features that have been designed using human-observable formulation models, contrary to low-level features that were designed without describing human-observable formulation models. Integration between high-level and low-level features increases the significance of the feature space, which allows the system to provide a more reasonable justification for the classification decision [20].
The feature space that contains high-level features can describe the other characteristics of melanoma, i.e., the border shape and structure of patterns. Implementing a non-invasive, automated pigmented skin lesion system able to identify the type of the lesion could save lives and reduce unnecessary biopsies, in addition to cost reduction [21].
For this goal, this work develops a feature extraction technique based on greyscale, texture, morphology, statistical area filters, and basis space filtering for detecting the pigment network. Combining several descriptors for the same object increases discrimination rates.
The objectives of this paper are to develop a staging framework that can track the progression of skin lesions and to evaluate the use of the relevant features for accurately identifying pigmented lesions, ranging from Melcyt NV and Dysp NV to melanoma (Mel). The feature space, which comprises low-level features for local region pixel details and high-level features for regional shape metrics, was extracted in a pixel-based manner. The segmentation of melanocytic neoplasms is also conducted in a pixel-based manner.
The extracted feature space is used as inputs to different classifiers to distinguish different phases of melanocyte neoplasm. The proposed system was evaluated on the PH2 dermoscopic images benchmark dataset. The high-level pixel-based features suggested are considered reliable biomarkers for melanoma diagnosis and achieved an average accuracy of 95.1% using the support vector machines (SVM) classifier with the radial basis function (RBF) kernel.
The rest of this paper is organized as follows: Section 2 presents various dermatoscopic techniques for several studies. Section 3 introduces the proposed framework for staging melanocytic neoplasms using high-level pixel-based features. Section 4 describes the dataset used and experimental results. Finally, Section 5 concludes the work presented.

Related Work
Most of the recently developed dermoscopic algorithms were designed to make it easier to distinguish different types of melanocytic neoplasms using the ABCD rule and low-level features [22][23][24][25]. However, recent studies suggest combining high- and low-level descriptors, in particular descriptors that refer to lesion borders.
Gutman et al. [26] demonstrated automatic detection of globule and streak dermoscopic features. They performed localization and classification based on superpixels and dermoscopic features in order to judge the presence or absence of globules and streaks. They evaluated the results using 807 training images and 335 testing images from the International Skin Imaging Collaboration (ISIC) 2016 dataset and achieved a classification accuracy of 91%. ISIC 2016 lacks an intuitive mapping label for globule and streak dermoscopic features for further analysis.
Do et al. [27] adopted a melanoma detection system that localizes skin lesions using a set of image features and a hierarchical approach for segmentation. They computed the border irregularity from shape features, i.e., convexity, compactness, and the variance of the distance between border points and the lesion centroid. They evaluated the results on images obtained from the Singapore National Skin Center, comprising 117 benign nevi and 67 malignant melanomas, and achieved 89.09% sensitivity and 90% specificity. Pixel-based feature extraction techniques can lead to a better distribution of lesion features to avoid misleading results.
Lee et al. [28] designed a multiclass skin disease classification system. Their model comprises segmentation-based (DenseNet and U-Net) pre-processing steps fed into successively fine-tuned classification models. They classified seven skin diseases to predict the disease class. Using the HAM10000 dataset, they obtained classification accuracies of 0.899 and 0.785. Their model could be extended to add more flexibility to the system.
Abbadi and Faisal [22] presented an automated skin image diagnosis based on the ABCD rule and a new asymmetry determination method. They computed asymmetry by dividing the lesion into horizontal and vertical parts and then counting the number of mismatched pixels between the two parts using their union and intersection. The method was tested on 220 images, 120 from the PH2 database and 100 from websites, of which 113 are of cancer and 107 are of non-cancer, achieving an accuracy of 95.45%. They tested accuracy only on a sample of malignant and benign lesion images, although the PH2 dataset includes multiple stages of skin lesions.
Nammalwar et al. [29] integrated both texture and color features for segmenting skin lesions. They used the ABCD clinical features of pigmented lesions, i.e., color characteristics, maximum diameter, and boundary irregularity, as a measurement to detect and localize lesions in skin images. They first extracted texture and color information to be used in segmenting lesion boundaries. The modified Kolmogorov-Smirnov (MKS) statistic was used to discriminate the texture distribution, which was fed into a boundary refinement algorithm to obtain the final segmented image. They evaluated the model on 18 skin cancer images obtained from a dermatology gallery, based on a comparison with Live Wire segmentation results. The authors' main concern was to extract significant features for the segmentation method. They did not provide any quantitative comparisons that confirm the accuracy of their method.
Codella et al. [30] proposed a system for the classification of melanoma using dermoscopic skin images. They combined recent machine learning techniques, namely deep residual networks and fully convolutional neural networks, into ensembles focused on recognition. They evaluated the system on the ISBI 2016 dataset and achieved 94.7% accuracy. They showed that integrating different approaches could yield higher performance. They performed the comparison on a fixed dataset partition, but maintaining a held-out dataset comparison is essential for a public challenge.
Li and Shen [31] proposed an automated melanoma detection system using two deep learning methods. They used two fully convolutional residual networks (FCRN) simultaneously for deeper classification and implemented a lesion feature network for dermoscopic feature extraction. They evaluated their model on ISIC 2017 and achieved an accuracy of 0.833.
Kawahara et al. [32] employed pooling over a convolutional neural network for an augmented feature space. They trained convolutional neural networks (CNNs) on natural images to generalize classification to 10 non-dermoscopic classes. They evaluated their model on 1300 images of the 10-class dataset and achieved an accuracy of 81.8%. They focused the comparison on a partition of classes, which weakens the significance of the comparison.
Ballerini et al. [21] introduced a hierarchical k-nearest neighbor (KNN) classifier in which images were first classified into one of two groups by the top-level classifier using low-level features. Then, in the second-level classifier, the images were classified into five diagnostic classes using other subsets of features. Region-based active contours were used for segmenting lesions. They evaluated their model on a database of 960 lesion images and achieved a 74% classification accuracy over the five classes of skin lesions. The drawback of their method is that classification mistakes at the top level are not corrected at the second level, which is known as the "blocking" problem. The number of misclassified images at the first level unbalanced the distribution of classes.
Shrestha et al. [33] proposed a system that can discriminate malignant melanoma from benign dysplastic nevi using texture measures. Lesions were marked with the aid of a dermatologist. They evaluated their results on 106 dermoscopy images, 28 melanomas and 78 benign dysplastic nevi, and achieved an average accuracy of 95.4%. Their work focused on detecting pigment network irregularity only for early melanoma, neglecting other stages in melanoma growth.
Ganster et al. [34] used features of size and shape, color, and local parameters to resemble the clinical ABCD criteria, which represent the border structure, color variation, asymmetry, and dermatoscopic structures defined by a dermatologist. The feature set was optimized to capture most of the significant information and was then fed into a KNN classifier. They evaluated their results on 5393 skin lesion images categorized into three classes, with an overall classification accuracy of 88%. In their model, the higher-cardinality feature subsets showed large variability in classification performance, which resulted from the classifier over-fitting to the training data.
Rezvantalab et al. [35] used CNNs to classify different skin diseases, using 120 images from the PH2 dataset and 10,015 from HAM10000. They achieved an average accuracy of 87.13% across skin lesion diseases.
Hekler et al. [36] used a CNN to classify skin lesion images into five diagnostic categories. They evaluated their method on 300 test images (60 for each of the five disease classes) from the HAM10000 dataset and achieved an accuracy of 82.95%.
Adekanmi and Sellami [37] performed pixel-wise classification of melanoma lesions using a softmax classifier. They categorized outputs into melanoma and non-melanoma based on the results of the pixel-wise classification. They evaluated the results on the PH2 dataset and achieved 95% accuracy and a 92% Dice coefficient.
Lynn and War [23] demonstrated a detection system for lesion borders capable of extracting relevant features of dermoscopic structures for melanoma. They used a bagging decision tree ensemble classifier to classify features extracted using the ABCD rule. The system performance was evaluated on the ISBI 2016, ISIC 2017, and PH2 benchmark datasets and achieved an average accuracy of 84.5%.
Phillips et al. [38] proposed a Deep Ensemble for Recognition of Melanoma (DERM). Their technique was developed to identify melanoma from pigmented lesion-associated features, and it adopted a binary classification framework for recognizing melanoma among benign pigmented lesions. They trained their model using 7102 dermoscopic images, including melanoma and benign pigmented lesions. They achieved an average area under the curve (AUC) of 0.93 and an average sensitivity of 85%.
Phillips et al. [39] developed an algorithm to distinguish suspicious from benign skin lesions. They employed a deep ensemble method for recognizing melanoma. They used 1550 images, including suspicious and benign skin lesions, and analyzed biopsied and non-biopsied pigmented skin lesions. They achieved an average AUC of 90.1% for biopsied lesions and 95.8% for other lesions.
Haenssle et al. [40] proposed a CNN model to detect melanoma based on a pre-trained Google Inception v4 model. They used sensitivity, specificity, and AUC to evaluate their model and compared it against an international group of 58 dermatologists. They achieved an average specificity of 82.5%, a sensitivity of 86.6%, and an AUC of 88.9%.
From the above-mentioned techniques, those that were evaluated on the PH2 dataset or on multiple lesion diseases were selected for performance comparison. Most of the mentioned techniques rely on a diagnostic feature space built from low-level features that were not designed with the intent of considering the human-observable phenomenon.
A feature set that contains high-level features can provide an understandable justification for the system's diagnostic decisions. The pixel-based feature extraction and segmentation technique can capture a single representation that enables the visualization of image structures [11]. The distributions of texture and color features enable an excellent differentiation of pigmented skin lesions from unaffected skin regions in the image [41]. Table 1 summarizes the related work discussed above. For these reasons, this work adopted a high-level pixel-based characterization technique for diagnosing skin image lesions to enhance diagnostic capability. The high-level pixel-based feature model proved to be a reliable biomarker for diagnosing the progression of melanocytic neoplasms.

The Proposed Framework
In this section, the steps of staging melanocytic neoplasms using high-level pixel-based features are discussed in detail. The proposed non-invasive staging framework consists of the following steps, as shown in Figure 6.

Pre-Processing and Segmentation
The melanocytic neoplasm images were enhanced using the contrast limited adaptive histogram equalization (CLAHE) method. CLAHE is an effective pre-processing technique that yields better discrimination and visualization of skin image lesions [42]. CLAHE uses thresholding, equalization, and bilinear interpolation, which result in a limited homogeneous contrast region [43]. The images of the dataset have various sizes, and features extracted from images of different sizes would not have the same feature values. Thus, all images were resized to 768 × 576 pixels, which was the optimal size that keeps the original image information. After these steps, the images were ready for segmentation.
For melanocytic dermoscopic images, pixel-based segmentation was carried out in a single-pixel representation manner. The pixels of the original color images are matched with the pixels of the corresponding ground truth (GT) according to a label given by the metadata file. This technique relies on the intensity levels within the RGB color channels and the background level. Melanocytic neoplasms were identified by integer values, where 3 represents Mel, 2 represents Dysp NV, 1 represents Melcyt NV, and 0 indicates background objects. Thus, the final labels resulting from segmenting the dermoscopic images had the values {0; 1; 2; 3}, which can be used in staging melanocytic neoplasms. The resulting labels were used by the classifier to allocate the pixel values of different melanocytic neoplasms to their corresponding stages. Figure 7 shows some examples of original Melcyt NV, Dysp NV, and Mel dermoscopic images and their corresponding GTs.
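
The pixel-wise labeling described above can be sketched as follows. This is only an illustration under stated assumptions: the GT is taken to be a binary mask, the image-level diagnosis is taken from the metadata, the label set is {0, 1, 2, 3}, and the names `STAGE_CODES` and `label_pixels` are hypothetical rather than part of the proposed system.

```python
# Stage codes following the label set {0, 1, 2, 3}:
# 0 = background, 1 = Melcyt NV, 2 = Dysp NV, 3 = Mel.
STAGE_CODES = {"Melcyt NV": 1, "Dysp NV": 2, "Mel": 3}


def label_pixels(gt_mask, diagnosis):
    """Turn a binary ground-truth mask into a per-pixel stage label map.

    gt_mask   -- 2D list of 0/1 values (1 = lesion pixel)
    diagnosis -- image-level label, assumed to come from the metadata file
    """
    code = STAGE_CODES[diagnosis]
    return [[code if px else 0 for px in row] for row in gt_mask]


# Toy 3x4 mask standing in for a ground-truth image (hypothetical data).
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
labels = label_pixels(mask, "Dysp NV")
print(labels)  # [[0, 2, 2, 0], [0, 2, 2, 0], [0, 0, 0, 0]]
```

Each labeled pixel can then be paired with its feature vector, so that the classifier is trained on pixels rather than on whole images.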

Feature Extraction
Shape and pattern feature descriptors are a significant indicator affecting discrimination. Within the shape feature, every single pixel can be a feature descriptor in discriminating shapes. Thus, shapes and patterns may be represented as a single pixel, pixels in a line, a rectangular region of pixels, a polygon shape, or a region of pixels [16]. Texture can be computed based on global or local descriptors. Local descriptors can be described as statistical relationships among neighboring pixels in a region. Global descriptors can be described as computing pixel value relationships among image regions [15]. Local feature approaches are metrics used to identify the nearest range of features around the interest points within images [17]. Global feature approaches use uniform texture metrics to generalize an entire object with a single vector, such as gray level co-occurrence matrix (GLCMs), Grey-level spatial dependency matrices, co-occurrence matrices, or extended SDMs. Color descriptors are computed by RGB-D (red, green, and blue with its corresponding depth image) data channels for greyscale, intensity, or RGB (red, green, and blue) color. Deep feature hierarchies start with local feature descriptors then produce high-level features in feature detection hierarchal layers, producing more in-depth representations [44]. Feature descriptors can also be dense or sparse, based on the selection of pixels. A dense descriptor uses all the pixels in a specified region or patch as a kernel sampling pattern, i.e., Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). However, a sparse kernel concerns specific pixels, i.e., the local binary descriptor, where only the selected pixels are used, instead of using all pixels of the region [16]. High-level features capture the human-observable phenomenon by describing border irregularity about a lesion image using a morphological boundary shape descriptor [45]. 
Combining a small set of high-level features with a set of low-level features affected the classification results. A morphological boundary shape descriptor is a method for processing boundary shape [20]. Morphological operations can be applied using a structuring element, such as a disk, to define the object boundary and alter the shape in some deterministic way. Morphological reshape operators clean up the shape boundary by growing or shrinking it, using erode and dilate techniques [46]. Other significant indicators are also computed and incorporated into the feature space. These descriptors include color features in different color spaces, as well as texture and statistical features that identify regions based on texture or on statistical measures around each pixel in the input image [44].
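
To make the co-occurrence idea behind the texture descriptors concrete, the following minimal sketch counts gray-level pairs at a fixed pixel offset and derives one common statistic (contrast). The number of gray levels, the offset, and the choice of contrast are illustrative assumptions; the text does not specify which GLCM statistics were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) at pixel offset (dx, dy).

    The matrix is not symmetrized: entry [i][j] counts pixels of level i
    whose neighbor at the given offset has level j.
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                matrix[image[y][x]][image[ny][nx]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: sum of (i - j)^2 weighted by the normalized counts."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


# Toy 4x4 patch quantized to four gray levels (hypothetical data).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
m = glcm(patch, levels=4)
print(m)  # [[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]
print(round(contrast(m), 4))  # 0.5833
```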

Morphological Boundary Shape Descriptor
The shape descriptor concerns measuring several shapes of the pixel regions required for descriptor computations [19]. The morphological boundary shape descriptor is a method for defining polygon and boundary shape [45]. Morphological shapes generally use thresholding in defining object boundaries. The proposed high-level morphological boundary shape descriptors can give a more significant measure for the irregularity of the entire border [20]. Morphological reshape operators can clean up the lesion borders by growing or shrinking, using erode and dilate techniques [46].

Binary Dilation
The binary dilation of A by B, denoted $A \oplus B$, is defined as the set operation [47]:

$$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \emptyset \,\}$$

where $\hat{B}$ is the reflection of the structuring element B [17]. In other words, the binary dilation is the set of pixel locations z at which the reflected structuring element, when translated to z, overlaps the foreground pixels in A [46].
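
The set definition of binary dilation translates almost literally into code. The sketch below is an illustration only: it represents the foreground A and the structuring element B as sets of (row, column) coordinates, and the function names are hypothetical.

```python
def reflect(b):
    """Reflect a structuring element through its origin."""
    return {(-r, -c) for r, c in b}


def translate(b, z):
    """Translate every element of b by the offset z."""
    return {(r + z[0], c + z[1]) for r, c in b}


def binary_dilate(a, b, shape):
    """Return all z on the grid where the reflected, translated B hits A."""
    b_hat = reflect(b)
    rows, cols = shape
    return {
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if translate(b_hat, (r, c)) & a
    }


# One foreground pixel in a 5x5 image and a 3x3 cross structuring element.
A = {(2, 2)}
CROSS = {(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)}
print(sorted(binary_dilate(A, CROSS, (5, 5))))
# [(1, 2), (2, 1), (2, 2), (2, 3), (3, 2)]
```

The single pixel grows into the shape of the structuring element, which is the "growing" effect used to reshape lesion borders.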

Grayscale Dilation
In its general form, the grayscale dilation of A(x, y) by B(x, y) is defined as [45]:

$$(A \oplus B)(x, y) = \max \{\, A(x - x', y - y') + B(x', y') \mid (x', y') \in D_B \,\}$$

where $D_B$ is the domain of the structuring element B, and A(x, y) is assumed to be $-\infty$ outside the domain of the image [46]. Grayscale dilation is generally performed with a flat structuring element (B(x, y) = 0), in which case it reduces to a local-maximum operator:

$$(A \oplus B)(x, y) = \max \{\, A(x - x', y - y') \mid (x', y') \in D_B \,\}$$
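
For a flat structuring element, grayscale dilation is a local maximum over the neighborhood. A minimal sketch, assuming the image is a 2D list and the domain of B is given as a list of (dy, dx) offsets (both hypothetical representation choices):

```python
def gray_dilate(image, domain):
    """Flat grayscale dilation: local maximum over the structuring element.

    image  -- 2D list of gray values
    domain -- offsets (dy, dx) forming the domain of B; pixels outside the
              image are skipped, which is equivalent to treating them as -inf
    """
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            values = [
                image[y - dy][x - dx]
                for dy, dx in domain
                if 0 <= y - dy < rows and 0 <= x - dx < cols
            ]
            out_row.append(max(values, default=float("-inf")))
        out.append(out_row)
    return out


SQUARE_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
img = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 2],
]
for row in gray_dilate(img, SQUARE_3X3):
    print(row)
# [5, 5, 5, 0]
# [5, 5, 5, 0]
# [5, 5, 5, 2]
# [0, 0, 2, 2]
```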

Binary Erosion
The binary erosion of A by B, denoted $A \ominus B$, is defined as [9]:

$$A \ominus B = \{\, z \mid (B)_z \subseteq A \,\}$$

In other words, the binary erosion is the set of pixel locations z at which the structuring element, when translated to z, overlaps only foreground pixels in A [17].
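
Binary erosion can be sketched in the same coordinate-set style. The representation and the function name are illustrative assumptions; the example shows both the shrinking of a region and the removal of an isolated pixel.

```python
def binary_erode(a, b, shape):
    """Return all z on the grid where B translated to z lies entirely in A."""
    rows, cols = shape
    return {
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if {(r + dr, c + dc) for dr, dc in b} <= a
    }


CROSS = {(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)}

# A 3x3 block of foreground pixels plus one isolated noise pixel at (0, 4).
A = {(r, c) for r in range(1, 4) for c in range(1, 4)} | {(0, 4)}
print(binary_erode(A, CROSS, (5, 5)))  # {(2, 2)}
```

Only the center of the block survives: every other pixel has at least one cross neighbor outside A, and the isolated pixel disappears.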

Grayscale Erosion
The grayscale erosion of A(x, y) by B(x, y) is defined as [47]:

$$(A \ominus B)(x, y) = \min \{\, A(x + x', y + y') - B(x', y') \mid (x', y') \in D_B \,\}$$

where $D_B$ is the domain of the structuring element B, and A(x, y) is assumed to be $+\infty$ outside the domain of the image. Grayscale erosion is generally performed with a flat structuring element (B(x, y) = 0), in which case it reduces to a local-minimum operator:

$$(A \ominus B)(x, y) = \min \{\, A(x + x', y + y') \mid (x', y') \in D_B \,\}$$

The operations of morphological dilation and erosion require a flat structuring element given as a binary neighborhood, which may be two-dimensional or multidimensional. The origin of the structuring element identifies the pixel in the image being processed. The morphological opening operation is computed using a 2D structuring element, while the morphological closing operation is used to fill the gaps in an image [17].
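
Opening and closing are compositions of flat erosion and dilation. The sketch below assumes the common definitions (opening = erosion followed by dilation, closing = dilation followed by erosion) and a structuring element domain that contains the origin; the helper names are hypothetical.

```python
def _flat_filter(image, domain, pick, sign):
    """Apply min or max over the in-image neighbors selected by the domain."""
    rows, cols = len(image), len(image[0])
    return [
        [
            pick(
                image[y + sign * dy][x + sign * dx]
                for dy, dx in domain
                if 0 <= y + sign * dy < rows and 0 <= x + sign * dx < cols
            )
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def gray_erode(image, domain):
    """Flat grayscale erosion: local minimum of A(x + x', y + y')."""
    return _flat_filter(image, domain, min, +1)


def gray_dilate(image, domain):
    """Flat grayscale dilation: local maximum of A(x - x', y - y')."""
    return _flat_filter(image, domain, max, -1)


def opening(image, domain):
    return gray_dilate(gray_erode(image, domain), domain)


def closing(image, domain):
    return gray_erode(gray_dilate(image, domain), domain)


LINE_3 = [(0, -1), (0, 0), (0, 1)]
# One image row with a one-pixel gap (0) and a one-pixel bright spike (9).
profile = [[5, 5, 0, 5, 5, 9, 5, 5]]
print(closing(profile, LINE_3))  # [[5, 5, 5, 5, 5, 9, 5, 5]]  gap filled
print(opening(profile, LINE_3))  # [[5, 5, 0, 5, 5, 5, 5, 5]]  spike removed
```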

Local Feature Descriptors
Local texture pixel details are computed using the statistical relationships within each pixel neighborhood, together with colorimetric values in different color spaces [44]. Texture features are captured using a statistical randomness measure that assigns a corresponding statistic value to each output pixel. Colorimetric features are computed in different color spaces, i.e., RGB, CIE XYZ, CIE Lab, and HSV, for better characterization. Color channel (colorRGB) intensities are also normalized to eliminate the noise resulting from lights and shadows for better classification results. The HSV color space (colorHSV) can handle the differences in skin image lesions by removing the effect of illumination changes. Each component provides different information: the hue (H) component provides a value that is insensitive to illumination changes, and the saturation (S) component provides a higher contrast to the image being processed. Furthermore, the CIE color spaces (labCIE and xyzCIE) can achieve higher accuracy because they can handle the high similarity within skin color images [16].
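
The effect of illumination on the color features can be illustrated with Python's standard `colorsys` module. The sketch assumes chromaticity normalization (each channel divided by the channel sum) as the RGB normalization, which is one common choice and not necessarily the one used here; the CIE conversions are omitted.

```python
import colorsys


def normalize_rgb(r, g, b):
    """Chromaticity normalization (an assumed choice): divide by r + g + b."""
    total = r + g + b
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (r / total, g / total, b / total)


def color_features(r, g, b):
    """Return (r_n, g_n, b_n, h, s, v) for one 8-bit RGB pixel."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return normalize_rgb(r, g, b) + (h, s, v)


# The same brownish color under bright and dim illumination (toy values).
bright = color_features(150, 100, 50)
dim = color_features(75, 50, 25)
print([round(f, 3) for f in bright])
print([round(f, 3) for f in dim])
```

In this toy case, halving the illumination leaves the normalized RGB values, the hue, and the saturation unchanged; only the value (V) component drops.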

Feature Reduction
Constructing a feature space composed of high-level and local features results in a high-dimensional feature set. The pixel-based feature extraction technique generates a large feature vector for each pixel. The feature vectors contain three variables for each color space (RGB, CIE XYZ, CIE Lab, and HSV), nine variables for texture features, four descriptors for the shape, and five pigment features for the presence of globules and streaks. This ends with a total of 26 variables. To handle the curse of dimensionality, principal component analysis (PCA) is used. PCA is a projection-based technique that projects the data onto a set of orthogonal axes to retain the relevant information and discard the rest [48]. The dimensionality reduction reduced the large feature vectors to 17 significant variables. Transforming the data from a high-dimensional to a low-dimensional space reduced the difficulty of processing the data.
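The projection step can be sketched without external libraries by extracting the leading eigenvectors of the covariance matrix with power iteration and deflation. This is a minimal illustration on a hypothetical three-feature toy set reduced to two components, not the 26-to-17 reduction reported here; all function names are assumptions, and a production system would normally rely on a numerical library instead.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def leading_eigenpair(m, iters=1000):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix."""
    v = [float(i + 1) for i in range(len(m))]  # arbitrary starting vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:
            break
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigval, v


def pca(rows, k):
    """Project the samples onto the k leading principal axes."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    variances, components = [], []
    for _ in range(k):
        lam, v = leading_eigenpair(cov)
        variances.append(lam)
        components.append(v)
        # Deflation: remove the component that was just extracted.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, variances


pixels = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.1], [3.0, 6.0, -0.1],
          [4.0, 8.0, 0.05], [5.0, 10.0, -0.05]]
projected, variances = pca(pixels, k=2)
full_cov = covariance(center(pixels))
total_variance = sum(full_cov[i][i] for i in range(3))
explained_by_first = variances[0] / total_variance
```

Because the first two toy features are perfectly correlated, almost all of the variance falls on the first principal axis, which is the situation in which discarding the remaining axes loses little information.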

Classification of Melanocytic Dermoscopic Images
The reduced high-level and local feature arrays are concatenated with their corresponding labels, which refer to the neoplasm phase. As mentioned previously, this work aims to stage melanocytic neoplasms in dermoscopic images. The stages of melanocytic neoplasms can be Melcyt NV, Dysp NV, or Mel [23]. The SVM and Gradient Boosted Trees (GBT) classifiers are trained on the reduced high- and low-level feature space to categorize the processed pixels into neoplasm stages.

Support Vector Machine Classifier
SVM is a machine learning algorithm for classification. Its main objective is to find a description of the dependency between a set of object measurements (the measured variables) and specific properties of these objects. Estimating this dependency from the observations can help in classifying new sets based on heuristics. Estimating the mapping function $f: \mathbb{R}^N \to \{\pm 1\}$ makes it possible to determine the corresponding values for new observations [49].

Non-Separable Case
With a hard-margin SVM, a separating hyperplane cannot be found in all cases. In these cases, soft-margin SVMs are needed. Soft-margin SVMs can find separating hyperplanes using positive slack variables $\xi_i$, which are used to relax the constraints as follows [50]:
$$y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0,$$
where the model parameters are the weights w and the bias b, and $x_i$ is the feature input with class label $y_i \in \{\pm 1\}$. The slack variables give the SVM the flexibility to reduce the influence of individual samples on the optimization by allowing some cases to lie inside the margin or among the cases of the other class, as shown in Figure 8 [50].
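The role of the slack variables can be made concrete with a small numerical sketch. For a fixed hyperplane, the smallest slack that satisfies the relaxed constraint of a sample is max(0, 1 − y(wᵀx + b)). The code below assumes the standard soft-margin objective ½‖w‖² + C Σξ; the hyperplane, the four samples, and the value of C are hypothetical and only serve as an example.

```python
def slack_variables(w, b, samples, labels):
    """Smallest slack xi_i >= 0 satisfying y_i (w.x_i + b) >= 1 - xi_i."""
    slacks = []
    for x, y in zip(samples, labels):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    return slacks


def soft_margin_objective(w, b, samples, labels, C):
    """Standard soft-margin objective (assumed form): 0.5 ||w||^2 + C sum(xi)."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    return regularizer + C * sum(slack_variables(w, b, samples, labels))


w, b = [1.0, 0.0], 0.0
samples = [[2.0, 0.0], [0.5, 1.0], [-0.5, 0.0], [-3.0, 1.0]]
labels = [1, 1, 1, -1]

slacks = slack_variables(w, b, samples, labels)
objective = soft_margin_objective(w, b, samples, labels, C=10.0)
```

The first and last samples lie outside the margin (slack 0), the second lies inside the margin (slack 0.5), and the third lies on the wrong side of the hyperplane (slack 1.5).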

RBF SVMs
The RBF kernel can handle nonlinear cases by mapping samples non-linearly into a higher-dimensional space. The RBF kernel utilizes two parameters: C and γ. The parameter-setting problem is to identify suitable values for (C, γ) with which the model can predict unknown data accurately. Parameter setting is commonly performed by approximations or heuristics within iterative processes over pairs of values (C, γ). The RBF kernel in 2D space for two inputs, x and z, is given by Equation (4) [49]:
$$K(x, z) = \exp\left(-\gamma \lVert x - z \rVert^2\right). \quad (4)$$
Under L2-norm normalization, the expansion of the kernel in the term $2\gamma x^T z$ gives constant, linear, and second-order terms, while the value of $2\gamma x^T z$ is bounded within the range $[0, 2\gamma]$, where $0 \le x^T z \le 1$.
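The behaviour of the kernel under L2-norm normalization can be checked numerically. For unit-length inputs, ‖x − z‖² = 2 − 2xᵀz, so the kernel factors as exp(−2γ)·exp(2γxᵀz). The sketch below verifies this identity for two hypothetical 2D inputs; γ = 0.01 mirrors the value reported in the Results, and the function names are assumptions.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


gamma = 0.01
x = l2_normalize([3.0, 4.0])
z = l2_normalize([1.0, 0.0])

dot = sum(a * b for a, b in zip(x, z))
direct = rbf_kernel(x, z, gamma)
# For unit vectors, ||x - z||^2 = 2 - 2 x.z, which factors the kernel.
factored = math.exp(-2.0 * gamma) * math.exp(2.0 * gamma * dot)
```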

Gradient Boosted Trees
GBT is a combination of weak learners (e.g., decision trees) that creates a powerful and effective predictive model [51]. GBT ensemble boosting minimizes a loss function, here the mean squared error between the target and predicted outputs [52]. GBT is popular due to its accuracy, flexibility, and robustness, while requiring fewer computational resources [52]. Given a training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the goal is to find the function $\hat{F}$ that minimizes the expected value of the loss function $L(y, F(x))$ [53]:
$$\hat{F} = \arg\min_F \mathbb{E}_{x,y}\left[L(y, F(x))\right].$$
Gradient boosting approximates $\hat{F}$ as a weighted sum of weak learners $h_i(x)$ with coefficients $\gamma_i$ [52].
When using GBT to fit the model, the input space is partitioned into disjoint regions $R_{1m}, \ldots, R_{Jm}$. Thus, the base learner h(x) for the tree ensemble model can be calculated by Equation (7) [53],
where $b_{jm}$ is the predicted value within region $R_{jm}$ and I is a convex loss function [54]. For each iteration, the model is updated as in Equation (8) [52]. The update rule was modified by Friedman to use a separate coefficient $\gamma_{jm}$ for each region, instead of $b_{jm}$, as shown in Equation (9) [53]. The final gradient boosting model can be obtained by minimizing the objectives as in Equation (10) [54].
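The boosting procedure can be sketched in a few lines for the squared-error loss, for which the negative gradient is simply the residual. The sketch below is an illustration under stated assumptions rather than the RapidMiner model used here: it uses one-dimensional inputs and depth-1 trees (stumps, i.e., two regions per tree) instead of depth-7 trees, and it requires at least two distinct input values. The 150 trees and the learning rate of 0.1 mirror the values reported in the Results; the toy data and all names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split regression tree (two regions) under squared error."""
    best = None
    ordered = sorted(set(xs))
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        b_left = sum(left) / len(left)      # predicted value in region 1
        b_right = sum(right) / len(right)   # predicted value in region 2
        sse = (sum((r - b_left) ** 2 for r in left)
               + sum((r - b_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, b_left, b_right)
    _, threshold, b_left, b_right = best
    return lambda x: b_left if x <= threshold else b_right


def gradient_boost(xs, ys, n_trees=150, learning_rate=0.1):
    """Additive model F(x) = F0 + learning_rate * sum of fitted stumps."""
    f0 = sum(ys) / len(ys)                  # constant initial model
    predictions = [f0] * len(xs)
    trees = []
    for _ in range(n_trees):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: f0 + learning_rate * sum(t(x) for t in trees)


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 3.0, 3.0, 3.0]

mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)
boosted_mse = mse(gradient_boost(xs, ys), xs, ys)
```

Each round fits a new stump to what the current ensemble still gets wrong, so the training error shrinks step by step relative to the constant baseline.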

Staging Melanocytic Neoplasms
The single-pixel representation was obtained from the segmentation step, using the original color images and their corresponding GTs, resulting in identified intensity levels within the RGB color channel in integer values, where 3 represents Mel, 2 represents Dysp NV, 1 represents Melcyt NV, and 0 indicates the background objects. Thus, the final labels {0; 1; 2; 3} resulting from segmenting the dermoscopic images are used to allocate the pixel values of different melanocytic neoplasms to their corresponding stages.
Label {1} Melcyt NV has a typical pigment network with uniform brown lines and regular equidistant rete ridges. Label {2} Dysp NV is characterized by an atypical pigment network with irregular grid lines, globules, and streaks that may be found with a symmetrical distribution. Label {3} Mel is characterized by an atypical pigment network with irregular grid lines and an asymmetrical distribution of globules and streaks, with the spread of melanoma expected [55].
The formation of Mel can be seen as the deterioration of pre-malignant skin lesions in the functionality and structure of the infected lesion. They emerge from various pigmented lesions that appear on sun-damaged areas with limited photo-protection. Therefore, the distinction between pigmented pre-malignant and malignant lesions is a challenging task that can lead to early detection of different types of lesions and minimization of unnecessary biopsies. Melanocytic neoplasms have different dermoscopy structures, such as pigment network, globules, and streaks. The detection of streaks, globules, and pigment network is very significant in assessing the malignancy of a lesion [55]. Table 2 lists the pseudo-code for staging melanocytic neoplasms.

Start
  Load the training data
  Load the corresponding GT data
  Load the labels CSV file
  Step 1. Pre-processing:
    Step 1.1 Image resize: Images were rescaled to 768 × 576.
    Step 1.2 Image enhancement: Images were enhanced using CLAHE.
    Step 1.3 Image conversion: Images were converted to grayscale.
  Step 2. Pixel-based segmentation:
    For original color images
    For corresponding GT images
    For CSV labels file
      Test whether a label is melcyt nv, dysp nv, or mel
      Use the corresponding GT mask along with the original color images
    End for
    Return single-pixel labels with the values {0; 1; 2; 3}: 0 for background, 1 for melcyt nv lesion, 2 for dysp nv, and 3 for mel
    Save the integer intensity levels array of the segmented images as a labels mat file
End

High-level features based on a morphological boundary shape descriptor can describe the border irregularity and thereby take the human-observable phenomenon into account. Melanocytic neoplasms can be Melcyt NV, Dysp NV, or Mel [45]. Abnormal growth and proliferation of abnormal cells can increase malignancy in malignant cases. Melcyt NV is characterized by a typical pigment network with uniform brown lines and regular equidistant rete ridges. Dysp NV has an atypical pigment network with irregular grid lines, globules, and streaks that may be found with a symmetrical distribution. Mel is characterized by an atypical pigment network with irregular grid lines and an asymmetrical distribution of globules and streaks, which means that the spread of melanoma is expected. The border irregularity features, i.e., pigment network, streaks, and globules, are mapped to the intuitive labels given by the PH2 dataset [56] to construct the high-level features. The local descriptors in the form of color and texture features are also extracted in a pixel-based manner. The small set of high-level features and the set of low-level features are combined to construct the feature space that is used for staging melanocytic neoplasms. The high- and low-level feature space is concatenated with the clinical diagnosis labels to construct a training set that is fed into the classifier.
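The pixel-labeling step of the pseudo-code (Step 2) can be sketched as follows. The sketch assumes that a GT mask is available as a binary array and that the diagnosis string comes from the labels CSV file; the function name `label_pixels`, the dictionary `CLASS_TO_LABEL`, and the toy mask are hypothetical and do not reproduce the original MATLAB code.

```python
# Integer stage labels as used in this work: 0 is reserved for background.
CLASS_TO_LABEL = {"melcyt nv": 1, "dysp nv": 2, "mel": 3}


def label_pixels(gt_mask, diagnosis):
    """Assign the image-level stage label to every lesion pixel of the mask."""
    label = CLASS_TO_LABEL[diagnosis.strip().lower()]
    return [[label if pixel else 0 for pixel in row] for row in gt_mask]


def lesion_samples(image, labels):
    """Pair each lesion pixel's color triple with its stage label."""
    return [(image[r][c], labels[r][c])
            for r in range(len(labels))
            for c in range(len(labels[0]))
            if labels[r][c] != 0]


gt_mask = [[0, 1, 1],
           [0, 1, 0]]
image = [[(10, 10, 10), (120, 80, 60), (110, 70, 50)],
         [(12, 11, 10), (130, 90, 70), (11, 10, 9)]]

pixel_labels = label_pixels(gt_mask, "Dysp NV")
samples = lesion_samples(image, pixel_labels)
```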

Dataset
To evaluate the results, 150 skin lesion images from the PH2 dataset [56] are used. The PH2 dataset of dermoscopic images was developed for research and benchmarking purposes, to facilitate comparative studies of segmentation and classification algorithms. The PH2 dermoscopic images were acquired at the Dermatology Service of Hospital Pedro Hispano, Matosinhos, Portugal. They are 8-bit RGB color images with a resolution of 768 × 560 pixels, acquired under the same conditions using the Tuebinger Mole Analyzer system at a magnification of 20×. The database contains a total of 200 dermoscopic images of melanocytic neoplasms, including 80 benign nevi (non-melanoma), 80 atypical nevi, and 40 melanomas. It comprises the training images and their corresponding GTs, clinical diagnosis labels, and high-level intuitive labels. The PH2 dermoscopic images were rescaled to 768 × 576 to unify the size of the skin lesion images. Different evaluation metrics were adopted to check the significance of the proposed melanoma characterization framework.

Hardware and Software Specifications
This work was implemented on an HP (Hewlett-Packard, Palo Alto, CA, USA) Envy laptop with an AMD FX-7500 Radeon R7 CPU (10 compute cores, 4C + 6G) at 2.10 GHz and 6 GB of RAM, running the 64-bit Windows 10 operating system on an x64-based processor. The first processes of the algorithm were implemented in MATLAB 2018a. The comparisons were then evaluated using RapidMiner Studio.

Performance Evaluation
The four outcomes of classifying P positive instances and N negative instances, i.e., true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), form the 2 × 2 confusion matrix for the experiment. The area under the receiver operating characteristics (ROC) curve (AUC) is an important evaluation metric for checking the performance of the classification model. The higher the AUC, the better the model is at distinguishing between classes [57]. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), where TPR is on the y-axis and FPR is on the x-axis [57]. The accuracy (ACC) measure is used to check the capability of the classification model. Accuracy can be calculated using Equation (11). The sensitivity (Sen) or recall measure is used to check the capability of a classifier to recognize the patterns of the positive class. The sensitivity of the classifier can be determined using Equation (12). The specificity (Spec) measure is used to check the capability of a classifier to recognize the patterns of the negative class. It can be calculated using Equation (13). The F measure or dice similarity coefficient (DSC) considers both precision and recall when measuring the accuracy of the test. DSC is the harmonic mean of precision and recall and ranges from 0 (worst score) to 1 (best score). The DSC measure can be calculated using Equation (14) [24].
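The measures above can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions ACC = (TP + TN)/(TP + TN + FP + FN), Sen = TP/(TP + FN), Spec = TN/(TN + FP), and DSC = 2TP/(2TP + FP + FN), which are assumed to correspond to Equations (11) to (14); the counts in the example are hypothetical and are not results of this work.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and DSC from confusion counts.

    Assumes every denominator is non-zero, i.e., both classes are present.
    """
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "Sen": tp / (tp + fn),
        "Spec": tn / (tn + fp),
        "DSC": 2 * tp / (2 * tp + fp + fn),
    }


metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

For a multiclass problem such as the three neoplasm stages, these counts are typically obtained per class in a one-versus-rest fashion and then averaged.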

Results
Various classification models are used for comparison to evaluate the proposed technique: SVM, GBT, random forest (RF), Naïve Bayes (NB), and deep learning (DL) classifiers. Significant evaluation metrics are used to check the capability of each model to distinguish between classes, with ACC, AUC, DSC, Sen, and Spec as performance indicators. The proposed framework achieved higher performance using the SVM and GBT classifiers. During the training of the GBT classifier, the optimal parameters were 150 trees with a maximal depth of 7 and a learning rate of 0.1. For the SVM with the RBF kernel, gamma was set to 0.01 and the C parameter was set to 1000. The trained SVM model included a total of 1176 support vectors and a bias (offset) of 1.234.
To evaluate the results of the proposed system, the computed results were compared with those of other state-of-the-art classifiers, namely RF, NB, and DL. The RF classifier utilized 140 decision trees in the forest, with a maximal depth of 7. The NB model implemented the Bayesian classification method, with the data distribution modeled by best-fit Gaussian and multinomial distributions. The DL model is a multi-layer feed-forward neural network trained with back-propagation. This model was used as a classifier and was trained on the previously extracted features. The network structure consists of an input layer, three hidden layers, and an output layer. The input layer consists of 17 input feature neurons resulting from dimensionality reduction. The first and second hidden layers consist of 50 neurons each. The last hidden layer consists of 25 neurons. The output layer has four neurons based on the number of tested classes. We conducted hyperparameter tuning to choose the optimal values for the learning rate, momentum training, annealing rate, regularization, and loss function in order to enable high predictive accuracy. The number of images was extended using an augmentation technique with different transformations, i.e., rotation, shifting, scaling (zoom in/out), and flipping. The parameters and their optimal values for the compared classifiers are listed in Table 3. The 10-fold cross-validation technique is employed to evaluate the performance of the proposed system. For the 10-fold cross-validation, the dataset is split into 80 for training and 20 for the validation set. The performance evaluation using 10-fold cross-validation for the proposed technique against other classifiers can be seen in Table 4. According to the comparison measurements in Table 4, higher performance was achieved using the SVM classifier than using GBT.
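The size of the feed-forward network described above follows directly from its layer widths. The sketch below counts the trainable parameters under the assumption that every layer is fully connected and has one bias per neuron; the original configuration may differ (for example, in how biases are handled), so the total is indicative only.

```python
# Layer widths as described in the text: input, three hidden layers, output.
LAYER_SIZES = [17, 50, 50, 25, 4]


def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


per_layer = [n_in * n_out + n_out
             for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]
total_parameters = count_parameters(LAYER_SIZES)
```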
The results of the proposed system, using the SVM and GBT classifiers, outperform other state-of-the-art techniques. The SVM model achieved higher results because of its potential for high accuracy with few training samples [24]. The SVM multiclass classifier has the ability to map the class of interest, locate the support vectors, and use the optimal kernel function, which makes the classifier more flexible and robust against outliers [25].
In addition, we used a four-fold cross-validation technique to validate the obtained results. For the four-fold cross-validation, the data set is split into 70 for training and 40 for the validation set. Table 5 shows a comparison of the proposed technique against other classifiers using the four-fold cross-validation technique. Higher performance was achieved using the SVM classifier than using GBT. The proposed system achieved an average ACC of 92.9%, AUC of 0.959, DSC of 95.3%, Sen of 98.8%, and Spec of 86.7% using SVM. Furthermore, the proposed system achieved an average ACC of 92.6%, AUC of 0.959, DSC of 94.2%, Sen of 93.5%, and Spec of 77.5% using GBT. To visualize the performance of the proposed diagnostic system, a ROC curve was constructed for the proposed model, along with the other tested classifiers. A ROC curve is created by plotting the TPR against the FPR. Figure 9 shows the relationship between sensitivity and specificity for all tested classifiers. The results show the quality of the proposed model's predictions, along with those of the other tested classifiers.
The literature includes some techniques based on a binary output classifier and others based on a multiclass output classifier. The multiclass output techniques were used for performance comparison with the proposed framework. The comparison covers researchers who evaluated their work on the same PH2 dataset and adopted a multiclass output classifier, as in the proposed work. The performance comparison with state-of-the-art multiclass techniques is shown in Figure 10.

Figure 9. The area under the receiver operating characteristics (ROC) curve for the tested classifiers.

Figure 10. Performance comparison with state-of-the-art multiclass techniques based on accuracy.

Discussion
Earlier dermoscopic techniques relied on capturing the diagnostic feature space using low-level features that were not designed with the intent of considering the human-observable phenomenon. A feature set containing high-level features can provide an understandable justification for the system's diagnostic decisions. The pixel-based feature extraction and segmentation technique can capture a single representation that enables the visualization of image structures. The distributions of texture and color features enable an excellent differentiation between pigmented skin lesions and unaffected skin regions in the image. For this reason, this work adopted a high-level pixel-based characterization technique for diagnosing skin lesion images to enhance the diagnostic capability. The proposed model is evaluated on the PH2 dataset of dermoscopic images acquired from Hospital Pedro Hispano, Portugal. Several researchers have evaluated their models on the PH2 dataset. Some of them, i.e., Adekanmi et al. [37], Lynn et al. [23], and Abbadi et al. [22], adopted a binary output classifier and achieved average accuracies of 95%, 84.5%, and 95.45%, respectively.
Other researchers, i.e., Rezvantalab et al. [35], evaluated their model on the PH2 dataset and adopted a multiclass output classifier, achieving 87.13% accuracy. Others adopted a multiclass output classifier but evaluated their models on a different dataset, i.e., Hekler et al. [36] and Codella et al. [30], who achieved accuracies of 82.95% and 76%, respectively. The proposed technique outperforms the results achieved by these researchers, achieving 92.2% multiclass classification accuracy on the PH2 dataset for characterizing different melanocytic neoplasm stages.

Conclusions
This paper proposes a comprehensive pixel-based framework for staging melanocytic neoplasms. The proposed framework uses high-level analytical reasoning to describe border irregularity, in addition to various feature descriptors, i.e., color and texture. Different types of features are derived for staging the growth of lesions, from benign lesions in terms of NV up to pre-malignant Dysp NV or malignant melanoma. The distributions of texture and color features enable differentiation between pigmented and unaffected skin regions within the images. The adopted high-level pixel-based technique assisted the extraction of significant features for training the model. These features are color features in different color spaces, local statistics, and texture morphology. The mapping of the high-level features to the intuitive labels given by the PH2 dataset assisted the construction of the feature space. The combination of a small set of high-level features with low-level features affected the classification results. Staging melanocytic neoplasms was carried out by training SVM and GBT classifiers on the extracted feature space along with the clinical labels. The results show that the proposed system can help in guiding the diagnosis of pigmented skin lesions at different stages. The diagnosis of skin lesions at earlier stages can help in improving the curability of skin cancer and reducing the skin cancer mortality rate. For future work, more analysis of the images and an expansion of the image database are required for more promising results. In addition, DL performs better with a large amount of data. Therefore, future work will also investigate DL with a large skin cancer dataset. We will work to employ a multi-path CNN to classify different grades of skin cancer.