Detection and Classification of Immature Leukocytes for Diagnosis of Acute Myeloid Leukemia Using Random Forest Algorithm

Acute myeloid leukemia (AML) is a fatal blood cancer that progresses rapidly and hinders the function of blood cells and the immune system. The current AML diagnostic method, a manual examination of the peripheral blood smear, is time-consuming, labor-intensive, and suffers from considerable inter-observer variation. Herein, a machine learning model to detect and classify immature leukocytes for efficient diagnosis of AML is presented. Images of leukocytes in AML patients and healthy controls were obtained from a publicly available dataset in The Cancer Imaging Archive. Image format conversion, multi-Otsu thresholding, and morphological operations were used for segmentation of the nucleus and cytoplasm. From each image, 16 features were extracted, two of which are new nucleus color features proposed in this study. A random forest algorithm was trained for the detection and classification of immature leukocytes. The model achieved 92.99% accuracy for detection and 93.45% accuracy for classification of immature leukocytes into four types. Precision values for each class were above 65%, which is an improvement on the current state of the art. Based on Gini importance, the nucleus to cytoplasm area ratio was a discriminative feature for both detection and classification, while the two proposed features were shown to be significant for classification. The proposed model can be used as a support tool for the diagnosis of AML, and the features calculated to be most important serve as a baseline for future research.


Introduction
Acute myeloid leukemia (AML) is the deadliest of the four types of leukemia, accounting for 11,000 annual deaths in the US with an average five-year survival rate of 28.7% [1]. AML is characterized by the overproduction and accumulation of immature leukocytes, specifically myeloid precursors, in the bone marrow and peripheral blood. The immature white blood cells impair the functions of the bone marrow, including the production of red blood cells and platelets, which makes the immune system vulnerable [2,3]. Detecting and classifying immature leukocytes is therefore crucial for the diagnosis of AML.
Progressing rapidly, AML can be fatal within months or even weeks if not diagnosed and treated immediately [4]. Hence, accurate and quick diagnosis is necessary for AML patients. Microscopic examination of peripheral blood smears is the standard procedure for the diagnosis of leukemia, but other procedures are also used [5]. Manual blood smear examination is labor-intensive and time-consuming [6]. Moreover, manual examination is prone to considerable inter- and intra-observer variation.

Bioengineering 2020, 7, x FOR PEER REVIEW 3 of 12
Two new features for the classification of leukocytes, specifically the average and standard deviation of nucleus color intensity in the B channel of LAB color space, are proposed and demonstrated to be discriminative. Furthermore, the most important features for both detection and classification are calculated and ranked using the Gini importance, which is defined as the reduction in Gini impurity attributable to each feature in the random forest. To the best of the authors' knowledge, this is the first study that calculates the Gini importance of a multitude of morphological features for classification of leukocytes in AML.

Dataset
Labelled images of leukocytes from the peripheral blood of 100 AML patients and 100 healthy controls were collected from the dataset assembled by Matek et al. [20] in The Cancer Imaging Archive [21]. The dataset contains a total of 18,365 images, each centered on a leukocyte, with ground truth labels that classify images by leukocyte type (Figure 1). Ground truth annotations were made by a medical examiner experienced in cytomorphology [12,20]. Table 1 displays the number of images from each leukocyte type used in this study. Classes of immature leukocytes with fewer than 20 images (bilobed promyelocytes and metamyelocytes) were omitted because, after splitting into training and testing sets, too few images would remain in the testing set for statistically significant results. For the myeloblast class, which contained over 3000 images, a random sample of 500 images was used. In total, 731 immature leukocytes were used, with considerable imbalance across classes. Data augmentation was not used to increase samples in minority classes because the morphological features are invariant to rotations and reflections. A total of 600 mature leukocytes were used to provide a control group for the detection step.

Methodology
The methodology consisted of four main phases: segmentation, feature extraction, classification, and calculation of feature importance. During segmentation, binary masks of the cell and nucleus were obtained for each image. A total of 16 features were extracted as input to a random forest algorithm for classification between immature and mature cells, as well as further classification of immature cells. Finally, the importance of each feature was calculated using the metrics of the random forest algorithm. The project was coded in the Python programming language [22] with numerous open source libraries [23][24][25][26][27], including scikit-image for feature calculation and scikit-learn for machine learning implementation.

Segmentation
The objective of segmentation (see Figure 2) was to obtain masks of the cell and nucleus, from which morphological features could be extracted. To obtain a cell mask, each image was converted to LAB color space to better differentiate the cytoplasm from background cells [16,28] (see Figure 2b). Multi-Otsu thresholding [29] with three thresholds was used to group image pixels into four clusters: image background, background cells, cytoplasm of the cell, and nucleus of the cell (see Figure 2c). Since the nucleus and cytoplasm had the highest and second-highest intensities, respectively, components below the second multi-Otsu threshold were removed. Morphological dilation followed by erosion by the same factor was used to separate noise from the region of interest (ROI). A few images contained multiple stained leukocytes, so a positional filter was applied to select only the ROI in the center of the image. Finally, small objects with an area below 2000 were removed to obtain the final cell mask, as displayed in Figure 2d.
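The cell-mask steps described above can be sketched with scikit-image and SciPy. This is a minimal sketch under stated assumptions rather than the study's code: the function names, the LAB channel index (`channel=2`), the closing radius, and the premise that the cell is brighter than its surroundings in the chosen channel are illustrative choices. Only the three-threshold multi-Otsu step, the dilation followed by erosion, the central-ROI filter, and the area limit of 2000 come from the text.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.color import rgb2lab
from skimage.filters import threshold_multiotsu
from skimage.morphology import disk


def central_component(mask):
    """Keep only the connected component that covers the image center."""
    labels, _ = ndi.label(mask)
    center_label = labels[mask.shape[0] // 2, mask.shape[1] // 2]
    if center_label == 0:  # nothing at the center: treat as failed segmentation
        return np.zeros(mask.shape, dtype=bool)
    return labels == center_label


def cell_mask_from_intensity(intensity, closing_radius=3, min_area=2000):
    """Binary cell mask from a 2-D intensity image (hypothetical helper).

    Assumes the nucleus and cytoplasm are the two brightest clusters.
    """
    # Three thresholds split the pixels into four clusters.
    thresholds = threshold_multiotsu(intensity, classes=4)
    # Drop everything at or below the second threshold.
    mask = intensity > thresholds[1]
    # Dilation followed by erosion with the same structuring element (closing).
    mask = ndi.binary_closing(mask, structure=disk(closing_radius))
    # Positional filter: keep the region of interest at the image center.
    mask = central_component(mask)
    # Discard the result if the remaining object is smaller than min_area.
    if mask.sum() < min_area:
        return np.zeros(mask.shape, dtype=bool)
    return mask


def cell_mask_from_rgb(rgb, channel=2, **kwargs):
    """Convert RGB to LAB and segment one channel (channel index is assumed)."""
    lab = rgb2lab(rgb)
    return cell_mask_from_intensity(lab[..., channel], **kwargs)
```

In practice the sign or channel may need to be adapted to the stain so that the cell really is the brightest structure before thresholding.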

A similar process was carried out to obtain a binary mask of the nucleus. Since the nucleus was always the darkest component, the image was converted to grayscale for the best discrimination of the nucleus, as shown in Figure 2e. Multi-Otsu thresholding with two thresholds was used to group pixels into three clusters: the background, the background cells and cytoplasm, or the nucleus of the cell (see Figure 2f). In some images, noise would be grouped in the same cluster as the nucleus. To overcome this, only connected components containing the center pixel were kept to obtain the final binary nucleus mask. As exhibited in Figure 2g, the nucleus mask was subtracted from the cell mask to obtain a binary mask of the cytoplasm.
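The nucleus and cytoplasm masks can be sketched in the same way. The snippet below is illustrative only: the function names are hypothetical, and the grayscale image is assumed to be supplied already (for example from `skimage.color.rgb2gray`). It applies two multi-Otsu thresholds, keeps the darkest cluster, retains the component that contains the center pixel, and subtracts the nucleus from a given cell mask.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_multiotsu


def component_at_center(mask):
    """Return the connected component that contains the center pixel."""
    labels, _ = ndi.label(mask)
    center_label = labels[mask.shape[0] // 2, mask.shape[1] // 2]
    if center_label == 0:
        return np.zeros(mask.shape, dtype=bool)
    return labels == center_label


def nucleus_mask_from_gray(gray):
    """Binary nucleus mask from a grayscale image (nucleus assumed darkest)."""
    # Two thresholds give three clusters; the nucleus is the darkest one.
    thresholds = threshold_multiotsu(gray, classes=3)
    darkest = gray <= thresholds[0]
    # Noise in the same cluster is dropped by keeping the central component.
    return component_at_center(darkest)


def cytoplasm_mask(cell_mask, nucleus_mask):
    """Cytoplasm = cell pixels that are not nucleus pixels."""
    return cell_mask & ~nucleus_mask
```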
The proposed segmentation procedure successfully segmented 1070 out of 1274 images (83.99%). The majority of failed segmentations can be attributed to background cells overlapping with the ROI and stain obscuring the ROI. Given that the dataset was collected for a study with a CNN, which does not require segmentation and feature extraction, the segmentation results are acceptable. Table 2 displays the number of images remaining in each class after segmentation.

Feature Extraction
The purpose of feature extraction was to obtain a set of descriptors that are discriminative for classification of leukocytes. From each image, 16 cytomorphological features were extracted, which could be divided into four categories: nucleus size, nucleus shape, elliptical features, and color features. Nucleus size features consisted of area, perimeter, area to perimeter ratio, equivalent diameter [19], and nucleus to cytoplasm area ratio (N:C ratio) [30]. Size features of the nucleus are important for classifying leukocytes because as leukocytes mature, the nucleus decreases in size [16]. The nucleus shape features include circularity (Equation (1)), solidity (Equation (2)), and compactness (Equation (3)). In these equations, A is the area of the nucleus, A_c is the area of the convex hull of the nucleus, and P is the perimeter of the nucleus. Elliptical features included eccentricity, minor axis length, major axis length, and elongation. In the equations for eccentricity (Equation (4)) and elongation (Equation (5)), D_f is the focal distance, l_M is the major axis length, and l_m is the minor axis length. Due to the unique morphological characteristics of leukocytes, the standard features used for classification of tumors are not sufficient [28]. We propose two new color features in this study: the average and standard deviation of nucleus color intensity in the B channel of LAB color space. Additionally, two cytoplasm color features conceived by Ghane et al. [28] are used.
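The bodies of Equations (1) to (5) are not reproduced in the text above. As a hedged sketch, the symbol definitions given for A, A_c, P, D_f, l_M, and l_m are consistent with the conventional forms below. Circularity, solidity, and eccentricity follow widely used definitions; the forms shown for compactness and elongation are assumptions, since these two descriptors are defined differently by different authors and the convention used in this study cannot be confirmed from the text.

```latex
\begin{align}
\text{Circularity}  &= \frac{4\pi A}{P^{2}} \tag{1}\\
\text{Solidity}     &= \frac{A}{A_c}        \tag{2}\\
% Assumed convention; other authors use P^2 / (4 \pi A) or its inverse.
\text{Compactness}  &= \frac{P^{2}}{A}      \tag{3}\\
\text{Eccentricity} &= \frac{D_f}{l_M}      \tag{4}\\
% Assumed convention; 1 - l_m / l_M is another common choice.
\text{Elongation}   &= \frac{l_M}{l_m}      \tag{5}
\end{align}
```

Under these forms a perfect circle has circularity 1, solidity 1, and eccentricity 0, which matches the intuition that immature nuclei with irregular outlines depart from these values.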
Color features have been demonstrated in previous studies to be significant for classification of leukocytes [6,16]. All 16 features for each image were added to a feature matrix, which served as the input for the classifier. Morphological features of the whole cell were not used, with the exception of the cytoplasm area in the N:C ratio, because the positioning of background cells dictates the shape and orientation of the cytoplasm of the leukocyte. Therefore, the shape of the cytoplasm would be highly variable and not correlated with leukocyte type. While the study did not utilize texture and fractal features, previous works have utilized them and obtained successful results [16]. Future studies can employ texture and fractal features, which may improve classification performance.
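A subset of the nucleus features can be computed from the binary masks roughly as follows. This is a sketch under assumptions, not the study's implementation: the function name and dictionary keys are hypothetical, circularity uses the conventional 4πA/P² form, and the axis-length, elongation, compactness, and cytoplasm color features are omitted for brevity.

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.measure import regionprops


def nucleus_features(rgb, cell_mask, nucleus_mask):
    """Return a dict with a subset of the nucleus descriptors (hypothetical)."""
    # Treat the whole nucleus mask as a single labelled region.
    props = regionprops(nucleus_mask.astype(np.uint8))[0]
    area = float(props.area)
    perimeter = float(props.perimeter)
    # Cytoplasm area is the cell area minus the nucleus area.
    cytoplasm_area = float(np.count_nonzero(cell_mask & ~nucleus_mask))
    # B channel of LAB, restricted to nucleus pixels (proposed color features).
    b_values = rgb2lab(rgb)[..., 2][nucleus_mask]
    return {
        "area": area,
        "perimeter": perimeter,
        "area_to_perimeter": area / perimeter,
        "equivalent_diameter": float(np.sqrt(4.0 * area / np.pi)),
        "nc_ratio": area / cytoplasm_area,
        "circularity": 4.0 * np.pi * area / perimeter ** 2,  # assumed form
        "solidity": float(props.solidity),
        "eccentricity": float(props.eccentricity),
        "nucleus_b_mean": float(b_values.mean()),
        "nucleus_b_std": float(b_values.std()),
    }
```

Stacking one such dictionary per image, in a fixed key order, yields the rows of the feature matrix passed to the classifier.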

Classification
A random forest algorithm was chosen for classification because of its higher performance with imbalanced data when compared to other machine learning classifiers [31][32][33][34]. A random forest is an ensemble classifier that combines a specified number of decision trees and takes the majority vote of the trees as the predicted class, which reduces overfitting.
In the classification step, binary classification between immature and mature leukocytes was first performed, followed by classification of immature leukocytes into four types. For binary classification, 80% of the data in the feature matrix was used for training and 20% was reserved for testing of the model. For multiclass classification of immature leukocytes, 70% of the data was used for training and 30% was used as the testing set. All splitting of data into training and testing sets was randomized. A random forest classifier with 100 trees was initially tested and evaluated for both binary and multiclass classification. Binary classification was quantitatively evaluated on the testing set with accuracy, precision, recall (equivalent to sensitivity), and specificity as performance metrics. The performance metrics were based on the possible outcomes of classification: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). Binary classification performance metrics, namely accuracy (Equation (6)), precision (Equation (7)), recall (Equation (8)), and specificity (Equation (9)), were defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{9}$$

Multiclass classification was evaluated on the testing set with overall accuracy, precision for each class, and recall for each class. For multiclass classification, a true positive refers to an image correctly being given a label; a true negative refers to an image correctly not being given a label; a false positive refers to an image incorrectly being given a label; and a false negative refers to an image incorrectly not being given a label. The multiclass classification model was optimized through a search of ten randomized combinations of random forest hyperparameters. Combinations of parameters were evaluated with the mean of precision scores across classes during five-fold cross validation on the training set.
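The binary evaluation can be illustrated with scikit-learn. The sketch below is not the study's code: the data are synthetic (generated with `make_classification` as a stand-in for the 16-column feature matrix), and the helper name `binary_metrics` and the fixed `random_state` values are assumptions. It follows the 80/20 split and 100-tree forest described above and derives the four metrics from TP, TN, FP, and FN.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and specificity from outcome counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Synthetic stand-in for the feature matrix; NOT the study data.
X, y = make_classification(
    n_samples=600, n_features=16, n_informative=8, random_state=0
)
# Randomized 80/20 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# For labels [0, 1], ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(
    y_test, model.predict(X_test), labels=[0, 1]
).ravel()
scores = binary_metrics(tp, tn, fp, fn)
print(scores)
```

As a worked example with hypothetical counts, TP = 90, TN = 80, FP = 20, and FN = 10 give an accuracy of 0.85, a recall of 0.90, and a specificity of 0.80.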
The class weight parameter, which is set to none by default, was set to balanced to overcome the imbalance of data. A balanced random forest classifier uses the size of each class to assign weights inversely proportional to the frequency of each class. The optimized model was assessed with the same metrics as the initial multiclass classifier.
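The hyperparameter search and class weighting can be sketched with scikit-learn's `RandomizedSearchCV`. The example is illustrative only: the data are synthetic, the parameter grid is a hypothetical choice (the text does not list the searched values), and the stratified split and `random_state` values are assumptions. The ten sampled combinations, five-fold cross validation, macro-averaged precision, and balanced class weights follow the description above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, imbalanced four-class stand-in; NOT the study data.
X, y = make_classification(
    n_samples=400, n_features=16, n_informative=8, n_classes=4,
    weights=[0.6, 0.2, 0.12, 0.08], random_state=0,
)
# 70/30 split; stratification is an assumption made for this small example.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# "balanced" weights: n_samples / (n_classes * class_count).
classes = np.unique(y_train)
balanced = compute_class_weight(
    class_weight="balanced", classes=classes, y=y_train
)

# Hypothetical search space; the values are illustrative.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_distributions=param_distributions,
    n_iter=10,                  # ten randomized combinations
    scoring="precision_macro",  # mean of per-class precision
    cv=5,                       # five-fold cross validation
    random_state=0,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```

Rare classes receive larger weights under this scheme, so errors on them cost more during tree construction than errors on the majority class.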

Calculation of Feature Importance
The importance of each feature was quantitatively evaluated with Gini importance, also called mean decrease in impurity (MDI). The Gini importance of a feature in a random forest algorithm is defined as the mean reduction in Gini impurity across all decision trees caused by the feature [34]. Of the 16 features, the five most important were ranked to establish which features are most crucial for classifying leukocytes.
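In scikit-learn, the mean decrease in impurity is exposed through the `feature_importances_` attribute of a fitted forest. The sketch below ranks the five most important features on synthetic data; the feature names and the data-generating rule are hypothetical and serve only to show the ranking step.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data in which only two of 16 columns carry signal (assumption).
rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(16)]
X = rng.normal(size=(500, 16))
y = (X[:, 3] + 0.5 * X[:, 7] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Mean decrease in Gini impurity, normalized so the values sum to 1.
importances = forest.feature_importances_

# Rank the features and keep the five with the highest importance.
ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
top5 = ranked[:5]
for name, value in top5:
    print(f"{name}: {value:.3f}")
```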

Detection of Immature Leukocytes
After the random forest model was trained, the performance was evaluated with the previously listed performance metrics. Table 3 displays the performance of the model for binary classification between immature and mature leukocytes on the training and testing sets with the random forest algorithm. The model classified all images in the training set correctly and achieved 92.99% accuracy on the testing set. Precision, recall, and specificity values for the testing set were also above 90%. Table 4 displays the confusion matrix for binary classification, which reports the number of correct and incorrect predictions for each class. Compared to the accuracy and recall of the model, the precision and specificity are slightly lower due to the number of false positives. Although not ideal, high recall is preferred over precision for fatal diseases such as AML. Figure 3 displays the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate for the binary classifier.
Table 3. Performance metrics of the optimized model for binary classification between immature and mature leukocytes on training and testing sets.
The area under the ROC curve (AUC-ROC) is 0.98, which is comparable to the current state-of-the-art model in the study by Matek et al. [12], which achieved an AUC-ROC of 0.992. These high performance metrics indicate that the proposed random forest classifier can be used as an effective tool for identifying immature cells in the diagnosis of AML.
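An ROC curve and its AUC can be obtained from the predicted class probabilities of a random forest. The following is a minimal sketch on synthetic data, not a reproduction of Figure 3; the variable names and `random_state` values are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc as area_under_curve
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary stand-in for the feature matrix; NOT the study data.
X, y = make_classification(
    n_samples=600, n_features=16, n_informative=8, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(X_train, y_train)

# Probability of the positive class for each test sample.
positive_proba = model.predict_proba(X_test)[:, 1]

# False and true positive rates over all decision thresholds.
fpr, tpr, thresholds = roc_curve(y_test, positive_proba)
auc_value = roc_auc_score(y_test, positive_proba)
print(f"AUC-ROC: {auc_value:.3f}")
```

Plotting `tpr` against `fpr` gives the curve; an AUC of 0.5 corresponds to chance and 1.0 to perfect separation.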

Classification of Immature Leukocytes
For multiclass classification, the initial random forest model obtained precision and recall above 85% for all classes except the promyelocyte class. The optimized model, which was constructed with average precision as the scoring metric, obtained precision above 65% for all classes (see Table 5). The model achieved above 90% precision and recall for the myeloblast class, which is the most common immature leukocyte in AML patients. The results are superior to the previous state of the art [12], which achieved precision scores below 65% for classification of most immature leukocyte types despite very high performance in detection. While past research has obtained low performance on the minority class, the proposed model achieved 100% recall on the monoblast class. The model had the lowest performance on the promyelocyte class, with precision and recall scores below 85% for both the initial and optimized models. Table 6 displays the confusion matrix for the optimized multiclass model, which shows that the majority of incorrect predictions on the promyelocyte class labeled the image as a myeloblast. The lower performance on the promyelocyte class can be attributed to the fact that promyelocytes and myeloblasts are consecutive steps in the cell lineage of myeloid cells, so the two classes share morphological characteristics [35]. Likewise, the majority of incorrect predictions on the myeloblast class labeled the image as a promyelocyte due to the similarity between the classes.
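Per-class precision and recall follow directly from a multiclass confusion matrix: precision divides each diagonal entry by its column sum (all images predicted as that class), and recall divides it by its row sum (all images truly of that class). The sketch below uses a hypothetical three-class matrix, not the values of Table 6, with rows as true classes and columns as predicted classes.

```python
import numpy as np


def per_class_precision_recall(confusion):
    """Per-class precision and recall plus overall accuracy.

    Rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(confusion, dtype=float)
    true_positive = np.diag(cm)
    precision = true_positive / cm.sum(axis=0)  # column sums: predicted counts
    recall = true_positive / cm.sum(axis=1)     # row sums: true counts
    accuracy = true_positive.sum() / cm.sum()
    return precision, recall, accuracy


# Hypothetical confusion matrix for illustration only.
hypothetical_cm = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]
precision, recall, accuracy = per_class_precision_recall(hypothetical_cm)
print(precision, recall, accuracy)
```

In this hypothetical matrix the second class is the weakest, with three of its ten images assigned to the third class, which mirrors how confusion between two morphologically similar classes lowers the recall of one and the precision of the other.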

Most Important Features
Based on Gini importance, the most important features for detection (see Table 7) and classification (see Table 8) were calculated. The five most important features for detection are all either nucleus size features or elliptical features, which is consistent with the tendency of the leukocyte nucleus to decrease in size as the cell matures [16]. The N:C ratio was calculated to be a significant discriminator for both detection and classification of immature leukocytes. This finding is supported by previous research that classified hemocyte precursors [6]. For classification, the Gini importance values of the two proposed nucleus color features were the highest of all 16 features, while the cytoplasm color features from [28] were also shown to be discriminative.

Conclusions
To overcome the limitations of the manual diagnosis methodology for AML, a random forest model for automatic detection and classification of immature leukocytes was presented. The model was capable of detecting immature leukocytes with 93% accuracy and 0.98 AUC-ROC, which is on par with the current state of the art [12]. Furthermore, the model achieved precision above 65% for each of the four immature leukocyte classes during multiclass classification, despite imbalance in numbers across classes, which is an improvement over previous research. Using Gini importance, the N:C ratio was determined to be significant for both detection and classification, while the proposed color features of the nucleus in the B channel of LAB color space were calculated to be important for classification.
Applications of the study are two-fold. While the proposed model cannot diagnose AML alone, it can be used as an effective support tool for doctors to reduce the time and cost required for the diagnosis of AML. The high accuracy of the model in binary classification demonstrates that the model can serve as an efficient screening tool, which can rapidly identify potentially cancerous cells for further examination by a doctor [36][37][38]. The proposed model can expedite the detection of AML by identifying immature leukocytes, especially in developing countries where diagnosis takes numerous weeks, and potentially save lives because early diagnosis is vital for treatment success in AML patients [9,39]. In addition, the precise classification of immature leukocytes can aid in treatment and prognosis decisions, which differ based on the type of cancerous cell [40,41]. The second application of this study is in future research, where the features calculated to be most important and the proposed features can be used to elevate the classification performance.
An important future direction is to gather a comprehensive dataset and develop a machine learning classifier that can classify all the types of immature leukocytes and work with imbalanced data. Future studies can expand on this work by calculating and ranking the importance of additional morphological features for the classification of leukocytes. Improving the discrimination between similar cell types, such as myeloblasts and promyelocytes, is also an avenue for future work. The difficulty of differentiating myeloblasts and promyelocytes can potentially be overcome by identifying features that are especially discriminative for the two cell types and training a specialized model to discriminate between the two cell types. Research on leukemia detection has obtained very promising results, and further work is required to develop systems that can be completely integrated into the clinical diagnosis method. Contributions of this study are an accurate model for detecting and classifying immature leukocytes, as well as calculation of the most important morphological features, which provide a basis for future research on computer-aided diagnosis of leukemia.