White Blood Cell Classification Using Texture and RGB Features of Oversampled Microscopic Images

White blood cell (WBC) type classification from microscopic images is a task of significant diagnostic importance, as WBCs provide immunity against infections and foreign substances. WBCs comprise several types, and abnormalities in one type may indicate a disease such as leukemia. Existing studies are limited by low accuracy and inflated performance estimates, often caused by model overfitting on imbalanced datasets. Additionally, many studies consider a reduced number of WBC types, which exaggerates the reported accuracy. This study presents a hybrid set of selected features, combined with synthetic minority oversampling technique-based resampling, to mitigate the influence of the above-mentioned problems. Furthermore, machine learning models are adopted because they are less computationally complex, require less training data, and provide robust results. Experiments are performed with both machine learning and deep learning models on the original dataset, the augmented dataset, and the oversampled dataset to compare the performances of the models. The results suggest that a hybrid set of texture and RGB features from microscopic images, selected using Chi2, produces a high accuracy of 0.97 with random forest. Performance appraisal using k-fold cross-validation and comparison with existing state-of-the-art studies shows that the proposed approach outperforms existing studies regarding both the obtained accuracy and computational complexity.


Introduction
White blood cells (WBCs) are potent infection fighters; they normally grow and divide in an orderly way according to the needs of the human body to fight infections and other diseases [1]. However, in people with leukemia, the bone marrow produces an excessive amount of abnormal WBCs that do not function properly, which leads to several abnormalities [2,3]. WBC counts in the blood can provide early clues about probable abnormalities, based on the number of cells of each WBC type. WBCs can be categorized into five major classes: lymphocytes, eosinophils, neutrophils, basophils, and monocytes. The percentage of each type of WBC in a healthy person varies within a range; for example, lymphocytes constitute 20% to 40%, eosinophils account for 1% to 6%, and monocytes make up 2% to 10%, while neutrophils are 40% to 80% [4]. A low WBC count can accompany blood cancer and many other diseases, as WBCs are an important part of the body's immune system. The number of WBCs varies when the bone marrow stops making them, or when WBCs are destroyed by another entity [4]. Healthy WBCs play an important role in preventing different infections and help fight other deadly diseases such as COVID-19 [5].
Estimating the WBC count can help with making an early prediction of probable diseases. Blood microscopic images can be used to detect the WBC type and enable the timely diagnosis of disease. Traditional approaches for WBC-type detection are time-consuming and low in accuracy, which increases the need for systems that can analyze WBCs quickly and accurately. In this regard, machine learning plays an important role. Several machine learning approaches have been introduced recently that can predict the type of WBC from microscopic images. These include the study [6], which uses convolutional neural networks (CNN) and an extreme learning machine (ELM) for WBC type detection in microscopic images. A CNN with traditional deep learning approaches and handcrafted features for WBC is deployed in the study [7].
This study also follows a machine learning-based approach for WBC type detection using blood microscopic images. We used a WBC image dataset acquired from the IEEE data port, which has imbalanced class distribution and a poor feature set. We proposed a novel approach by combining a data re-sampling technique and a hybrid feature engineering technique to alleviate the influence of such shortcomings and to make the following contributions.
• An improved feature set is obtained by combining the texture features and RGB features to form a feature set that is more correlated with the target classes and yields a high accuracy. Chi-squared (Chi2) is then used to select an important and equal number of features for model training. The models are evaluated using texture features and RGB features in comparison to the proposed hybrid features.
• The imbalanced distribution of the different classes of WBC is tackled using data resampling, which helps to reduce model over-fitting. The synthetic minority oversampling technique (SMOTE) is applied for data resampling in this study to evaluate the influence of resampling on the performance of machine learning models.
• Besides various machine learning models such as decision tree (DT), random forest (RF), k-nearest neighbor (KNN), and support vector machine (SVM), state-of-the-art pre-trained deep learning models are also employed, including ResNet50 and VGG16, in addition to a custom-designed CNN. Experiments are performed using the original dataset, the augmented dataset, and the oversampled dataset.
• The performances of all models are analyzed with respect to different performance evaluation metrics. Furthermore, k-fold cross-validation and statistical T-tests are carried out. Additionally, a performance comparison with recent state-of-the-art approaches is made to analyze the performance of the proposed approach regarding accuracy and response time.
This paper is further divided into three sections. Section 2 discusses several important studies in the context of WBC-type detection. The proposed feature engineering approach and the processes of WBC-type classification are explained in Section 3. Section 4 presents the results and discussion, while the conclusions are provided in Section 5.

Related Work
Research on WBC is one of the most important domains in bioinformatics, and the use of machine learning models for WBC analysis is regarded as potentially significant [8][9][10]. The classification of WBCs using microscopic images has been investigated by several researchers [6,7,11]. Despite this, several challenges remain unresolved, such as obtaining a clean WBC image dataset, achieving a highly accurate machine learning approach, and improving efficiency by reducing computational time. In the following, several prominent research works are discussed.
The authors propose a deep learning-based automatic approach for WBC classification in [9]. The study utilizes pre- and post-processing to improve the performance of the CNN model. In preprocessing, data normalization, hole filling, and data augmentation are applied, while in post-processing, the study deploys localization and data segmentation techniques. The study achieved a 95.73% accuracy score using the CNN model on a cytological image dataset. The study [12] used a CNN model and a recurrent neural network (RNN), as well as a combination of both, for WBC classification to resolve the problem of multiple overlapping cells. The canonical correlation analysis method is used in the study with the BCCD dataset, and the achieved accuracy score is 95%. The study [13] proposed a 3D convolutional network, called deep hyper, for WBC microscopic image classification. Spectral and spatial features are used with the deep hyper model to obtain 96% accuracy; using the deep hyper model with an attention module raises the accuracy score to 97%.
The study [14] used a hybrid approach for WBC detection, in which the scale-invariant feature transform (SIFT) and a CNN model are combined. SIFT is used for feature detection, and the detected features are used for CNN training on the LISC and WBCis datasets. The proposed model achieved accuracy scores of 95.84% and 97.33%, respectively, on the two datasets. Similarly, the study [15] proposed an approach for the classification of WBCs using a CNN model, deployed it on the Kaggle WBC image dataset, and achieved significant accuracy. The study [16] proposed a multi-level CNN model for classifying four WBC cell types. At the first level, Faster R-CNN is applied for the detection of the region of interest, while at the second level, the CNN-based MobileNet architecture is used for cell-type classification.
Besides proposing novel architectures, several studies adopt pre-trained models for WBC classification. The study [6] used a supervised machine learning approach for WBC detection that follows a CNN architecture and an ELM model on a microscopic image dataset. Several CNN-based pre-trained models, such as AlexNet, GoogleNet, VGG-16, and ResNet, are used for feature extraction, while the ELM is trained on those features to obtain a 96.03% accuracy. Similarly, [7] used a deep learning CNN with handcrafted features to achieve a higher accuracy. The study worked on six types of WBC, including lymphocytes, monocytes, basophils, neutrophils, eosinophils, and abnormal cells, and the experimental results are promising. Along the same lines, [17] used pre-trained CNN architectures for WBC classification. The authors deployed ResNet and Inception variants with fine-tuned parameters to obtain 100% training accuracy with the ResNet50 model for four classes of WBC; however, the computational complexity of this approach is very high, as it uses 3000 epochs for a very deep ResNet50 model. The study achieved an accuracy score of 98.4%. The study [18] used the DenseNet121 model for the classification of WBCs. Data normalization and data augmentation are used with the optimized DenseNet121 model, which achieved 98.84% accuracy on the KBC dataset. Table 1 presents a comparative analysis of the cited research works. Predominantly, the above-discussed studies on WBC classification adopt deep learning approaches, whose computational costs are higher than those of machine learning models. Additionally, several studies consider a lower number of WBC classes, which inflates the reported accuracy. In addition, a few studies experimented with imbalanced datasets, where the probability of model over-fitting is high. This study resolves the problem of high computational cost by deploying machine learning models, and aims at achieving high accuracy using feature engineering.
Using a balanced dataset also reduces the probability of model over-fitting.

Materials and Methods
This study works on WBC type detection using the image dataset and machine learning approach. In the following sections, the background on WBC, details of the dataset, and a description of feature engineering and models are provided.

White Blood Cells
WBCs are colorless cells of the human blood, because they do not contain any pigment. There are 7000 to 8000 of them per cubic mm of blood [4]. Their size is much larger than that of red blood cells. They can be classified into five different types, based on the shape of the nucleus and the density of the granules in the cytoplasm. WBCs have a nucleus, which distinguishes them from the other blood cells. They are formed in the bone marrow and then move to the blood and lymph [4]. The types of WBC and their brief descriptions are provided here.

• Neutrophils make up about 62% of the total WBCs. They can engulf foreign particles such as viruses and bacteria and destroy them to neutralize their effects. They are about twice the size of a red blood cell, and their nucleus contains about 2 to 5 lobes [21]. Figure 1 presents the types of WBC.
• Basophils comprise less than 1% of WBCs. They are about twice the size of a red blood cell, with a bilobed nucleus. These cells release a protein called heparin, which prevents blood clotting, and another protein called histamine, which causes inflammation. They also release substances that protect the body from the effects of foreign objects. Figure 1b shows the basophil type of WBC.
• Monocytes form approximately 3% of the total WBCs. They are about two to three times larger than red blood cells, and the nucleus has an almost round to lobed shape. They give rise to macrophages, which can destroy larger particles via phagocytosis. The life span of monocytes in the blood is about 8 to 10 h, after which they move toward the lymphoid tissue and become macrophages. The monocyte type of WBC is shown in Figure 1c.
• Eosinophils comprise about 2% of WBCs. They are about twice the size of a red blood cell, and their nucleus is bilobed. They inactivate inflammation-producing substances, and attack parasites and worms. Figure 1d shows a sample image of the eosinophil type of WBC.
• Lymphocytes make up approximately 32% of the total WBCs. Their size is nearly equal to that of a red blood cell. They produce antibodies. The life span of lymphocytes in the blood may be months or even years, depending on the activity of the cell. Lymphocyte samples are shown in Figure 1e.

Figure 2 shows the hierarchy of the WBC types according to their root types, and also illustrates the differences between the various types of WBC.

Flow of Implemented Methodology
In this study, first, a dataset containing the WBC images is acquired from the IEEE data port, and features are then extracted from the WBC images. We extract two types of features: texture features and RGB features. Texture features are extracted using the skimage library [22]. After extracting the features, we deploy a feature selection technique, Chi2, to retain the most prominent and important features.
We select 3000 features each from the RGB and texture features. Both types of important features (RGB and texture) are combined to form a hybrid feature set. To resolve the problems related to the imbalanced dataset, we performed data oversampling using SMOTE, which helps to reduce model over-fitting towards the majority class. Several machine learning and deep learning models are deployed to perform WBC-type classification. To train and test the learning models, we split the dataset into training and testing sets with an 80:20 ratio. In the end, we evaluate all models in terms of accuracy, precision, recall, and F1 score. Figure 3 shows the proposed methodology of this study.

WBC Dataset
The dataset used in this study comprises five categories of WBCs: neutrophil, lymphocyte, monocyte, eosinophil, and basophil. The dataset is acquired from the IEEE Dataport [23]. Each type of WBC consists of a different number of images, as shown in Table 2. The total number of images in the dataset is 3539, with 667 raw images, 1464 augmented images, and 1408 cropped and classified images.

Feature Engineering
For feature engineering, we deployed feature extraction and feature selection techniques to generate an important feature set that can enhance the performance of learning models. We extracted two types of features from the dataset, which are RGB features and texture features, because color and shape are the best parameters for classifying the WBC images.

RGB Features
Color is an important part of an image and is widely used in image processing. The color information is conveyed through three channels, and these color features are invariant to the rotation of the pixels in an image. The red, green, and blue pixel values represent the content of the image and can be used for training machine learning models [24]. RGB features are very practical, as color can potentially be used to discriminate between images of the different target classes.
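As an illustration, per-channel intensity histograms are one common way to turn the three color channels into a rotation-invariant feature vector. The paper does not state its exact RGB feature construction, so the bin count and normalization below are assumptions of this sketch:

```python
import numpy as np

def rgb_features(image, bins=16):
    """Concatenate normalized per-channel intensity histograms of an
    RGB image. Histograms ignore pixel positions, so the features are
    invariant to rotations of the pixels, as noted in the text."""
    feats = []
    for ch in range(3):  # R, G, B channels
        hist, _ = np.histogram(image[:, :, ch], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalize to a distribution
    return np.concatenate(feats)

# A synthetic 64x64 RGB array stands in for a microscopic WBC image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
vec = rgb_features(img)
print(vec.shape)  # (48,) = 3 channels x 16 bins
```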

Texture Features
Texture features can be extracted from grayscaled images, disregarding the color information [25]. Texture features provide information about the intensities of the pixels in an image. They are characterized by the distribution of intensity levels in the neighborhood.
Both RGB and texture features have their own advantages and disadvantages, as well as their own significance toward improving the performance of machine learning models. For RGB features, we used the CV2 library, while the texture features are obtained using the skimage library [22]. Table 3 shows the number of RGB and texture features. We combined both feature sets along the feature dimension and generated a hybrid feature set. Before combining the features, we performed feature selection to retain the most important features, which reduced the size of the hybrid feature set and improves the accuracy and efficiency of the models. For feature selection, we used the Chi-squared (Chi2) technique.
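A classic texture descriptor of the kind skimage provides is the gray-level co-occurrence matrix (GLCM) and its statistics. As a self-contained stand-in for skimage's graycomatrix/graycoprops (the exact texture features used in the paper are not specified), a minimal horizontal-neighbor GLCM contrast can be computed as:

```python
import numpy as np

def glcm_contrast(gray, levels=8):
    """Build a gray-level co-occurrence matrix for horizontal neighbor
    pairs and return its contrast statistic -- a measure of how much
    neighboring pixel intensities differ."""
    # Quantize intensities to a small number of gray levels.
    q = (gray.astype(np.float64) / 256 * levels).astype(int)
    glcm = np.zeros((levels, levels))
    # Count co-occurrences of (pixel, right neighbor) level pairs.
    for i in range(q.shape[0]):
        for j in range(q.shape[1] - 1):
            glcm[q[i, j], q[i, j + 1]] += 1
    glcm /= glcm.sum()  # normalize counts to joint probabilities
    idx = np.arange(levels)
    # Contrast = sum over (a, b) of (a - b)^2 * P(a, b).
    diff2 = (idx[:, None] - idx[None, :]) ** 2
    return float((diff2 * glcm).sum())

rng = np.random.default_rng(1)
gray = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
c = glcm_contrast(gray)
print(c)
```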

Chi2
Chi2 is a feature selection technique that measures the relationship between the feature set and the target classes [26]. It finds the dependence of the target class on each of the data features under the null hypothesis that the feature distribution is independent of the class. Chi2 calculates the $\chi^2$ scores and then orders them in descending order; the higher the value of $\chi^2$, the more important the feature is. The formula to calculate $\chi^2$ is

$\chi^2 = \sum_{i=1}^{k} \sum_{j=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, (1)

where $O_{ij}$ is the observed frequency, $E_{ij}$ is the expected frequency, k is the number of rows, m is the number of columns, and (i, j) denotes the cell.
We select 3000 features each from the RGB and texture features and combine them into a hybrid feature set. Table 4 shows the number of features with each approach, and Figure 4 shows the process of forming the hybrid feature set.
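The selection-then-concatenation step can be sketched with scikit-learn's SelectKBest and chi2 scorer; the dimensions below are scaled down from the paper's 3000 selected features per set:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy stand-in: 100 samples, 20 non-negative features per set, 5 WBC classes.
rng = np.random.default_rng(2)
X_rgb = rng.random((100, 20))
X_tex = rng.random((100, 20))
y = rng.integers(0, 5, size=100)

k = 5  # stands in for the paper's k = 3000
rgb_sel = SelectKBest(chi2, k=k).fit_transform(X_rgb, y)
tex_sel = SelectKBest(chi2, k=k).fit_transform(X_tex, y)

# Concatenate along axis=1 to form the hybrid feature set.
hybrid = np.concatenate([rgb_sel, tex_sel], axis=1)
print(hybrid.shape)  # (100, 10)
```

Note that chi2 in scikit-learn requires non-negative feature values, which holds for histogram and intensity-based features.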

Synthetic Minority Oversampling Technique
SMOTE is an oversampling technique that generates artificial samples to produce a balanced dataset, which helps to reduce the probability of model overfitting [27]. SMOTE generates samples for the minority class until they equal the number of majority class samples. It is mostly used for numeric data features, while this study uses it to increase the number of samples for the image data. Table 5 shows the ratio of samples after oversampling. Figure 5 shows the approach for data normalizing and balancing using SMOTE.
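The study uses the standard SMOTE implementation; as a minimal illustration of the underlying mechanism only, SMOTE creates each synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbors:

```python
import numpy as np

def smote_like(X_min, n_new, k=5, seed=0):
    """Sketch of SMOTE's core idea: synthesize minority samples by
    linear interpolation toward a randomly chosen near neighbor."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from sample i to all other minority samples.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the sample itself
        j = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

rng = np.random.default_rng(3)
minority = rng.random((20, 4))   # 20 minority samples, 4 features
new = smote_like(minority, n_new=30)
print(new.shape)  # (30, 4)
```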

Data Splitting
Data splitting is performed to train and test the learning models. We split the dataset with an 80:20 ratio in the cases of both the oversampled and the original dataset. This ratio is suitable because the dataset is not too large, and we obtained the best results with it. Table 6 shows the number of samples for training and testing with the original dataset and the oversampled dataset.
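The 80:20 split can be performed with scikit-learn's train_test_split; the stratify option (an assumption of this sketch, not stated in the paper) keeps the class ratio identical in both partitions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.random((1000, 10))          # toy feature matrix
y = rng.integers(0, 5, size=1000)   # five WBC classes

# 80/20 split; stratify=y preserves the class distribution.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(len(X_tr), len(X_te))  # 800 200
```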

Machine Learning Models
Machine learning models can be used to solve problems in different domains such as text analysis [28], computer vision applications [29,30], IoT [31,32], and image processing [33,34]. In this study, we used machine learning models for WBC image classification. We deployed several machine learning models on the hybrid feature set to show the significance of our feature engineering approach. Four state-of-the-art machine learning models, namely DT, RF, KNN, and SVM, have been employed with their best hyper-parameter settings for the dataset used, as shown in Table 7.
DT is a tree-based machine learning model used for classification and regression tasks [35,36]. DT constructs a tree from the feature set, placing important features at the root nodes, while leaf nodes function as decision nodes. Feature importance in DT can be calculated using the Gini Index or Information Gain algorithms, which measure the purity and impurity of the feature set. For tree splits, we used the Gini criterion [37], which can be calculated as

$Gini = 1 - \sum_{i=1}^{T} p(i)^2$, (2)

where T is the total number of target classes and p(i) is the probability of picking a data point with class i. The DT is used with the max_depth parameter set to 200, which is its most optimal value, because beyond a max_depth of 200 there is no change in model accuracy, while a deeper tree increases the complexity of the model; we therefore restrict it to a 200-level depth.

RF is also a tree-based model, consisting of an ensemble of several DTs. RF can be used for classification and regression tasks, and can perform well on nonlinear and imbalanced datasets [38]. RF combines a number of DTs under majority voting: all DTs in the RF make their predictions on the test data, and the class predicted by most DTs is regarded as the final prediction of the RF. RF can be defined mathematically as

$rf = mode\{DT_1, DT_2, DT_3, \ldots, DT_n\}$, (3)

where $DT_1, DT_2, DT_3$ are the decision trees and n is the number of decision trees. We used RF with the n_estimators parameter set to 200 and the max_depth parameter set to 200, as in DT. The n_estimators parameter indicates the number of DTs participating in the prediction procedure of the RF.
KNN is a simple machine learning algorithm for classification and regression. KNN, also known as a lazy learner, uses the whole training data: it matches test data against the training data and finds the nearest targets in the training data by distance [39]. Several distance metrics can be used with KNN, with the Euclidean distance being the most common:

$d(i, j) = \sqrt{\sum_{p=1}^{n} (i_p - j_p)^2}$, (4)

where i and j are two points in Euclidean n-space, and $i_p$ and $j_p$ are their p-th coordinates, measured from the origin of the space. KNN is used with only one parameter, n_neighbors, with a value of 5, indicating that the model uses five neighbors to predict the class of a given sample.
SVM is a linear model used for classification and regression tasks. SVM draws multiple hyper-planes to classify the data into several classes [40]. The hyper-plane with the best margin from the data points is selected by the SVM. A hyper-plane can be defined mathematically as

$Z \cdot x + b = 0$, (5)

where Z is a vector normal to the hyper-plane, x is a data point, and b is an offset. We used SVM with the poly kernel and C = 4.0.
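The four classifiers with the hyperparameter values stated above can be instantiated in scikit-learn as follows; the class-shifted toy data standing in for the hybrid WBC feature set is an assumption of this sketch:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hyperparameters as reported in the text and Table 7.
models = {
    "DT": DecisionTreeClassifier(max_depth=200),
    "RF": RandomForestClassifier(n_estimators=200, max_depth=200),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="poly", C=4.0),
}

# Toy data: 5 classes whose feature clusters are shifted by the label.
rng = np.random.default_rng(5)
y = rng.integers(0, 5, size=500)
X = rng.normal(size=(500, 10)) + y[:, None]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in models.items():
    acc = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
    print(f"{name}: {acc:.2f}")
```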

WBC Classification
Algorithm 1 illustrates the WBC classification approach used in this study. The given input image is classified into one of the five classes. First, RGB and texture features are extracted from all images one by one and then concatenated along the feature axis (axis = 1) to make a hybrid feature set. The feature set goes through the SMOTE technique to balance the dataset, which is later used to train the machine learning models.
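The flow of Algorithm 1 can be sketched end-to-end as follows; the two per-image extractors are hypothetical toy functions, and the SMOTE and feature selection steps (illustrated earlier) are omitted for brevity:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-image extractors; any vector-valued functions fit here.
def extract_rgb(img):
    # Mean intensity of each color channel.
    return np.array([img[:, :, c].mean() for c in range(3)])

def extract_texture(img):
    g = img.mean(axis=2)  # crude grayscale conversion
    # Global intensity spread and mean horizontal gradient magnitude.
    return np.array([g.std(), np.abs(np.diff(g, axis=1)).mean()])

rng = np.random.default_rng(6)
images = rng.integers(0, 256, size=(40, 16, 16, 3))  # toy image stack
labels = rng.integers(0, 5, size=40)                 # five WBC classes

# Extract both feature types per image, then concatenate along axis=1.
X = np.concatenate(
    [np.array([extract_rgb(im) for im in images]),
     np.array([extract_texture(im) for im in images])], axis=1)

clf = RandomForestClassifier(n_estimators=10).fit(X, labels)
print(X.shape)  # (40, 5): 3 RGB + 2 texture features per image
```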

Results
This section contains the results of the machine learning and deep learning models for WBC-type detection. We deployed the DT, RF, KNN, and SVM machine learning models with both texture and RGB features. We used a Core i7 11th-generation machine with an NVIDIA GPU to perform the experiments; the system has 16 GB of RAM and a 1 TB SSD. We used Jupyter Notebook and the Python language to implement the approach. Table 8 contains the results of the machine learning models on the original dataset with texture and RGB features. The results of all models are very poor on the original dataset because the target ratio in the original dataset is highly imbalanced, and all models over-fit to the majority class data. With the RGB and texture features, the performance of RF is better than that of the other models in terms of accuracy on the original dataset, as it achieves accuracy scores of 0.66 and 0.62, respectively. The F1 scores of all models are poor because of the highly imbalanced dataset. To resolve this imbalanced dataset problem, we used data augmentation.

Results Using the Augmented Dataset
For data augmentation, ref. [23] used Keras preprocessing layers, where the images are augmented to generate a larger dataset. The data augmentation influences the learning models positively but only marginally improves the accuracy; the marginal improvement is due to the uncorrelated distribution of features during augmentation. Table 9 contains the results of the machine learning models on the augmented dataset with both the texture and RGB features. For the augmented dataset, machine learning models with RGB features are somewhat better than with texture features, as RF achieves a 0.64 accuracy score with a 0.54 F1 score, while with texture features RF achieves a 0.35 F1 score. The reason behind the accuracy improvement of the learning models with RGB features after augmentation is that the WBC color is not changed by augmentation, while the texture features change after flipping or rotating the image.

Performance of Models on Hybrid Features
We improve the performance of the learning models by making a balanced dataset using the SMOTE technique. SMOTE yields a highly balanced dataset, which increases the number of samples for training and testing the machine learning models and results in significant improvements. Table 10 shows the results of the machine learning models using the SMOTE-balanced dataset. Using the balanced dataset, the performances of the models have been elevated, as RF achieves the highest accuracy score of 0.92 and the same F1 score using the texture features. SMOTE does not alter the texture features or the RGB features themselves, as it only generates new samples artificially to reduce the models' over-fitting. RF is also good with RGB features after oversampling with SMOTE, as it achieves a 0.87 accuracy score. DT and KNN do not perform well; in particular, KNN shows a poor performance when used with a large feature set. A novel hybrid feature engineering approach is deployed with the machine learning models to analyze their performances against the individual texture and RGB features. The hybrid feature set combines the texture features and RGB features selected using the Chi2 technique; in addition, SMOTE is applied to oversample the data for balancing. Table 11 shows the results of the machine learning models using the hybrid feature engineering technique. The results suggest that this approach significantly improves the performance of all learning models, with RF obtaining the highest accuracy score of 0.97. The performance of RF is better owing to its ensemble architecture, which combines the predictions of its decision trees on the hybrid feature set to make the final prediction. SVM also shows significant performance with the hybrid features, as it achieves a 0.89 accuracy score and a 0.89 F1 score.
The impact of the hybrid feature set is two-fold: it provides a highly correlated feature set in the first place and a balanced feature set in the second place, which improves the training process of the machine learning models and enhances their performances. On average, the performances of all the models are elevated when they are used with the hybrid feature set. A performance comparison of all machine learning models used in this study is illustrated in Figure 6, using different types of features: original texture and RGB features, augmented texture and RGB features, and hybrid features. It shows that the best performance of the models is obtained when trained on oversampled hybrid features, with RF as the best performer. Figure 7 shows the confusion matrix for the best performer, RF, among the machine learning models. Figure 8 shows the comparison of the different features, including RGB features, texture features, and hybrid features, and is helpful in understanding the performance of machine learning models when trained on different feature sets. Figure 8a shows the feature space of the RGB features, indicating that the distribution of samples of the various classes is not well separable, which explains why the performance of machine learning models is not good when using RGB features. The same is true for the texture features, where samples of different WBC types are mixed, as shown in Figure 8b. However, the feature space of the hybrid features on the oversampled data, given in Figure 8c, shows that the sample distribution of WBC types is more separable and distinguishable than for both the texture and RGB features, which leads to a better performance of the machine learning models.

Performance Analysis of State-of-the-Art Deep Learning Models
We deployed several deep learning models for comparison with the proposed approach. For this purpose, CNN, VGG16, and ResNet50 are adopted for WBC-type detection. The architectures of all models are illustrated in Figure 9. Each model ends with a dense layer of five neurons and a Sigmoid activation function. The models are compiled using the categorical_crossentropy loss function and the Adam optimizer [33]. All models are fitted for 15 epochs, and 15% of the data are used for validation. The performance results of the deep learning models are shown in Table 12 for three types of datasets: the original, augmented, and oversampled datasets. Handcrafted features such as the texture or RGB features are not used for the deep learning models, because deep learning models work well with their own extracted features. On the original dataset, the pre-trained models are better than CNN, as VGG16 achieves a 0.74 accuracy score with a 0.30 F1 score; the performance of VGG16 is the same in the case of the augmented dataset. In both the original and augmented dataset cases, the models over-fit on the majority class and show poor performance for the minority classes, which results in a poor F1 score. The oversampling technique resolves this issue by providing balanced samples for training, which results in a better performance of the CNN, with similar results for the accuracy and F1 score: the CNN achieves an accuracy of 0.73 and an F1 score of 0.71 using the oversampled dataset.

Validation of the Proposed Approach
We validate the proposed approach using the k-fold cross-validation technique. To this end, 10-fold cross-validation is applied to all of the machine learning models used in this study with the proposed hybrid features on the oversampled dataset. The results given in Table 13 show that RF obtains the best performance, with a 0.95 accuracy score and a standard deviation (SD) of ±0.03, followed by SVM, with a 0.90 accuracy score and ±0.02 SD, which validates the superior performance of RF with the proposed hybrid feature set. Table 13. Ten-fold cross-validation results of machine learning models using the proposed approach.
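The 10-fold validation step can be sketched with scikit-learn's cross_val_score, which reports per-fold accuracies whose mean and standard deviation correspond to the values in Table 13 (the toy data here is an assumption of this sketch):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy separable data standing in for the hybrid oversampled feature set.
rng = np.random.default_rng(7)
y = rng.integers(0, 5, size=300)
X = rng.normal(size=(300, 8)) + y[:, None]

# 10-fold CV; cv=10 yields ten held-out accuracy scores.
scores = cross_val_score(RandomForestClassifier(n_estimators=50), X, y, cv=10)
print(f"{scores.mean():.2f} +/- {scores.std():.2f}")
```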


Best Performer Optimization Using Particle Swarm Optimization
We used the swarm optimizer for hyperparameter tuning to select the best hyperparameters in comparison with our manually selected ones. For this purpose, we used the mealpy 1.0.2 library [42,43]. We imported a particle swarm optimization (PSO) model named BasePSO and tuned the best performer, RF, over its two hyperparameters, n_estimators and max_depth. We set the upper and lower bounds to 300 and 50, respectively, for both n_estimators and max_depth, the number of epochs to 2, and the population size to 50 for PSO. The best solution found by the swarm optimizer achieves a 0.94 accuracy, with an optimized n_estimators value of 273 and a max_depth value of 110. Table 14 shows the epoch-wise PSO results for RF. It can be observed that RF compares well with the other models, and the approach is good in terms of both accuracy and computational cost.
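The study uses mealpy's BasePSO; as a self-contained illustration of the mechanism that such libraries implement, a bare-bones PSO over a bounded 2D search space can be sketched as follows. The particle count, coefficients, and toy objective (a quadratic bowl whose minimum echoes the tuned n_estimators/max_depth values above, instead of an expensive RF training run) are assumptions of this sketch:

```python
import numpy as np

def pso(objective, lb, ub, n_particles=20, epochs=50, seed=0):
    """Bare-bones particle swarm optimizer (minimization) over the box
    [lb, ub]. Each particle tracks its personal best; the swarm shares
    a global best that pulls all particles toward promising regions."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    pos = rng.uniform(lb, ub, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(epochs):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + cognitive pull (personal best) + social pull (global best).
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, lb, ub)  # keep particles inside the bounds
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, float(pbest_val.min())

# Toy objective with its optimum at (273, 110).
f = lambda p: (p[0] - 273) ** 2 + (p[1] - 110) ** 2
best, val = pso(f, lb=np.array([50.0, 50.0]), ub=np.array([300.0, 300.0]))
print(np.round(best))
```

In the actual tuning run, the objective would be the (negated) cross-validated accuracy of an RF trained with the rounded candidate hyperparameters.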

Performance Comparison with State-of-the-Art Approaches
A performance comparison of the proposed approach with existing state-of-the-art approaches is also carried out. For a fair comparison, all approaches are evaluated on the dataset used in this study: we implemented the models from the selected studies according to their base paper architectures, but trained and tested them on the current study's dataset. Only studies that investigate WBC-type classification using medical image datasets are selected for comparison. We compared the results of the current study with [15], which proposed a CNN model for WBC classification, and with the study [12], which proposed a hybrid model combining CNN and RNN for WBC classification. We selected these studies because they share the same topic and used an imbalanced medical image dataset. In addition, two further studies [34,44] that investigate medical image classification are also considered. The performance comparison is carried out in terms of accuracy and computation time. Table 15 shows the performance comparison of the selected studies with the current study. It can be observed that the current study outperforms the existing studies regarding both the obtained accuracy and computation time, which shows the supremacy of this approach for WBC-type classification.

Statistical T-Test
This study also performs a statistical T-test to show the significance of the proposed approach [45]. We applied a T-test to each approach in comparison with the proposed approach and obtained results in the form of acceptance or rejection of the null hypothesis. Table 16 shows the T-test results for all cases. The T-test outputs the T score and the critical value (CV); if T ≤ CV, the null hypothesis is accepted, and otherwise it is rejected. The T-test rejects the null hypothesis and accepts the alternative hypothesis in all cases in which the proposed approach is compared with the other approaches, which shows that the proposed approach is statistically significant in comparison with them.
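Such a significance test can be sketched with scipy's independent-samples T-test. The per-fold accuracy arrays below are hypothetical placeholders, not values from the paper:

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold accuracy scores for two approaches.
proposed = np.array([0.97, 0.96, 0.95, 0.98, 0.96,
                     0.97, 0.95, 0.96, 0.97, 0.98])
baseline = np.array([0.90, 0.89, 0.91, 0.88, 0.90,
                     0.89, 0.91, 0.90, 0.88, 0.89])

t_stat, p_value = stats.ttest_ind(proposed, baseline)
# Reject the null hypothesis (no difference in means) at alpha = 0.05.
print(p_value < 0.05)  # True
```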

Conclusions
White blood cells are an important part of the immune system that protects the body against infections and foreign substances. WBCs comprise different types, and abnormalities in one type may indicate a disease, which underlines the significance of WBC-type classification. Existing studies are limited by poor accuracy, model overfitting due to imbalanced datasets, and classification of a lower number of WBC types. From this perspective, this study presents a hybrid set of features selected using Chi2, combined with SMOTE-based oversampling, to alleviate the influence of the above-mentioned problems. In addition, machine learning models are adopted to overcome the data-intensive training of deep learning models while providing robust results. The experimental results indicate that the feature set combining texture and RGB features from microscopic images, selected using Chi2, produces a high accuracy of 0.97 with the RF model. Performance appraisals using k-fold cross-validation, the statistical T-test, and comparison with existing state-of-the-art studies show the supremacy of the proposed approach, both in terms of the obtained accuracy and computational complexity. In the future, we intend to perform further experiments using derived features from microscopic data.