A Reliable Auto-Robust Analysis of Blood Smear Images for Classification of Microcytic Hypochromic Anemia Using Gray Level Matrices and Gabor Feature Bank

Accurate blood smear quantification with various blood cell samples is of great clinical importance. The conventional manual process of blood smear quantification is time consuming and prone to errors. Therefore, this paper presents automatic detection of the most frequently occurring condition in human blood, microcytic hypochromic anemia, which is the cause of various life-threatening diseases. In the first step, the blood contents, i.e., Red Blood Cells (RBCs), White Blood Cells (WBCs), and platelets, are segmented. Then, the most influential features, namely geometric shape descriptors, Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), and Gabor features (mean squared energy and mean amplitude), are extracted from each of the RBCs. To discriminate hypochromic microcytes from other RBC classes, scanning is done at angles of 0°, 45°, 90°, and 135°. To achieve high accuracy, Adaptive Synthetic (ADASYN) sampling for imbalanced learning is used to balance the datasets, and the locality sensitive discriminant analysis (LSDA) technique is used for feature reduction. Finally, using these features, the blood cells are classified with the multilayer perceptron and random forest learning algorithms. The accuracy achieved was 96%, which is better than the performance of existing techniques. The outcome of this work may be useful in efforts to produce a cost-effective scheme that makes inexpensive blood smear screening available globally, thus enabling early detection of these diseases.


Introduction
Anemia is an abnormal condition in human blood in which the amount of red blood cells (and therefore their oxygen-carrying capacity) is inadequate to fulfill the physiologic needs of the body.

Related Work
Numerous state-of-the-art image processing-based techniques for computer-aided disease detection exist; those most relevant to our work are reviewed here. Moallem et al. [11] used a three-step algorithm for segmentation of overlapped cells in blood smear images: a binary mask was extracted in the first step, an adaptive mean shift algorithm was then used to centrally localize each cell, and finally the gradient vector flow algorithm was used to draw boundaries for the separation of cells. However, no shape analysis was performed, so the method cannot classify the significant classes of RBCs. Tomari et al. [12] used some geometric features for the classification of RBCs through an Artificial Neural Network (ANN). Alomari et al. [13] worked on the automatic quantification of WBCs and RBCs by using an iterative structured circle detection algorithm, but the technique is restricted to circular cells. The technique suggested by Aggarwal et al. [14] is intensity-based Otsu thresholding for the segmentation of RBCs infected with a parasite, but segmentation by intensity may suffer severely from luminance variations and other photographic conditions. Tek et al. [15] suggested a technique for the recognition and classification of malarial parasites and species in RBCs in peripheral blood smears. Shape, color, and local granulometry features were extracted from the area of interest, and a k Nearest Neighbor (kNN) classifier was applied to the extracted features. Chen et al. [16] evaluated blood smear slides with hemolytic anemia by determining chain codes to find the edges of cells, separating the cells with the help of concavity measurement, and classifying them with a bank of classifiers. Xu et al. [17] detected sickle cells by segmenting the red cells in the first step, separating overlapped cells using a random walk algorithm in the next step, and classifying them with a Deep Convolutional Network in the last step. Sharma et al. [18] used a median filter for image smoothing and the watershed transform for the separation of overlapped cells. They then built a three-feature vector (circularity metric, aspect ratio, and radial signature) and trained a kNN classifier to recognize three types of RBCs: sickle cells, elliptocytes, and dacrocytes. Ahirwar et al. [19] detected and classified parasites in RBCs; they generated geometric and color attributes and gray level texture feature sets and used an artificial neural network for their classification. A fuzzy logic technique was used by Bhagavathi et al. [20] for the segmentation of RBCs and WBCs. Morphological operations and the Hough transform for circle detection were used by Chandrasiri et al. [21] for the detection and analysis of red blood cells, but their system was unable to determine and analyze large numbers of clumped or overlapped regions. The approaches presented above perform well when the population of clumped cells is low; when clumped cells are densely populated, accuracy suffers. In our proposed approach, this issue is largely addressed by leveraging the concavities and texture-based features of each cell, as explained in the following Materials and Methods section.

Materials and Methods
Our proposed plan of work for the detection of hypochromic microcytic cells in sample blood smear images consisted of the following series of steps shown in Figure 2.

Blood Smear Slide Preparation
A consistent blood distribution and proper clarity are required for reliable blood smear analysis. This can be achieved by placing a drop of sample blood at one end of a glass slide and smearing it quickly and gently using the wedge technique to form a thin edge where all cells, especially RBCs, can be analyzed separately [22]. This whole process was done by an expert laboratory technician in the local hospital.

Image Acquisition
After the staining process was completed, the image acquisition step started by using a 400× field of a microscope with oil immersion, keeping the horizontal and vertical resolution at 180 dpi and the image dimension at 2592 × 1944 pixels. The images were captured and labelled properly.

Preprocessing
The green channel of the RGB blood smear image was selected, enhanced using the Balance Contrast Equalization Technique (BCET), and smoothed with a median filter. The resulting gray-scale image g(x, y), shown in Figure 3, was then quantized with a scale factor F_g computed from the mean intensity value of g(x, y) (Equations (1) and (2)), where Q(x, y) is the resultant quantized image.

Segmentation
A global (automatic) thresholding technique was used in this work to segment only the RBCs, leaving out the WBCs and platelets. A two-step binarization technique was followed to obtain noise-free binary images. In the first step, the quantized image was binarized to obtain an image containing the white blood cells; in the second step, the whole original image was binarized. Finally, an exclusive OR (XOR) operation between the two binary images was used to remove the white blood cells. The output, along with the other processed images, is shown in Figure 4.
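The two-step binarization followed by XOR can be sketched as below. This is only an illustration under stated assumptions: the function name `segment_rbcs`, the two explicit thresholds, and the synthetic image are hypothetical, whereas in the work described here the binary images come from automatic thresholding of the original and quantized images.

```python
import numpy as np

def segment_rbcs(gray, cell_thresh, wbc_thresh):
    """Keep RBC pixels by XOR-ing an all-cells mask with a WBC-only mask.

    Assumes cells are darker than the background and WBCs are darker
    than RBCs in the gray-scale image (an assumption of this sketch).
    """
    all_cells = gray < cell_thresh   # binarization of the whole image
    wbc_only = gray < wbc_thresh     # binarization that keeps only WBCs
    # WBC pixels are set in both masks, so XOR clears them.
    return np.logical_xor(all_cells, wbc_only)

# Synthetic example: bright background, one RBC-like and one WBC-like blob.
gray = np.full((20, 20), 220, dtype=np.uint8)
gray[2:8, 2:8] = 140       # RBC-like region (36 pixels)
gray[12:16, 12:16] = 40    # WBC-like region (16 pixels)

rbc_mask = segment_rbcs(gray, cell_thresh=180, wbc_thresh=90)
print(int(rbc_mask.sum()))  # 36
```

Because the WBC mask is a subset of the all-cells mask in this setting, the XOR is equivalent to subtracting the WBC pixels from the full foreground.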

Feature Extraction
A three-way feature extraction plan was implemented. The first part was geometric morphological feature extraction, the second was intensity and texture feature extraction using GLCM and GLRLM, and the third was texture feature extraction using the Gabor filter bank.

Geometric Morphology of Microcytic Hypochromic RBCs
Keeping in view the visual size and shape of hypochromic microcytes, the following features were calculated:
• Area: Area is an important geometrical feature for the detection of microcytes, which are small in size compared to other blood cells.
• Circularity: A size-invariant shape descriptor, given in Equation (3), which indicates a circular shape if its value is close to 1 and a noncircular shape if its value is close to 0, where A is the area and P is the perimeter of a cell.
• Rectangularity: It determines the degree of elongation with respect to a rectangle. Equation (4) shows its calculation, where A_s is the area of the shape and A_r is the area of its minimum bounding rectangle.
• Concavity: This property measures how concave an object is; we applied it to the shapes to identify the amount of central pallor area occupied in an RBC, given by Equation (5).
• Convexity: The convexity of a cell, determined by Equation (6), identifies a shape through the convexity of its boundary.
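As a small worked illustration of two of these descriptors, the sketch below assumes the conventional definitions: circularity as 4πA/P² (which equals 1 for a circle and tends toward 0 for elongated shapes) and rectangularity as the ratio A_s/A_r. The function names are hypothetical, and the circularity formula is an assumption that may differ in detail from Equation (3).

```python
import math

def circularity(area, perimeter):
    """Assumed conventional form 4*pi*A / P**2; equals 1 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2

def rectangularity(shape_area, bounding_rect_area):
    """Ratio of the shape area A_s to its minimum bounding rectangle area A_r."""
    return shape_area / bounding_rect_area

# A circle of radius 10 and a 10 x 10 square as reference shapes.
r = 10.0
circle_area, circle_perimeter = math.pi * r ** 2, 2.0 * math.pi * r
square_area, square_perimeter = 100.0, 40.0

print(circularity(circle_area, circle_perimeter))      # close to 1
print(circularity(square_area, square_perimeter))      # close to pi / 4
print(rectangularity(circle_area, (2.0 * r) ** 2))     # close to pi / 4
```

In practice the area and perimeter would be measured from the segmented binary mask of each cell, so the values for a digitized circle will only approximate 1.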

RBC Texture Feature Calculation
Features other than geometry include texture information on RBCs. In this research work, we extracted RGB intensity-based features (means and variances), Gray Level Co-occurrence Matrix (GLCM) features, Gray Level Run Length Matrix (GLRLM) features, and Gabor filter bank features. This set of features, associated with the dispersal of chromatin matter in the RBCs, is helpful in the classification of the hypochromic, hyperchromic, and normochromic cells shown in Figure 1 above. These measures indicate whether an RBC has a deeper red color than the others (cells containing more haemoglobin) or is less red, with a larger central pallor area and a smaller haemoglobin ratio. The texture features calculated are presented below:
• RGB mean and variance of hypochromic microcytic RBCs: The mean values µ_r, µ_g, and µ_b of the pixels of each RBC in the R, G, and B channels, respectively, were calculated using Equation (7).
• GLCM features of hypochromic microcytic RBCs: The GLCM is an N × N matrix P, defined over an image, that gives the distribution of co-occurring pixel values at a specific offset; each element P_ij counts the occurrences of a pixel with gray level i separated by that offset from a pixel with gray level j. Our next six textural features are GLCM features. The mean of each of the 6 GLCM features was determined over the offsets corresponding to 0°, 45°, 90°, and 135°, using 8 gray levels (see Figure 5).
Maximum Probability: It measures the strongest response of the co-occurrence matrix. The range of values is [0, 1], as given in Equation (9), where P_ij is the (i, j)th element of the co-occurrence matrix.
Correlation: The degree of correlation of a pixel to its neighbor is determined by the correlation factor of the co-occurrence matrix, given by Equation (10). It ranges from 1 to −1, corresponding to perfect positive and perfect negative correlation, respectively. This measure is undefined if either standard deviation σ is 0.
Pixel intensity contrast: It is a measure of the intensity contrast between a pixel and its neighbor over the entire image (calculated in Equation (11)).
Energy: It is a measure of the uniformity of the intensities of an image (as given in Equation (12)). Its value is 1 if the image is constant and approaches 0 as the intensities become more variable.
Homogeneity: It measures the spatial closeness of the distribution of elements in the co-occurrence matrix to the diagonal, given by Equation (13). The range of values is [0, 1], and the maximum value is attained when the matrix is diagonal.
Entropy: It measures the degree of variability of the elements of the co-occurrence matrix. Its value is 0 if all P_ij are 0 and is maximum when all P_ij are equal. It may be calculated by Equation (14).
• Run length matrix features of each RBC: The other textural features are based on the gray-level run length matrix (calculated in Equations (15)-(24)). The l × k matrix p, where l is the number of gray levels and k is the maximum run length, is defined for a given image such that p(i, j) is the total number of runs of pixels with gray level i and run length j. As with the GLCM, the run length matrices were calculated using 8 gray levels for 30°, 60°, 90°, and 135°.
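The GLCM and run length computations described above can be sketched as follows. The code is illustrative rather than the implementation used in this work: the six GLCM measures follow the conventional textbook definitions (an assumption, since Equations (9)-(14) may differ in normalization), the run length matrix is computed only for horizontal runs, and short run emphasis is included as one example of a conventional run length feature. All function names are hypothetical.

```python
import numpy as np

def quantize(gray, levels=8):
    """Map 8-bit intensities to integer gray levels 0..levels-1."""
    return (gray.astype(np.int64) * levels) // 256

def glcm(q, dr, dc, levels=8):
    """Normalized (non-symmetric) co-occurrence matrix for offset (dr, dc)."""
    rows, cols = q.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[q[r, c], q[r2, c2]] += 1
    return counts / counts.sum()

def glcm_features(p):
    """Six conventional GLCM measures of a normalized matrix p."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    if sd_i > 0 and sd_j > 0:
        corr = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)
    else:
        corr = float("nan")  # undefined when a standard deviation is 0
    nz = p[p > 0]            # skip zero entries so log2 is defined
    return {
        "max_probability": p.max(),
        "correlation": corr,
        "contrast": (((i - j) ** 2) * p).sum(),
        "energy": (p ** 2).sum(),
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
        "entropy": -(nz * np.log2(nz)).sum(),
    }

# Unit pixel offsets (row, column) for 0, 45, 90, and 135 degrees.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def mean_glcm_features(gray, levels=8):
    """Average each GLCM measure over the four offsets."""
    q = quantize(gray, levels)
    per_angle = [glcm_features(glcm(q, dr, dc, levels))
                 for dr, dc in OFFSETS.values()]
    return {k: float(np.mean([f[k] for f in per_angle])) for k in per_angle[0]}

def glrlm_horizontal(q, levels=8):
    """Run length matrix for horizontal runs: entry [i, j-1] counts runs
    of gray level i with length j."""
    rows, cols = q.shape
    m = np.zeros((levels, cols), dtype=float)
    for row in q:
        run = 1
        for c in range(1, cols + 1):
            if c < cols and row[c] == row[c - 1]:
                run += 1
            else:
                m[row[c - 1], run - 1] += 1
                run = 1
    return m

def short_run_emphasis(m):
    """Conventional short run emphasis: runs weighted by 1 / length**2."""
    lengths = np.arange(1, m.shape[1] + 1)
    return float((m / lengths ** 2).sum() / m.sum())

# A constant patch: energy is 1, contrast and entropy are 0.
flat = np.full((4, 5), 100, dtype=np.uint8)
flat_feats = mean_glcm_features(flat)
flat_runs = glrlm_horizontal(quantize(flat))
print(flat_feats["energy"], flat_feats["contrast"], short_run_emphasis(flat_runs))
```

For the constant patch the correlation is reported as NaN, which mirrors the remark above that the measure is undefined when a standard deviation is 0.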
The Gabor filter is the product of a 2D Fourier basis function and an origin-centred Gaussian, given in Equation (25), where f is the central frequency of the filter, γ and η are the sharpness or bandwidth measures along the minor and major axes of the Gaussian, respectively, θ is the angle of rotation, and (η/γ) is the aspect ratio. The analytical form of this function in the frequency domain is given in Equation (26). In the frequency domain, as given by Equation (27), the function is a single real-valued Gaussian centered at f. A simplified version of the general 2D Gabor filter function in Equations (25) and (26) was formulated by [23]; it implements a set of self-similar filters, i.e., Gabor wavelets (rotated and scaled forms of each other), irrespective of the frequency f and orientation θ.

The Gabor bank, or Gabor features, were created from the responses of the Gabor filters in Equations (25) and (26) by using multiple filters at several frequencies f_m and orientations θ_n. Frequency in this case corresponds to scale information and is thus drawn from [23], where f_m is the mth frequency, f_0 = f_max is the highest frequency desired, and k > 1 is the frequency scaling factor. The filter orientations are drawn from [24]. Gabor features were calculated at 4 wavelengths (3, 6, 9, and 12) and 3 orientations θ (30°, 60°, and 90°); see Figure 6a-c. Each filter was then convolved with the input image to produce a response image, and each response yielded two features: the mean amplitude and the mean squared energy. This gave two matrices of size [1 × 12] each, which were appended to each other to produce a [1 × 24] feature matrix for one image and an [n × 24] matrix for n images, used for training in the subsequent classification step (as shown in Figure 6).
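A Gabor bank with 4 wavelengths and 3 orientations, yielding a [1 × 24] feature vector per image, can be sketched as below. Several details are assumptions of this sketch rather than values from the text: the common Gaussian-times-complex-sinusoid kernel form, the ratio `sigma_ratio` tying the Gaussian width to the wavelength, the aspect ratio `gamma`, and the use of FFT-based circular convolution.

```python
import numpy as np

def gabor_kernel(wavelength, theta_deg, sigma_ratio=0.56, gamma=0.5):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid.

    sigma_ratio and gamma are illustrative defaults, not values from the text.
    """
    sigma = sigma_ratio * wavelength
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    theta = np.deg2rad(theta_deg)
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(2j * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_features(image, wavelengths=(3, 6, 9, 12), orientations=(30, 60, 90)):
    """Return 12 mean amplitudes followed by 12 mean squared energies."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)
    amplitudes, energies = [], []
    for lam in wavelengths:
        for theta in orientations:
            kernel = gabor_kernel(lam, theta)
            # Circular convolution via the FFT; the kernel is zero-padded
            # to the image size, so the image must be at least as large.
            response = np.fft.ifft2(spectrum * np.fft.fft2(kernel, s=image.shape))
            magnitude = np.abs(response)
            amplitudes.append(magnitude.mean())
            energies.append((magnitude ** 2).mean())
    return np.array(amplitudes + energies)

# Random 64 x 64 patch standing in for a cell image.
rng = np.random.default_rng(0)
patch = rng.random((64, 64))
features = gabor_features(patch)
print(features.shape)  # (24,)
```

Stacking the vectors of n cell images row by row gives the [n × 24] matrix described above.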

ADASYN Sampling
An important aspect of classification and learning is to provide a reasonably balanced dataset, so that no bias is introduced by an imbalanced data distribution. Adaptive Synthetic Sampling (ADASYN), a method used in previous works, enhances classification accuracy by balancing the datasets, thus decreasing bias [25]. Table 1 shows that the original dataset is partially imbalanced; therefore, we applied ADASYN to balance it. After applying ADASYN sampling, the database consisted of 354 microcytic, 327 normocytic, 312 macrocytic, 340 hypochromic, and 380 normochromic images of blood smears.
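The idea behind ADASYN can be illustrated with the simplified two-class sketch below: minority samples surrounded by more majority neighbors receive more synthetic samples, each generated by interpolating toward a nearby minority sample. The function `adasyn`, its parameters, and the synthetic data are hypothetical, and the sketch does not reproduce the multi-class setting or the exact configuration used in this work.

```python
import numpy as np

def adasyn(X, y, minority, k=5, beta=1.0, seed=0):
    """Simplified two-class ADASYN; returns the augmented (X, y)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority)
    X_min = X[min_idx]
    n_min, n_maj = len(X_min), len(X) - len(X_min)
    total = int(round((n_maj - n_min) * beta))   # samples to synthesize
    if total <= 0 or n_min < 2:
        return X, y

    # Difficulty ratio: share of majority points among the k nearest neighbours.
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(n_min), min_idx] = np.inf    # exclude the point itself
    nn_all = np.argsort(d_all, axis=1)[:, :k]
    ratio = (y[nn_all] != minority).mean(axis=1)
    if ratio.sum() == 0:                         # classes fully separated
        ratio = np.ones(n_min)
    counts = np.round(ratio / ratio.sum() * total).astype(int)

    # Interpolate toward randomly chosen minority-class neighbours.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    nn_min = np.argsort(d_min, axis=1)[:, :min(k, n_min - 1)]
    synthetic = []
    for i, g in enumerate(counts):
        for _ in range(g):
            j = rng.choice(nn_min[i])
            lam = rng.random()
            synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    if not synthetic:
        return X, y
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), minority, dtype=y.dtype)])
    return X_new, y_new

# Synthetic imbalanced data: 40 majority (label 0) and 10 minority (label 1).
data_rng = np.random.default_rng(1)
X_majority = data_rng.normal(0.0, 1.0, size=(40, 2))
X_minority = data_rng.normal(1.5, 0.5, size=(10, 2))
X_all = np.vstack([X_majority, X_minority])
y_all = np.array([0] * 40 + [1] * 10)

X_bal, y_bal = adasyn(X_all, y_all, minority=1)
print((y_bal == 0).sum(), (y_bal == 1).sum())
```

Because of rounding, the number of synthetic samples is close to, but not always exactly equal to, the class gap.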

Features Reduction
To maintain the variation among interclass data samples, it is necessary to reduce the dimensionality of the original dataset. We used the Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), and Gabor filter bank, which collectively produced 52 features for a single cell image in a blood smear image. Therefore, to reduce the dimensionality, a Locality Sensitive Discriminant Analysis (LSDA) [26] approach was applied separately to the features extracted from each cell (shown in Figure 7). We used LSDA because it is particularly useful when there are not sufficient training samples. LSDA uses the local structure of the data, which is generally more important than the global structure for discriminant analysis. It determines a projection using the local manifold structure, which maximizes the margin between data points from different classes in each local area. Various experiments on the existing datasets showed an improvement over Linear Discriminant Analysis (LDA).
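A minimal sketch of an LSDA-style projection is given below. It assumes the usual formulation with a within-class and a between-class nearest-neighbor graph and a generalized eigenvalue problem; the neighborhood size `k`, the trade-off `alpha`, the regularizer `reg`, the mean-centering step, and the number of retained components are illustrative choices, not values reported in this work.

```python
import numpy as np

def lsda(X, y, n_components=2, k=5, alpha=0.5, reg=1e-6):
    """Project rows of X onto n_components LSDA-style directions.

    Returns (projected data, projection matrix).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xc = X - X.mean(axis=0)                      # centering (sketch choice)
    n, d = Xc.shape

    # Symmetric k-nearest-neighbour graph.
    dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    knn = np.argsort(dist, axis=1)[:, :k]
    neighbor = np.zeros((n, n), dtype=bool)
    neighbor[np.repeat(np.arange(n), k), knn.ravel()] = True
    neighbor |= neighbor.T

    # Split the graph into within-class and between-class edges.
    same = y[:, None] == y[None, :]
    W_w = (neighbor & same).astype(float)
    W_b = (neighbor & ~same).astype(float)
    D_w = np.diag(W_w.sum(axis=1))
    L_b = np.diag(W_b.sum(axis=1)) - W_b         # between-class Laplacian

    # Generalized eigenproblem A a = lambda B a, solved via Cholesky of B.
    A = Xc.T @ (alpha * L_b + (1.0 - alpha) * W_w) @ Xc
    B = Xc.T @ D_w @ Xc + reg * np.eye(d)
    L_inv = np.linalg.inv(np.linalg.cholesky(B))
    eigvals, eigvecs = np.linalg.eigh(L_inv @ A @ L_inv.T)
    order = np.argsort(-eigvals)[:n_components]  # largest eigenvalues first
    P = L_inv.T @ eigvecs[:, order]
    return Xc @ P, P

# Synthetic example: three classes of 20 samples in 6 dimensions.
demo_rng = np.random.default_rng(0)
centers = demo_rng.normal(0.0, 3.0, size=(3, 6))
X_demo = np.vstack([c + demo_rng.normal(0.0, 1.0, size=(20, 6)) for c in centers])
y_demo = np.repeat(np.arange(3), 20)

Z, P = lsda(X_demo, y_demo, n_components=2)
print(Z.shape, P.shape)
```

The same call would accept the 52-dimensional feature vectors described above, with `n_components` chosen by validation.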

Classification
Different kinds of data tend to favor different classification tools. Therefore, iterative experiments were performed to select an optimal tool. The tools selected during this process were the random forest and the multilayer perceptron model, which proved well suited to this situation. For classification, the training dataset was prepared with the extracted features mentioned above. The images were labelled with ground truth from the existing dataset images, which mostly contained Iron Deficiency Anaemia (IDA) images. Each instance of the dataset comprised geometric morphological features and GLCM, GLRLM, and Gabor texture features. We used classical machine learning methods because pretrained deep learning algorithms are prone to overfitting and underfitting, and hence to reduced accuracy, when the datasets do not contain sufficient data. The performance of our classification technique is given in the Results section (Section 5).

Results
The results of the proposed work are as follows.

Dataset
The images used in our work were collected from a local hospital. At the selected hospital, a high frequency of microcytic hypochromic patients was observed. The available datasets were also searched for such images, and a set of 150 images (each containing about 80 hypochromic microcytes) was created based on ground truth labeled by expert hematologists.

Qualitative Results
Hypochromic microcytes, the cells of interest for segmentation, have a larger pallor area and are smaller than the normal mean size of RBCs (Figure 8). These cells have a smaller pixel area with a proportionally larger hole area. A cell containing a hole was assumed to be a pallor cell and less chromatic than a cell with no hole. The chromaticity factor is inversely proportional to the proportion of central pallor: the bigger the central pallor, the less chromatic the cell, the bigger the hole in it after the binarization operations, and hence the greater the concavity of the cell. The qualitative results of the GLCM, GLRLM, Gabor mean amplitude, and Gabor mean squared energy are given in Tables 2-5. The qualitative results are also demonstrated in Figures 8 and 9. The sample cells for which the features in Tables 2-5 were extracted are shown in Figure 10.

Conclusions
The goal of this study was to develop and improve a robust algorithm for the analysis of blood smear images for the classification of microcytic hypochromic anaemia. Many studies in the literature focus on the classification of blood cells, but very few address the classification of normal and abnormal blood slides with respect to a specific anemic disease. Moreover, the existing state-of-the-art techniques are very expensive and difficult to operate. Our proposed system has a reasonable accuracy rate and processing time. It is capable of detecting the various chromatic statuses of blood and accurately estimates the boundary pixels of RBCs under diverse photographic conditions. The results showed that our algorithm is a better automatic segmentation method for blood smear images. The geometric features and three texture feature sets of RBCs were extracted, i.e., GLCM, GLRLM, and Gabor texture features at 4 wavelengths (3, 6, 9, and 12) and 3 orientations θ (30°, 60°, and 90°). After feature extraction and feature reduction, the features were fed into two powerful machine learning algorithms (random forest and multilayer perceptron) using an ensemble learning technique. The overall classification accuracy was 96%, which was found to be better than that of the existing techniques. The present work may be extended to 3D volume estimation of blood cells, which is necessary for finding accurate blood indices such as Mean Corpuscular Volume (MCV), Mean Corpuscular Haemoglobin (MCH), and Mean Corpuscular Haemoglobin Concentration (MCHC). The proposed system may help pathologists obtain quicker results with high True Positive (TP) and True Negative (TN) rates in the initial stage of diagnosis. It is user-friendly, easily operable, and inexpensive. Furthermore, the system may be made available on the web without requiring any special equipment or user knowledge. Therefore, it can be considered an added value to the existing automated blood smear analysis tools.