An Attention-Based Convolutional Neural Network for Acute Lymphoblastic Leukemia Classification

Abstract: Leukemia is a kind of blood cancer that affects people of all ages and is one of the leading causes of death worldwide. Acute lymphoblastic leukemia (ALL) is the most widely recognized type of leukemia, found in the bone marrow of the human body. Traditional diagnostic techniques such as blood and bone marrow examinations are slow and painful, creating demand for non-invasive and fast methods. This work presents a non-invasive, convolutional neural network (CNN) based approach that uses medical images to perform the diagnosis task. The proposed solution, a CNN-based model, uses an attention module called Efficient Channel Attention (ECA) with the Visual Geometry Group (VGG16) network from Oxford to extract better-quality deep features from the image dataset, leading to better feature representation and better classification results. The proposed method shows that the ECA module helps to overcome the morphological similarities between ALL cancer and healthy cell images. Various augmentation techniques are also employed to increase the quality and quantity of the training data. We used the classification of normal vs. malignant cells (C-NMC) dataset and divided it into seven folds based on subject-level variability, which is usually ignored in previous methods. Experimental results show that our proposed CNN model can successfully extract deep features and achieves an accuracy of 91.1%. The obtained findings show that the proposed method may be used to diagnose ALL and would help pathologists.


Introduction
Acute lymphoblastic leukemia, generally known as ALL, is a type of blood cancer that usually begins in the bone marrow, where blood cells are formed. It is the type of cancer associated with white blood cells (WBC). Based on its progression speed and the cell type it affects, leukemia is divided into four main types: chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), acute myelogenous leukemia (AML), and acute lymphoblastic leukemia (ALL) [1,2]. In acute leukemia, the abnormal cells grow and spread rapidly and require immediate treatment, while chronic leukemia is hard to detect in its early stages. As a result, the blood cannot perform its normal function, weakening the immune system. Moreover, in ALL, the bone marrow cannot produce healthy platelets and red blood cells, making most parts of the body vulnerable [3].
In ALL, the bone marrow generates a large quantity of abnormal WBC. These WBC can stream into the blood and harm different parts of the body, such as the spleen, brain, kidney, and liver, which can lead to other dangerous types of cancer. Since ALL can spread quickly throughout the body, it can cause death if not treated or diagnosed in its early stages.

To remove the aforementioned issues associated with invasiveness and reliance on human experts, a fully automated solution based on convolutional neural networks is proposed in this work. The features associated with the dataset play a vital role for statistical, computational learners. Therefore, we trained the CNN model from scratch instead of following the transfer learning paradigm used in other work [10,13-15]. Since cell image features may differ significantly from natural images, a network trained from scratch on cell images may converge to a better solution. To improve the quality and quantity of the available dataset, a number of preprocessing methods are used, including data augmentation to improve generalizability and class balancing to avoid overfitting or bias introduced by class imbalance. In this work, we propose an attention-based CNN to address the task of categorizing ALL and healthy cell images. The designed deep learning model is trained from scratch on the preprocessed dataset to obtain parameter values that are most relevant and provide better convergence for the network.

Conventional Machine Learning Algorithms
Joshi et al. [16] developed a method for the segmentation and classification of white blood cells. For the preprocessing of microscopic blood images, they used histogram equalization and contrast enhancement. To extract the white blood cells, they used Otsu thresholding segmentation. The extracted features were given to a K-nearest neighbor (KNN) classifier to categorize the blood images into normal and blast cells. The proposed method was tested on 108 images of peripheral blood smears from a publicly available dataset.
Another unsupervised segmentation method was developed by Mohapatra et al. [17], based on color scheme clustering, for the classification of leukemia. A two-stage color segmentation approach based on fuzzy logic was used to differentiate leukocytes from the other constituents of the blood. The authors extracted features such as fractal dimension, shape, texture, and contour signature. Their proposed method used 270 images for the classification of ALL. MoradiAmin et al. [7] proposed an enhanced method for ALL recognition. In this method, they utilized fuzzy c-means clustering for the segmentation of lymphocytes. After feature extraction, they used principal component analysis (PCA) to reduce the features and fed them to a support vector machine (SVM) classifier to distinguish between normal and blast cells. They conducted their experiment on 958 images.
Putzu et al. [18] proposed a method that isolates the leukocytes from a microscopic image and further distinguishes the nuclei and cytoplasm within the leukocytes. They trained different classification models on extracted shape, color, and texture features to determine which model is best suited for leukemia classification. They used 368 images and tested several classifiers, finding that, based on classification accuracy, an SVM with a Gaussian radial basis kernel outperforms the other classifiers. Singhal et al. [19] developed an automatic detection algorithm for ALL based on local binary pattern (LBP) and geometric texture features. Their proposed model used a small dataset of 368 images for feature extraction, and these features were then fed to an SVM for binary classification. Patel et al. [20] proposed a method for the automatic detection of leukemia from 108 microscopic images. They applied filtering techniques to extract the important parts of the images, and K-means clustering was then used to produce binary clusters. For classification, they used an SVM classifier to distinguish between healthy and malignant cells. Karthikeyan et al. [21] proposed a leukemia detection method using stained blood microscopic images. They used a median filter and histogram equalization for image preprocessing, fuzzy c-means for segmentation, and an SVM to classify the leukemia cells into normal and malignant classes; they used 19 microscopic images for leukemia classification. Another segmentation and classification method, proposed by Mohammad et al. [22], involves transforming grayscale images to the YCbCr color space. They then computed a Gaussian distribution on each color channel and calculated a number of features covering texture, scale, and morphology. These features were then given as input to a random forest classifier.
They used a dataset of 105 images to train their model for WBC classification. A significant drawback of these methods is that they used small datasets to classify ALL cell images. A classifier trained on a small dataset is more vulnerable to overfitting and may not give optimal results; the datasets used in the literature vary from 19 to 958 images. Another drawback is that these researchers used conventional machine learning algorithms (SVM, KNN, K-means clustering), which rely on handcrafted features that may not be an optimal feature set for classification. All these drawbacks can limit the final performance. Thus, there is still a need to devise high-performance algorithms trained on larger datasets that extract features automatically to diagnose ALL with better accuracy.

Deep Learning-Based Methods
With the emergence of deep neural networks, we can achieve better performance in computer vision. A deep neural network can automatically extract task-specific features using two-dimensional convolutional filters, overcoming the problem of predefined features. Deep neural networks are widely utilized in the field of computer vision, especially for medical image analysis, such as disease classification [23], localization [24], detection [25], registration [26], and segmentation [27,28]. However, the performance of a deep neural network depends on the size of the dataset, and unfortunately, it is hard to obtain large datasets for medical image analysis. To deal with small datasets, transfer learning and data augmentation can be used. Rehman et al. [29] used a pre-trained AlexNet with fine-tuning to identify ALL subtypes on a private dataset composed of 330 images. Shafique et al. [30] proposed a deep CNN to classify all four subtypes of leukemia; to avoid training from scratch, they used a pre-trained AlexNet to perform binary classification on 368 images. A classification model based on both transfer learning and deep learning was proposed by Habibzadeh et al. [31] for WBC classification. The approach began with preprocessing of the dataset and then employed transfer learning for feature extraction; finally, Inception and ResNet were used to perform WBC classification. Their proposed model used 352 images. Ahmad et al. [32] proposed a CNN model that recognizes all subtypes of leukemia. They used 903 images for the classification of ALL subtypes and applied seven distinct data augmentation techniques to enhance the dataset size. They also compared the CNN with other machine learning algorithms such as decision tree (DT), SVM, Naïve Bayes, and KNN, and found that the CNN outperforms all of them. Wang et al. 
[33] used SVM to identify spatial features and then suggested a neural network with marker-based learning vector quantization to detect and classify ALL cells. They used 24 samples (16 out of 27 ALL patients and 8 normal samples with no clinical history of leukemia) to train their proposed method for ALL diagnosis. Pansombut et al. [34] proposed a new method to identify ALL and its subtypes using a CNN network called ConVNet. They evaluated their model with other machine learning algorithms such as SVM, multilayer perceptron (MLP) and random forest. They used two types of datasets, one for ConVNet and the other for feature extraction. The overall dataset consists of 363 images.
Again, there is the problem of a limited number of images, as many researchers used tiny datasets consisting of 24 to 903 images. Performance results on such small datasets cannot be considered an optimal performance indicator for medical image analysis. Moreover, classification becomes extremely difficult when dealing with small datasets, which results in an inaccurate and biased classification model [22]. In addition, using pre-trained models for classification requires resizing the input images to the networks' predefined input shapes, which can alter the morphology of ALL cell images [35]. It is difficult to classify ALL cell images because these leukocytes exhibit high morphological similarity between classes and low interclass separability. Furthermore, these works used transfer learning or fine-tuning of deep neural networks for ALL classification. A model trained on non-medical images and reused for medical image classification may not achieve optimal results because cell image features may differ significantly from natural images; a model trained from scratch on cell images may converge to a better solution.
Recently, some researchers have used larger datasets for the classification of ALL. Kasani et al. [36] proposed an aggregated deep learning model for the classification of ALL. They used fine-tuning and transfer learning to build an ensemble model based on the VGG19 and NasNetLarge architectures. Kassani et al. [37] proposed a hybrid model based on VGG16 and MobileNet for the classification of ALL cell images. Global average pooling (GAP) was used between VGG16 and MobileNet to extract high-level features from ALL cell images. LeukoNet was proposed by Mourya et al. [38] for the classification of ALL cell images. The model combines discrete cosine transform (DCT) domain features extracted using a CNN with optical density (OD) space features to build an effective classifier. Although these researchers used the C-NMC 2019 dataset, composed of more than 10,000 images, they did not segregate the training data based on subject-level variability. Instead, the classifier was trained by combining all of the subjects' data into healthy and cancer classes. Because subject-specific features help in class discrimination, this pooling can restrict the classifier's performance. A practical classifier therefore requires training on data from certain subjects and testing on data from entirely unseen subjects. In this paper, we use the C-NMC 2019 dataset for the diagnosis of ALL. The cell images in the C-NMC dataset are selected carefully based on subject-level variability. To obtain more robust results, we split the dataset into seven folds for cross-validation so that no two splits overlap in terms of subject data. This strategy also ensures that none of the cell images of a subject used in training were used to test the classifier.
In comparison to other ALL datasets such as the Ispat General Hospital (IGH) dataset [17], the ALL image database for image processing (ALL-IDB) [39], the medical image and signal processing research center (MISP) dataset [22], and the American Society of Hematology (ASH) dataset [32], the C-NMC 2019 dataset has a substantially larger number of images, which can aid in the development of a robust and scalable ALL diagnostic classifier. Our proposed model uses an attention module to further refine the features and overcome the problem of morphological similarity, bringing more reliability to the results when classifying images as healthy or malignant. To the best of our knowledge, we are the first to use an attention module with a CNN model for ALL classification.

Dataset Description
We used the publicly available C-NMC 2019 dataset, released by the cancer imaging archive (TCIA) for the ALL challenge competition. The objective of this competition was to create a computer-aided diagnosis (CAD) system to differentiate normal cells from leukemic blasts (malignant cells) in microscopic blood smear images. Both training and test images have been processed to 24-bit RGB format with a consistent resolution of 450 × 450 pixels. Errors in lighting, uneven staining, and image noise have been corrected using the methods given in [11,12]. The images in this dataset were labeled as normal or malignant by an experienced oncologist. The dataset consists of 10,661 single-cell images collected from 76 individual subjects: 7272 images were taken from 47 patients having ALL, and the remaining 3389 were taken from 26 healthy subjects. The C-NMC dataset includes a considerably large number of images that can help in designing a robust ALL diagnostic classifier. In our experiment, the dataset is divided into 7 folds based on subject-level variability, as shown in Table 1. To avoid overfitting and to balance the dataset, we applied different data augmentation techniques.
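The subject-level splitting described above can be sketched as a greedy grouping in which whole subjects, never individual images, are assigned to folds. This is an illustrative sketch under our own assumptions, not the authors' actual splitting code; the function name and the count-balancing heuristic are hypothetical.

```python
from collections import defaultdict

def subject_level_folds(image_subjects, n_folds=7):
    """Assign whole subjects to folds so that no subject's images are
    split across folds (illustrative greedy balancing by image count).

    image_subjects: dict mapping image id -> subject id.
    Returns: dict mapping fold index -> list of image ids.
    """
    # Group images by subject.
    per_subject = defaultdict(list)
    for img, subj in image_subjects.items():
        per_subject[subj].append(img)
    # Place the largest subjects first, each into the currently lightest fold.
    folds = {i: [] for i in range(n_folds)}
    for subj, imgs in sorted(per_subject.items(), key=lambda kv: -len(kv[1])):
        lightest = min(folds, key=lambda i: len(folds[i]))
        folds[lightest].extend(imgs)
    return folds
```

The key property, whichever heuristic is used, is that every image of a given subject lands in exactly one fold, so the test fold never shares subjects with the training folds.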

Image Preprocessing
We first resized the input images to 224 × 224, since VGG16 takes an input image of size 224 × 224. Second, we normalized the input images by subtracting the channel-wise mean red, green, and blue (RGB) values computed over all images in the training set and dividing by the channel-wise standard deviation [40].
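A minimal sketch of this preprocessing, assuming images are stored as NumPy arrays scaled to [0, 1]; the nearest-neighbour resize is an illustrative stand-in for the interpolation a real pipeline would use, and the function names are our own.

```python
import numpy as np

def normalize_batch(images, mean=None, std=None):
    """Channel-wise normalization: subtract the training-set mean RGB
    values and divide by the channel-wise standard deviation.
    images: float array of shape (N, H, W, 3)."""
    if mean is None:
        mean = images.mean(axis=(0, 1, 2))  # per-channel mean over the set
    if std is None:
        std = images.std(axis=(0, 1, 2))    # per-channel std over the set
    return (images - mean) / (std + 1e-8), mean, std

def resize_nearest(img, size=224):
    """Nearest-neighbour resize to size x size (stand-in for proper
    bilinear/bicubic interpolation)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]
```

At test time, the mean and std computed on the training set would be reused unchanged, which is why `normalize_batch` returns them.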

Data Augmentation
Data augmentation techniques can be used to overcome the problem of the limited size of a dataset, specifically in deep learning problems. Several image augmentation techniques such as flipping, cropping, and rotation are used to obtain different copies of the original image. Training the system on augmented images in addition to the originals gives the algorithm more generalization capability; it has been reported that training with augmented images reduces the error rate and provides better generalization [41-46]. The C-NMC 2019 dataset is imbalanced, which may degrade the performance of deep learning models: data imbalance usually causes overfitting and leads the model towards poor generalization. To avoid overfitting and to increase the number of images in the dataset, we used data augmentation techniques such as rotation, horizontal and vertical flipping, brightness correction, and contrast adjustment on the training dataset. The augmentation is applied on the fly for every single image: each time the model gets an image, it generates four augmented images for healthy cell images and two augmented images for ALL cancer cell images, as shown in Figure 2. Random rotation rotates the image clockwise by a value between −45 and 45 degrees. Horizontal and vertical flipping mirror the entire rows and columns of an image. Table 2 presents our dataset distribution before and after augmentation for each fold, while Figure 3 shows images before and after augmentation. Because there are fewer healthy cell images than ALL cancer cell images in the C-NMC dataset, the augmentations, chosen randomly among flipping, rotation, contrast, and brightness correction, are applied to healthy images at double the rate of ALL cancer cells to produce an equal number of images of both classes.
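The on-the-fly, class-balanced augmentation described above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' pipeline: arbitrary-angle rotation is omitted (it would require an image library such as Pillow or torchvision), and the flip/brightness/contrast operations stand in for the full set; all names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_once(img):
    """Apply one randomly chosen augmentation (flip, brightness, or
    contrast; a real pipeline would also include random rotation)."""
    choice = rng.integers(4)
    if choice == 0:
        return img[:, ::-1]                 # horizontal flip
    if choice == 1:
        return img[::-1, :]                 # vertical flip
    if choice == 2:
        # brightness correction, clipped back to the valid range
        return np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)
    # contrast stretch to the full [0, 1] range
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)

def augment(img, is_healthy):
    """Healthy cells are augmented at double the rate (4 copies) of
    cancer cells (2 copies) to balance the two classes."""
    n = 4 if is_healthy else 2
    return [augment_once(img) for _ in range(n)]
```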

ECA-Net Based on VGG16
The Overall Architecture

The proposed model is based on VGG16, one of the well-known CNN architectures, which won the ImageNet large-scale visual recognition challenge (ILSVRC) in 2014. There are two main motivations behind using VGG16 as our backbone model for ALL classification. Table 3 provides a comprehensive overview of plain VGG16. First, it extracts low-level features by utilizing a smaller kernel size and fewer layers compared to its counterpart VGG19 and other deep learning models. Instead of having a huge number of hyper-parameters, the network follows a simple design, where each block of VGG16 consists of convolutional layers with 3 × 3 filters and unit stride. Max pooling layers with 2 × 2 windows and a stride of 2 are used to halve the spatial size of the feature maps. Throughout the network, each block follows the same combination of convolution and max pooling layers. At the end, three fully connected (FC) layers produce the output. The stacking of convolutional layers enables the network to hierarchically capture more information with less computational overhead. Newer deep learning models such as DenseNet [47], ResNet [48], and Inception [49] use many more convolution layers, whereas our objective was to create a simple deep learning model for ALL classification. Second, the VGG16 model has a strong feature extraction ability for the classification of ALL cell images, as shown in [50]. The shallow network keeps more information about the underlying features, which is important for cell texture identification. The overall architecture of the proposed model is shown in Figure 4. To explore high-value features related to ALL in the input image, we use an attention module to force the network to learn the high-level features. Attention not only illustrates where the network should focus, but also enhances the representation of features. Wang et al. 
[52] proposed a local cross-channel interaction strategy, realized via 1D convolution and called ECA, which can be widely used to boost the representation power of CNNs. Considering the complexity of ALL features, it is difficult for traditional CNNs to learn from these images; VGG16 with an attention module is more suitable for the proposed problem due to the improved feature representation. The shallow convolution modules can only extract edge and texture features, while the deep convolution modules provide more abstract semantic features, which can better distinguish between ALL cancer and healthy cell images. The ECA module is added to enhance and amplify the differences in the semantic features extracted by VGG16. This model is expected to overcome the morphological similarity and further improve the classification performance. Moreover, the ECA module allows the model to focus on the more important channel features. The salient feature information is used by ECA attention to accomplish task-adaptive feature pooling. In medical image analysis, the ECA module helps the network automatically learn to focus on target structures of various shapes and sizes. Intuitively, a model trained with the ECA module learns to suppress irrelevant regions in input images while emphasizing salient features for a given task, improving the accuracy and efficiency of a deep learning model. The architecture of the ECA module is shown in Figure 5. Human perception is a good way to illustrate the intuition behind attention: our visual processing system selectively focuses on specific portions of an image and disregards extraneous information [53]. The ECA module extracts the information from each channel of VGG16, resulting in a weighted sum of the aggregated features. This allows the deep learning model to assign greater weight to certain elements of the input images.
In Section 4.2, the feature maps generated by our proposed method clearly show that our model focuses on the cell itself rather than the image background. Attention modules help a CNN learn and focus on the more important features instead of learning non-useful context. To learn deeper features, ECA is employed after every VGG16 block. The ECA module avoids the feature dimensionality reduction caused by the convolutional block, which inevitably brings side effects, and is conducive to capturing the dependencies between channels. The ECA module involves only a few parameters and explores local cross-channel interaction by implementing a 1D convolution with an adaptively selected kernel size k. In our proposed model, we used k = 9, following the original ECA paper, adding only a handful of extra parameters to the backbone VGG16; gradient descent is used to optimize these parameters. The ECA module increases the information interaction between channels of the feature maps and reduces model complexity while maintaining performance.

Figure 5. The ECA module takes a feature map, the output of a convolutional block, which is 3-dimensional in shape (W × H × C), where W, H, and C represent the width, height, and number of channels. GAP reduces the input feature map to 1 × 1 × C. The ECA module then explores local cross-channel interaction by implementing a 1D convolution with an adaptively selected kernel size k. After passing through a sigmoid function, the output channel weights of dimension 1 × 1 × C are used to refine the input feature map via an element-wise product. Finally, the refined feature map is used as the input of the next convolutional block.
After global average pooling (GAP), the ECA module considers each channel of the input feature map together with its k nearest neighbors and quickly computes the channel weights through one-dimensional convolution. Here, k represents the number of neighboring channels involved in the calculation of a channel's weight, and its value affects both the efficiency and the effectiveness of ECA. The ECA module can adaptively compute k as a function of the channel dimension C, and then performs the 1D convolution followed by a sigmoid function to learn the channel attention.
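A minimal NumPy sketch of the ECA computation described above (GAP, 1D convolution across channels, sigmoid, channel-wise rescaling). The learned 1D kernel is replaced here by a uniform stand-in, and `adaptive_kernel_size` follows the k = |log2(C)/γ + b/γ|_odd rule (with γ = 2, b = 1) from the original ECA paper; in the proposed model k is fixed to 9.

```python
import numpy as np

def adaptive_kernel_size(C, gamma=2, b=1):
    """k = |log2(C)/gamma + b/gamma|_odd from the ECA paper:
    nearest odd integer, so the 1D conv window is symmetric."""
    t = int(abs(np.log2(C) / gamma + b / gamma))
    return t if t % 2 else t + 1

def eca(feature_map, k=9):
    """Efficient Channel Attention on a (C, H, W) feature map:
    GAP -> 1D conv of width k across channels -> sigmoid -> rescale."""
    C, H, W = feature_map.shape
    gap = feature_map.mean(axis=(1, 2))       # (C,) channel descriptor
    weights = np.ones(k) / k                  # stand-in for the learned kernel
    pad = k // 2
    padded = np.pad(gap, pad, mode="edge")    # keep output length C
    conv = np.array([padded[i:i + k] @ weights for i in range(C)])
    attn = 1.0 / (1.0 + np.exp(-conv))        # sigmoid channel weights
    return feature_map * attn[:, None, None]  # element-wise reweighting
```

Because only the k-tap 1D kernel is learned, ECA adds just k parameters per block, in contrast to the fully connected bottleneck of squeeze-and-excitation attention.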

Experimental Setting
In our experiments, we resized the cell images to 224 × 224 resolution. The learning rate was set to 0.0001 with stochastic gradient descent (SGD) optimization, and the network was trained for 50 epochs with a batch size of 16. Cross-entropy was used as the loss function to train our model. The software environment was Ubuntu 16.04, Python 3.5, and PyTorch 3.7. The hardware environment was a CPU E5-2630 and an NVIDIA GV100GL Tesla V100 32 GB graphics processing unit (GPU).
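The loss and optimizer settings above amount to cross-entropy minimized by plain SGD at a learning rate of 0.0001. A minimal NumPy sketch of both pieces (not the authors' PyTorch training code; the function names are our own):

```python
import numpy as np

def cross_entropy(logits, label):
    """Cross-entropy loss from raw logits for a single sample:
    softmax over the classes, then negative log-likelihood."""
    z = logits - logits.max()           # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()     # softmax probabilities
    return -np.log(p[label] + 1e-12)

def sgd_step(params, grads, lr=1e-4):
    """Plain SGD update with the paper's learning rate of 0.0001."""
    return [p - lr * g for p, g in zip(params, grads)]
```

In the actual experiments these correspond to `torch.nn.CrossEntropyLoss` and `torch.optim.SGD`, applied per batch of 16 images for 50 epochs.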

Evaluation Metrics
Five metrics, namely accuracy, sensitivity, precision, specificity, and F1 score, are used to evaluate the performance of the proposed method, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. Accuracy is the number of correctly classified images divided by the total number of images in the test set:

Accuracy (%) = (TP + TN) / (TP + TN + FP + FN) × 100

Sensitivity (recall), or true positive rate (TPR), indicates the proportion of true positive outcomes over all actual positive cases (malignant cell images):

Sensitivity (%) = TP / (TP + FN) × 100

Specificity, or true negative rate (TNR), computes the proportion of true negative outcomes over all actual negative cases (normal cell images):

Specificity (%) = TN / (TN + FP) × 100

Precision is the number of correctly labeled positive samples over the total number of samples labeled positive (whether correctly or incorrectly); it measures the model's accuracy in classifying a sample as positive:

Precision (%) = TP / (TP + FP) × 100

The F1 score is the harmonic mean of sensitivity and precision, and therefore takes both false positives and false negatives into account:

F1 (%) = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)
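The five metrics above can be computed directly from confusion-matrix counts; a small helper (with our own illustrative naming) makes the definitions concrete:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, specificity, precision, and F1
    from confusion-matrix counts (positives = malignant cell images)."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # recall / true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision   = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1
```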

Performance of the Proposed Method
The original dataset is released in three folds. We re-split it into seven folds based on subject-level variability. As a result, the experiment is performed six times with a different combination of training data each time. We first train and validate our model using fold-1 data and test it using fold-7 data for the final accuracy, and we apply the same strategy to the remaining five folds; fold-7 is only used to test the efficiency of our proposed model. The dataset is divided at the subject level instead of the image level. Furthermore, we held out 20% of the training set for model validation. The validation set is used to tune hyperparameters such as the learning rate and the number of epochs. In our experiment, we set the number of epochs to 50; our model is simple and typically reaches an optimal point between epochs 40 and 45, after which it stops improving. We used the validation set for model selection during training: whenever the model achieves the best accuracy on the validation set, it is saved together with its parameters. Later, we test this model on the testing data to obtain the overall accuracy. To verify the robustness of the model, we use different folds to train the model and calculate the average accuracy and standard deviation for each fold. Finally, the six performance estimates are aggregated to obtain an overall assessment of the classifier. Table 4 presents the per-fold performance of VGG16 with and without the attention module. Our proposed VGG16 with the ECA module outperforms plain VGG16 by identifying the most pertinent information in an input image, which aids classification accuracy. Adding the ECA module increases the model's performance (the attention module does not ignore the irrelevant features, it merely diminishes their importance).
Tables 5 and 6 present the evaluation metrics of VGG16 with and without the attention module obtained from the different folds. In ALL classification, fold-1 produces the best mean accuracy for cancer and healthy cell images, which suggests that the sample selection for the training set can significantly affect the performance results. Figure 6 provides the receiver operating characteristic (ROC) curves generated on the testing data for the different folds, while Figure 7 shows the confusion matrices obtained from the proposed method. It can be seen that the proposed method successfully assigned most of the cell images to their respective classes. The obtained results suggest that the ECA module helps to improve the model's predictions. We tested our approach using six-fold cross-validation since it produces more trustworthy findings in evaluation.

Table 4. Accuracy and standard deviation of our proposed model and plain VGG16 over 6-fold cross-validation. Each column in the table represents the fold number. The bold value represents the best performance among the 6 folds, while the underlined value shows the second-best performance.

Impact of Using Attention
To test the efficiency of the attention module, we compare the feature maps of the first and second blocks of VGG16 with and without the attention module for the classification of ALL cancer and healthy cell images. Figure 8 illustrates these feature maps. The results indicate that the ECA module enables the VGG16 model to automatically highlight the relevant features of the input images, which for ALL is typically the whole cell area together with its edges, improving the model's performance. The feature maps generated with the ECA module focus more on the cell area and edges compared to plain VGG16.

Comparisons with Other Approaches
A comparison of our proposed method with existing methods is provided in Table 7. These methods used the training part of the C-NMC 2019 dataset, but they did not consider subject-level variability: they took the initial (three-fold) training part of the dataset and split it into training, validation, and test sets. It should also be noted that no cross-validation was used in their evaluations, and how the test and training sets were chosen is unknown. In our proposed method, we split the training data of the C-NMC 2019 challenge into seven folds while keeping subject-level variability in mind, which was ignored by previously proposed methods: all of the cell images belonging to the same subject are placed in the same fold, and no fold includes cell images from a subject appearing in another fold. Our proposed method gives satisfactory results compared to the methods proposed in the literature. Classifying ALL cell images is challenging because of the morphological similarities between ALL cancer and healthy cell images and because of subject-level variability. Table 8 presents the methods with the top entries of the C-NMC 2019 challenge. These methods used all three sets of the C-NMC 2019 dataset, i.e., the training set, the preliminary test set, and the final test set. They only reported the F1 score of their models and did not provide results for the other evaluation metrics.

Table 7. Performance comparisons between our proposed method and other approaches.

Discussion
In this paper, a CNN-based method for the classification of ALL cancer and healthy cell images has been proposed. The proposed model is an enhancement of the VGG16 CNN architecture. The attention module not only shows where attention should be focused, but also improves feature representation. The results show that our proposed method based on the ECA module is able to extract and fuse deep features, with a mean test accuracy of 0.911 in ALL classification.
The classification results show that, although our proposed method is relatively simple, it obtains acceptable performance for ALL diagnosis. Hence, the proposed algorithm can be used as an assisting diagnostic tool for pathologists. The clinical impact of this research is that it helps pathologists examine a blood smear for cancerous cells. It should be noted that, in the performance comparisons, certain networks performed poorly, which is not surprising considering the morphological similarity between normal and cancerous cells. To conclude, the attention module can increase a CNN model's efficiency in ALL classification. The proposed attention-based CNN has better adaptability, robustness, and classification accuracy than state-of-the-art deep learning-based approaches. In this work, we used only the C-NMC 2019 training set for binary classification. As future work, the model could easily be extended to multi-class classification and evaluated on different datasets of the same or similar diseases.

Conclusions
In this study, we proposed a diagnostic support system based on a CNN architecture with an ECA module to accurately classify ALL cancer and healthy cell images. VGG16 is used as our backbone to extract features from the source images. The ECA module was incorporated after each convolutional block to further enhance the relevance of the features extracted by VGG16. The performance comparison between the VGG16 model with and without the ECA module showed that the attention mechanism helps improve model accuracy, since it explores the relationships between channels and obtains a better feature representation. The findings of this study reveal that our proposed deep learning-based model outperforms state-of-the-art approaches; integrating an attention module into deep learning architectures may yield a significant performance gain. We evaluated our method on the C-NMC 2019 dataset, obtaining a mean accuracy of 0.911 on this challenging dataset. In the future, researchers can focus on reducing the false positive rate to further improve the final accuracy.