1. Introduction
Malaria is a disease caused by Plasmodium parasites, which are transmitted to humans through the bites of infected female Anopheles mosquitoes. Several Plasmodium species infect humans through these bites, including Plasmodium ovale, Plasmodium malariae, Plasmodium vivax, and Plasmodium falciparum. The World Malaria Report 2020 [1] estimated that these parasites were responsible for 229 million cases of malaria and 409,000 deaths worldwide. Children aged under 5 are the most vulnerable group, accounting for 67% of global malaria deaths. Africa accounted for 94% of global cases, and six countries accounted for approximately half of all malaria deaths worldwide: Nigeria (23%), the Democratic Republic of the Congo (11%), the United Republic of Tanzania (5%), Burkina Faso (4%), Mozambique (4%), and Niger (4%). Other regions significantly affected by malaria include South-East Asia and the Eastern Mediterranean.
Despite considerable effort and funding (for example, more than USD 3 billion was invested in malaria control and elimination in 2019, of which USD 900 million came from malaria-endemic countries), the disease continues to devastate and threaten lives.
Light microscopy of blood films is currently the most popular technique for diagnosing malaria [2]. In microscopic diagnosis, a drop of the patient's blood is placed on a glass slide, which is then immersed in a staining solution to make the parasites visible. Both thick and thin blood smears are prepared: thick smears detect parasites more sensitively than thin smears, whereas thin smears allow the examiner to identify the species and recognise parasite stages more easily [3]. Light microscopy requires no complex equipment, only human expertise, which makes it inexpensive and readily available. However, its biggest disadvantage is the extensive training that personnel need to become proficient slide readers. Training and employing such staff is expensive, and the examination itself involves a large volume of manual work.
Therefore, this study proposes an automated technique based on state-of-the-art deep-learning models to detect malaria parasites in blood-smear images. Our contributions are as follows. First, we design automatic procedures that use transfer learning to detect blood samples infected by malaria parasites. Second, we comprehensively analyse several pre-trained deep-learning models (VGG16, VGG19, ResNet50, ResNet101, DenseNet121, and DenseNet201) as feature extractors, combined with four machine-learning classifiers: decision tree (DT), support vector machine (SVM), naive Bayes (NB), and k-nearest neighbours (KNN). Third, we provide an in-depth analysis of the computational cost, measured as training time, in each case.
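As an illustration of this two-stage design, the sketch below trains the four classifiers on pre-extracted feature vectors and records each one's test accuracy and training time. It is a minimal sketch, not the authors' implementation: it assumes scikit-learn, and it uses synthetic Gaussian features as a stand-in for real CNN features (which could be obtained, for example, from the 4096-dimensional `fc2` layer of `tf.keras.applications.VGG16`). The helper `compare_classifiers` is hypothetical, and all hyperparameters (80/20 split, RBF kernel, Gaussian naive Bayes, k = 5) are our assumptions.

```python
import time

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(features, labels, seed=0):
    """Fit DT, SVM, NB and KNN on pre-extracted features.

    Returns {name: (test accuracy, training time in seconds)}.
    Hyperparameters are illustrative assumptions, not the paper's settings.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    models = {
        "DT": DecisionTreeClassifier(random_state=seed),
        "SVM": SVC(kernel="rbf"),
        "NB": GaussianNB(),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(x_train, y_train)  # only the fit is timed
        train_seconds = time.perf_counter() - start
        accuracy = accuracy_score(y_test, model.predict(x_test))
        results[name] = (accuracy, train_seconds)
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=400)  # 0 = uninfected, 1 = parasitized
    # Synthetic stand-in for CNN feature vectors: class 1 is shifted by 0.5 per dimension.
    features = rng.normal(size=(400, 64)) + 0.5 * labels[:, None]
    for name, (acc, secs) in compare_classifiers(features, labels).items():
        print(f"{name}: accuracy={acc:.3f}, training time={secs:.4f} s")
```

Keeping feature extraction and classification separate means the expensive CNN forward pass runs once per image, after which any number of classifiers can be trained and timed on the same stored features.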
The rest of the paper is organised as follows: Section 2 reviews related literature, Section 3 describes the methodology in detail, Section 4 presents and discusses the experimental results, and Section 5 concludes the paper.
2. Literature Review
Deep learning is the latest trend to have boosted performance in many non-medical domains. Deep learning extends the well-known multilayer neural network to automatically learn complex data representations, also known as features. However, unlike humans, deep-learning models require large amounts of high-quality annotated data to learn and make effective decisions on unseen cases. This is perhaps one reason why the medical domain was unable to adopt the technology during its early proliferation: annotated training sets are harder to obtain, and many privacy concerns arise. Fortunately, trained deep-learning models can be reused to solve problems in different but related applications through an approach known as transfer learning. Such trained models are known as pre-trained models, that is, models that have already learned to solve a problem similar to the intended one. Transfer learning, in the sense of reusing a pre-trained model as a feature extractor, is one of three ways to use deep-learning techniques; the other two are training a deep-learning model from scratch and fine-tuning an existing deep-learning model [4]. Examples of pre-trained deep-learning models include VGG16, VGG19, ResNet50, ResNet101, ResNet152, DenseNet121, DenseNet201, AlexNet, and Xception.
Deep learning has been applied to various problems such as face recognition [5,6,7]; in the medical field, to the classification of skin burns [8,9,10,11,12] and cancer diagnosis [13,14,15]; and in financial fraud detection [16,17]. More recently, a similar approach has been adopted to discriminate between blood-smear images that contain the Plasmodium parasite and those that do not.
In [18], the authors proposed deep-learning techniques for the diagnosis of malaria, specifically to discriminate between blood-smear samples that contain malaria parasites and those that are negative. Two well-known approaches were investigated, training from scratch and transfer learning, using 27,578 RGB images comprising infected and uninfected samples in a 1:1 ratio. All images were resized to a uniform size. The first approach used a 16-layer convolutional neural network (CNN) with 6 convolution layers and 3 fully connected layers. The first and second convolution layers had 32 filters each and were followed by a max-pooling layer; the third and fourth convolution layers had 64 filters each and were followed by an average-pooling layer; and the fifth and sixth convolution layers had 128 and 256 filters, respectively. These were followed by three fully connected layers, each with 256 neurons. The model was trained on 90% of the images, tested on the remaining 10% of unseen samples, and achieved 97.37% accuracy. In the second approach, a pre-trained CNN (AlexNet) was used for feature extraction and an SVM for classification; this transfer-learning approach achieved 91.99% accuracy.
In another study, Huq et al. [19] used 27,558 images, consisting of 13,779 normal and 13,779 malaria-infected images, obtained from the National Institutes of Health (NIH). The images varied in size but all had three channels, and they were rescaled to 224 × 224 to match the input shape of the pre-trained VGG16 model used. Before training, the data were split three ways: 70% of the images in each class were assigned to training and 30% to validation, and a further 10% of the data was then taken out of the training portion as a test split, leaving 60% for training. Two experiments were conducted: standard training and adversarial training. In standard training, the model was trained on the original images, and adversarial images were then added to the test images. The standard and adversarial accuracies were 95.96% and 29.40%, respectively, showing that the model was fooled by the introduced perturbations. In adversarial training, adversarial images were incorporated into the training dataset, and the model learned to classify the adversarial test images far more effectively, achieving up to 93.38% accuracy; this made the model robust against such perturbations. It also achieved 95.79% accuracy on standard images.
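Because this split is easy to misread, the short calculation below works through one interpretation of it for a single class of 13,779 images: 30% of the class goes to validation, a further 10% of the class is carved out of the 70% training portion for testing, and about 60% remains for training. The helper `three_way_split` is hypothetical, and rounding to whole images is our assumption.

```python
def three_way_split(n_images, val_frac=0.3, test_frac=0.1):
    """Return (train, validation, test) image counts for one class.

    Assumes validation and test sizes are fractions of the whole class,
    with the test images carved out of the initial 70% training portion.
    """
    n_val = round(n_images * val_frac)
    n_test = round(n_images * test_frac)
    n_train = n_images - n_val - n_test  # what is left of the 70% portion
    return n_train, n_val, n_test


if __name__ == "__main__":
    train, val, test = three_way_split(13_779)
    print(train, val, test)  # 8267 4134 1378 (per class)
    print(f"training fraction: {train / 13_779:.3f}")  # 0.600
```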
Similarly, Rajaraman et al. [3] classified 27,558 blood-smear images, 13,779 containing malaria parasites and 13,779 uninfected, using both a CNN trained from scratch and a transfer-learning approach. The CNN trained from scratch contained 3 convolution layers, each followed by a max-pooling layer, 2 fully connected (FC) layers, and ReLU activation. Its input shape was 100 × 100 × 3, and all convolution layers used a 3 × 3 filter size. The second approach extracted features from specific layers of pre-trained models that, according to the authors, provide strongly discriminative features. These pre-trained models were AlexNet, VGG16, ResNet50, Xception, and DenseNet121, and the corresponding feature layers were fc6, block5_conv2, res5c_branch2c, block14_sepconv1, and conv_16_x2, respectively. The CNN trained from scratch achieved a classification accuracy of 92.7%, while among the transfer-learning models, ResNet50 achieved the best accuracy of 95.9% with a sensitivity of 94.7%.
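To make the from-scratch architecture concrete, the sketch below traces how the 100 × 100 spatial size shrinks through three 3 × 3 convolutions, each followed by max-pooling. The stride of 1, the 2 × 2 pooling with stride 2, and the two padding options compared are assumptions, since they are not stated above; `trace_spatial_sizes` is a hypothetical helper.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Standard output-size formula for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_spatial_sizes(size=100, n_blocks=3, kernel=3, padding=0, pool=2):
    """Spatial size after each (conv -> max-pool) block, starting from the input."""
    sizes = [size]
    for _ in range(n_blocks):
        size = conv_output_size(size, kernel, stride=1, padding=padding)  # 3 x 3 conv
        size = conv_output_size(size, pool, stride=pool)  # non-overlapping max-pooling
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(trace_spatial_sizes(padding=0))  # "valid" convolutions: [100, 49, 23, 10]
    print(trace_spatial_sizes(padding=1))  # "same" convolutions: [100, 50, 25, 12]
```

Under the "valid" assumption, the final 10 × 10 map (times the number of filters in the last convolution layer) is what would be flattened into the first FC layer.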
Another study by Rajaraman et al. [20] proposed ensemble learning using three pre-trained models (VGG19, SqueezeNet, and InceptionResNet-V2) and a custom-trained CNN. The features of these four models were combined before the final classification to discriminate between images infected with the malaria parasite and healthy ones. The pre-trained models were also trained independently for comparison. VGG19 achieved 99.32% accuracy with a sensitivity of 99.31%, outperforming all the other pre-trained models as well as the ensemble, which recorded an accuracy of 99.11% with a sensitivity of 98.94%.
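One common way to combine features from several models is feature-level fusion: each model's per-image feature vector is concatenated into a single longer vector before a classifier is applied. The NumPy sketch below shows this; z-scoring each feature column first is our assumption (it keeps a model with large activations from dominating), not necessarily what [20] did, and the function name and feature sizes are hypothetical.

```python
import numpy as np


def fuse_features(*blocks, standardise=True):
    """Concatenate per-image feature matrices (n_images x d_i) along the feature axis."""
    n_images = blocks[0].shape[0]
    if any(block.shape[0] != n_images for block in blocks):
        raise ValueError("every block must contain one row per image, in the same order")
    if standardise:
        # z-score each feature column so no single model dominates the fused vector
        blocks = [
            (block - block.mean(axis=0)) / (block.std(axis=0) + 1e-8) for block in blocks
        ]
    return np.concatenate(blocks, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature sizes for three models applied to the same 5 images.
    a, b, c = rng.normal(size=(5, 8)), rng.normal(size=(5, 4)), rng.normal(size=(5, 6))
    print(fuse_features(a, b, c).shape)  # (5, 18)
```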
Reddy et al. [21] used a pre-trained ResNet50 model to classify 27,558 blood-smear images for malaria detection, evenly divided between infected and non-infected images. They removed the topmost (classification) layer, which was originally trained to classify 1000 object classes, and added a new classification layer with 2 output neurons for the binary problem. The earlier (lower) layers of ResNet50 were frozen, so they only contributed features for training the newly added top layer. The new model was compiled and trained with stochastic gradient descent and recorded a validation accuracy of 95.4%.
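The recipe in [21] (freeze the pre-trained layers, replace the 1000-class top layer with a 2-neuron classification layer, and train only that layer with stochastic gradient descent) can be sketched in NumPy as below. This is a toy: a fixed random projection followed by ReLU stands in for the frozen ResNet50 layers, the data are synthetic, and the helper names, learning rate, batch size and epoch count are assumptions. Only the new layer's weights `w` and bias `b` are ever updated.

```python
import numpy as np


def frozen_backbone(x, projection):
    """Stand-in for frozen pre-trained layers: fixed projection + ReLU, never updated."""
    return np.maximum(x @ projection, 0.0)


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_new_head(features, labels, n_classes=2, lr=0.05, epochs=30, batch_size=32, seed=0):
    """Train a new dense softmax layer on frozen features with mini-batch SGD."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    targets = np.eye(n_classes)[labels]  # one-hot labels
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            probs = softmax(features[idx] @ w + b)
            grad = (probs - targets[idx]) / len(idx)  # cross-entropy gradient w.r.t. logits
            w -= lr * features[idx].T @ grad
            b -= lr * grad.sum(axis=0)
    return w, b


def predict(features, w, b):
    return np.argmax(features @ w + b, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=500)
    images = rng.normal(size=(500, 100)) + 2.0 * labels[:, None]  # flattened toy "images"
    projection = rng.normal(size=(100, 32)) / 10.0  # frozen weights
    feats = frozen_backbone(images, projection)
    w, b = train_new_head(feats, labels)
    print("training accuracy:", (predict(feats, w, b) == labels).mean())
```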
Sriporn et al. [22] also used several pre-trained models (AlexNet, VGG16, NasNetMobile, Xception, Inception, and ResNet50) for malaria parasite detection in blood-smear images. A total of 7000 RGB blood-smear images, 4500 infected and 2500 uninfected, were used, and data augmentation was applied by rotating images by 90, 180, and 270 degrees to increase the number of samples while preserving detail. Two activation functions (ReLU and Mish) and three optimizers (RMSprop, Nadam, and SGD) were tested with each model. The Xception model with the Nadam optimizer and Mish activation achieved the best detection accuracy of up to 98.8%.
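Both techniques are short to express. Rotations by multiples of 90 degrees only rearrange pixels, with no interpolation, which is why they preserve detail; Mish is defined as x · tanh(softplus(x)). The NumPy sketch below is illustrative (the function names are ours), with softplus computed stably as `np.logaddexp(0, x)`.

```python
import numpy as np


def rotation_augment(image):
    """Return the image plus copies rotated by 90, 180 and 270 degrees (H x W x C input)."""
    return [np.rot90(image, k=k, axes=(0, 1)) for k in range(4)]


def mish(x):
    """Mish activation: x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x))."""
    return x * np.tanh(np.logaddexp(0.0, x))


def relu(x):
    return np.maximum(x, 0.0)


if __name__ == "__main__":
    image = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # tiny 2 x 3 RGB "image"
    print([v.shape for v in rotation_augment(image)])  # [(2, 3, 3), (3, 2, 3), (2, 3, 3), (3, 2, 3)]
    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    print(relu(x))
    print(mish(x))  # smooth and slightly negative for x < 0, unlike ReLU
```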
4. Results and Discussion
Table 1 presents the results using VGG16 features. It shows the accuracy of each classification algorithm along with several other performance measures: precision, recall (sensitivity), F1-score, and training time. Precision (also known as positive predictive value) is the fraction of images classified as parasitized that truly contain parasites:

$$\text{Precision} = \frac{TP}{TP + FP}$$

where TP stands for true positives and FP for false positives. Recall is the fraction of truly parasitized blood-smear images that are correctly classified as parasitized:

$$\text{Recall} = \frac{TP}{TP + FN}$$

where FN stands for false negatives. The F1-score is the harmonic mean of precision and recall:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
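These definitions can be checked with a small dependency-free sketch that counts TP, FP, FN and TN (true negatives) for the parasitized class, taken here as label 1, and derives precision, recall, F1-score and accuracy, the last being (TP + TN)/(TP + TN + FP + FN). The function name and the label encoding are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score and accuracy for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted parasitized, actually uninfected
        elif truth == positive:
            fn += 1  # missed a parasitized image
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))  # TP=3, FP=1, FN=1, TN=3, so every metric is 0.75
```

As a check against Table 1, NB's precision of 62.83% and recall of 96.47% give F1 = 2 × 62.83 × 96.47 / (62.83 + 96.47) ≈ 76.10%, matching the reported value.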
Table 1 shows the performance of all four classification algorithms using VGG16 features. DT and SVM achieved precision of 89.10% and 94.26%, recall of 89.11% and 95.57%, F1-scores of 89.22% and 94.91%, and accuracy of 89.24% and 94.88%, respectively. NB and KNN achieved precision of 62.83% and 88.66%, recall of 96.47% and 96.49%, F1-scores of 76.10% and 92.41%, and accuracy of 69.70% and 92.07%, respectively.
Table 2 shows the performance of all four classification algorithms using VGG19 features. DT and SVM achieved precision of 86.06% and 93.84%, recall of 85.82% and 95.22%, F1-scores of 85.94% and 94.52%, and accuracy of 85.94% and 94.48%, respectively. NB and KNN achieved precision of 60.26% and 85.71%, recall of 96.64% and 95.15%, F1-scores of 74.23% and 90.19%, and accuracy of 66.46% and 89.64%, respectively.
Table 3 shows the performance of all four classification algorithms using ResNet50 features. DT and SVM achieved precision of 86.89% and 94.22%, recall of 87.17% and 95.53%, F1-scores of 87.03% and 94.87%, and accuracy of 87.05% and 94.84%, respectively. NB and KNN achieved precision of 80.82% and 84.69%, recall of 87.67% and 94.76%, F1-scores of 84.11% and 89.44%, and accuracy of 83.43% and 88.81%, respectively.
Table 4 shows the performance of all four classification algorithms using ResNet101 features. DT and SVM achieved precision of 88.93% and 94.11%, recall of 88.40% and 95.60%, F1-scores of 88.45% and 94.85%, and accuracy of 88.48% and 94.81%, respectively. NB and KNN achieved precision of 82.95% and 83.11%, recall of 88.67% and 96.02%, F1-scores of 85.72% and 89.10%, and accuracy of 85.22% and 88.25%, respectively.
Table 5 shows the performance of all four classification algorithms using DenseNet121 features. DT and SVM achieved precision of 87.45% and 92.55%, recall of 86.82% and 96.15%, F1-scores of 87.13% and 94.32%, and accuracy of 87.08% and 94.21%, respectively. NB and KNN achieved precision of 71.67% and 79.77%, recall of 73.79% and 89.27%, F1-scores of 72.71% and 84.26%, and accuracy of 72.31% and 83.00%, respectively.
Table 6 shows the performance of all four classification algorithms using DenseNet201 features. DT and SVM achieved precision of 86.60% and 92.93%, recall of 86.14% and 95.99%, F1-scores of 86.37% and 94.43%, and accuracy of 86.33% and 94.34%, respectively. NB and KNN achieved precision of 68.05% and 77.95%, recall of 83.96% and 93.93%, F1-scores of 75.17% and 85.20%, and accuracy of 72.27% and 83.68%, respectively.
Figure 3 compares the accuracy of each classification algorithm. SVM outperformed the other algorithms, and VGG16 features carried the strongest discriminatory information. To assess computational efficiency, we recorded the training time of each classification algorithm in seconds; Tables 1 to 6 show the training times using VGG16, VGG19, ResNet50, ResNet101, DenseNet121, and DenseNet201 features, respectively.
Figure 4 shows that SVM is the least computationally efficient classifier during training, while NB is the most efficient. However, although NB trains fastest, it is also the least accurate of the classifiers used in this study to detect malaria parasites in blood-smear samples. Moreover, VGG16 features provide the best detection and classification accuracy, up to 94.88% with SVM, but training took more than 71,163 s (about 19.8 h). VGG19 features did not outperform VGG16 features with SVM in either accuracy or computational time. VGG16 and VGG19 both produce 4096-dimensional feature vectors. ResNet50 and ResNet101 both produce 2048-dimensional feature vectors and, with SVM, achieved classification accuracies of up to 94.84% and 94.81%, respectively, with ResNet50 features requiring the longer training time, as seen in Figure 4. Lastly, DenseNet121, with 1024-dimensional feature vectors, and DenseNet201 achieved their best detection accuracies of 94.21% and 94.34%, respectively, with SVM.
Notably, with SVM, accuracy tends to be higher for feature extractors that produce longer feature vectors. Additionally, SVM produced the best results for every feature set, with only small differences between feature sets (from 94.21% to 94.88%).
Receiver Operating Characteristic (ROC) Curve
To further analyse the results obtained in this study, we use the ROC curve, which has been widely used to analyse overall test performance and to compare the discriminating ability of clinical tests [12,29]. The ROC curve plots the true positive rate against the false positive rate as the decision threshold of a test varies. The area under the ROC curve (AUC) summarises the overall performance of the test: an AUC of 0.5 is equivalent to random guessing, an AUC below 0.5 indicates a test that performs worse than chance, an AUC above 0.5 indicates discrimination better than chance (improving as the AUC approaches 1), and an AUC of 1 corresponds to a perfect diagnostic test.
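The threshold sweep behind an ROC curve, and the trapezoidal AUC, can be sketched without any libraries as below. The sketch assumes that label 1 marks a parasitized image and that higher scores (for example, SVM decision values or class probabilities) mean "more likely parasitized"; tied scores are treated as a single threshold. The helper names are ours.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold through the distinct scores."""
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_points(y_true, scores))  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
    print(auc_trapezoid(roc_points(y_true, scores)))  # 0.75
```

A classifier that ranks every parasitized image above every uninfected one yields an AUC of 1.0, while constant scores collapse the curve to the diagonal and give 0.5, matching the interpretation above.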
Figure 5 shows the ROC curves of all four classification algorithms using VGG16 features. All four algorithms performed better than random guessing, and SVM came close to a perfect diagnostic test. The ROC curves in Figure 6 were generated using VGG19 features, with SVM again outperforming the other classifiers. Figure 7 and Figure 8 show the ROC curves using ResNet50 and ResNet101 features, respectively; with SVM, the AUC was 0.985 for ResNet50 features and 0.987 for ResNet101 features. With both DenseNet121 and DenseNet201 features, SVM was the best-performing diagnostic test, with an AUC of 0.98, as shown in Figure 9 and Figure 10, respectively.
5. Conclusions
In this study, we presented a comprehensive investigation using state-of-the-art algorithms to detect malaria parasites in blood-smear samples. We used existing pre-trained models to extract discriminatory features from the dataset images and then used machine-learning algorithms to classify each sample, determining whether the patient who provided the blood sample is infected.
The results show that machine learning can feasibly detect malaria parasites in blood samples, with an accuracy of over 94%.
However, the dataset may contain parasites other than those that cause malaria. This limitation is worth investigating in future work, in addition to detecting each Plasmodium species (i.e., Plasmodium ovale, Plasmodium malariae, Plasmodium vivax, and Plasmodium falciparum).