ROENet: A ResNet-Based Output Ensemble for Malaria Parasite Classification

(1) Background: Malaria is an insect-borne disease that people may contract through the bite of Anopheles mosquitoes or through transfusion of blood from malaria-infected people. Diagnosing malaria costs doctors a lot of time and energy, and the results are sometimes not ideal. Many researchers use convolutional neural networks (CNNs) to classify malaria images. However, we believe that the classification performance for malaria parasites can be improved. (2) Methods: In this paper, we propose a novel method (ROENet) to automatically classify malaria parasites on the blood smear. The backbone of ROENet is the pre-trained ResNet-18. We use randomized neural networks (RNNs) as the classifier in our proposed model. Three RNNs are used in ROENet: random vector functional link (RVFL), Schmidt neural network (SNN), and extreme learning machine (ELM). To improve performance, the results of ROENet are the ensemble outputs from the three RNNs. (3) Results: We evaluate the proposed ROENet by five-fold cross-validation. The specificity, F1 score, sensitivity, and accuracy are 96.68 ± 3.81%, 95.69 ± 2.65%, 94.79 ± 3.71%, and 95.73 ± 2.63%, respectively. (4) Conclusions: The proposed ROENet is compared with other state-of-the-art methods and achieves the best results among them.


Introduction
Malaria is an insect-borne disease that people may contract through the bite of Anopheles mosquitoes or through transfusion of blood from malaria-infected people. Chills, fever, and sweating that recur in periodic, regular attacks are characteristic of patients infected with malaria. After many attacks over a long time, malaria can cause anemia and splenomegaly. The harmful impact of malaria in the world is still very serious. People living in malaria-endemic areas account for about 40% of the total human population. Malaria is one of the most feared diseases for people living on the African continent, which accounts for 90% of the world's malaria patients every year. Malaria is the cause of death of more than two million people worldwide every year. Malaria is also prevalent in Southeast and Central Asia. The infection sources of malaria are roughly divided into two categories: (i) malaria patients and (ii) asymptomatic carriers of gametocytes. The transmission probability of malaria increases with gametocyte density.
The earlier malaria is diagnosed, the better the prospects for treating the patient. The diagnostic methods currently used in hospitals are as follows: (i) clinical manifestation, (ii) molecular biological technology diagnosis, and (iii) therapeutic diagnosis. However, these diagnostic methods require doctors to spend a long time on each diagnosis, and this long process may delay the timely treatment of patients. Moreover, doctors are easily affected by factors such as lack of sleep and illness during the diagnosis process. Therefore, manual diagnosis of malaria by doctors is inefficient.
Many researchers have applied computer technology to malaria diagnosis. Manescu et al. [ From the above description of malaria diagnosis by computer technology [11], it can be concluded that most scholars use CNN models in their experiments. However, we believe that the classification performance for malaria parasites can be improved. We propose a new model (ROENet) to automatically classify malaria parasites on blood smears. The name ROENet stands for a ResNet-based output ensemble for malaria parasite classification. The main contributions of this study are the following:

• A novel method (ROENet) is proposed to automatically classify malaria parasites on the blood smear.
• The fine-tuned ResNet-18 is used as the feature extractor.
• Three RNNs are selected to replace the last five layers of the fine-tuned ResNet-18.
• Three RNNs are selected as the classifier of the proposed ROENet.
• The final outputs of ROENet are the ensemble outputs from the three RNNs.
The remainder of the study is organized as follows: the public dataset is described in Section 2; Section 3 details the method; the experiment settings and results are presented in Section 4; Section 5 is the conclusion.

Materials
The malaria images were prepared at Chittagong Medical College Hospital and are publicly available on the NIH website. Giemsa-stained thin blood smears were collected from 150 patients infected with Plasmodium falciparum and 50 healthy patients at Chittagong Medical College Hospital, Bangladesh. The stained thin blood smears were photographed: the slide image of each microscopic field of view was taken with the built-in camera of a smartphone. These images were manually annotated by experts from the Mahidol-Oxford Tropical Medicine Research Unit in Bangkok, Thailand. There are 27,558 malaria images in total, including 13,779 parasitized images and 13,779 uninfected images. Some of the parasitized and uninfected images are provided in Figure 1. In this open dataset, an image processing method is used to find parasites in the digital image of the blood film. The typical shape, data, and visual appearance of parasites are marked manually by experts. If there is no expert mark, the image is labeled as uninfected.

Proposed ROENet
Table 1 lists the acronyms used in this paper and their full names. As more and more scholars research image analysis, image analysis technology continues to progress [12]. One of the most significant steps in image analysis is to extract features from images [13]. However, each image contains a great deal of content, and extracting useful features quickly and accurately has long been a challenge for scholars. Previously, scholars extracted features from images manually [14]. However, manual feature extraction was very time consuming, and the results were often not ideal. More and more scholars then applied computer technology to image analysis and proposed many CNN models [15], such as AlexNet and ResNet. In a CNN model, the convolution layers and pooling layers reduce the number of parameters, which greatly reduces calculation and experimental time. This is one of the reasons why CNN models are so popular [16].
This paper proposes a novel method (ROENet) to automatically classify malaria parasites on the blood smear. The public dataset can be downloaded from the NIH website. The backbone of ROENet is the pre-trained ResNet-18. Because ResNet-18 was pre-trained on ImageNet, it has 1000 output nodes, whereas the task in this paper has only two output classes. Therefore, we perform some modifications to the backbone. We chose randomized neural networks (RNNs) as the classifier in our proposed model. Three RNNs are used in ROENet: random vector functional link (RVFL) [17], Schmidt neural network (SNN) [18], and extreme learning machine (ELM) [19]. To improve performance, the results of ROENet are the ensemble outputs from the three RNNs. Table 2 gives the pseudocode of ROENet. The flowchart of ROENet is provided in Figure 2.

Backbone of ROENet
The depth of the network has a great influence on the performance of the CNN model. In theory, with the deepening of the network depth, the model should achieve better performance [20]. However, the performance of deeper networks is not necessarily good. When the depth of the CNN model continues to increase, the CNN model may encounter the degradation problem [21]. The performance of the model stagnates or even decreases when the number of layers of the CNN model increases. This is the problem of degradation [22].
The degradation problem at least shows that deep network models are difficult to train. A network model increases its depth by adding new layers [23]. If the newly added layers learn nothing and simply copy the features of the previous layer, they perform an identity mapping [24], which ensures that the performance of the network model does not degrade. Residual learning solves the degradation problem in this way [25]. For a CNN structure (formed by stacking several layers), X is the input, P(X) is the learned feature, and the feature extracted by residual learning is recorded as Q(X):
$$P(X) = Q(X) + X \quad (2)$$
Equivalently, the residual is $Q(X) = P(X) - X$. It can be seen from the above formula that the worst case for residual learning is that the residual is 0, so the stacked layers just complete the identity mapping [26]. When the residual is not 0, the stacked layers learn new features that improve the network's performance. Therefore, residual learning can cope with the degradation problem [27]. The residual learning structure is demonstrated in Figure 3.
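The relation P(X) = Q(X) + X can be illustrated with a minimal NumPy sketch. The two-layer form of Q and the names `residual_block`, `w1`, and `w2` are assumptions made for illustration only; they are not the actual ResNet-18 block, which uses convolutions and batch normalization.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: returns P(X) = Q(X) + X.

    Q(X) is modeled here as linear -> ReLU -> linear (an illustrative assumption).
    """
    q = np.maximum(x @ w1, 0.0) @ w2  # residual branch Q(X)
    return q + x                      # skip connection adds the input back

x = np.array([[1.0, -2.0, 3.0]])

# Worst case: the stacked layers learn nothing, so Q(X) = 0 and the block is an identity mapping.
zeros = np.zeros((3, 3))
identity_out = residual_block(x, zeros, zeros)

# Non-zero residual: the block adds new features on top of the input.
rng = np.random.default_rng(0)
learned_out = residual_block(x, rng.normal(size=(3, 3)), rng.normal(size=(3, 3)))
```

With all-zero weights the output equals the input exactly, which is why adding such a block cannot make the representation worse than that of the shallower network.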
The backbone of ROENet is the pre-trained ResNet-18, which has 1000 output nodes, whereas the task in this paper has only two output classes. Therefore, we perform some modifications to the backbone, which are shown in Figure 4. FC1000, softmax, and the classification layer are removed. We add FC128, ReLU, BN, FC2, softmax, and the classification layer.

Classifier of ROENet
There are many layers in a CNN model, and each layer has many parameters. Randomized neural networks (RNNs), in contrast, have only three simple layers: an input layer, a hidden layer, and an output layer. This shallow three-layer structure can effectively alleviate the overfitting problem. RNN models are also trained quickly, because the weights and biases of the hidden layer are randomized rather than learned. Because RNNs have good classification performance, they have been applied to machine learning tasks in many fields, such as geography, big data analysis, and chemistry. Three RNNs are used in this paper: ELM, RVFL, and SNN. ELM projects the input features into the hidden space randomly and does not need gradient-based backpropagation to adjust the weights [28]. The most obvious structural difference between RVFL and ELM is that RVFL has a direct connection between the input and the output [29]. This direct connection can effectively improve the classification performance and robustness of RVFL. SNN [30] is an RNN model proposed by Schmidt, Kraaijveld, and Duin 30 years ago. The structure of SNN is consistent with that of ELM, except that the output layer of SNN has a learnable output bias. These three RNN models are classical and have achieved excellent classification performance since they were proposed. Their structures are provided in Figure 5.
As can be seen from Figure 5, the structures of the three RNNs used in this paper differ. However, because each RNN has only three layers, their calculations are almost identical. Suppose there is a dataset whose i-th sample is $(t_i, y_i)$:
$$t_i = (t_{i1}, \ldots, t_{in})^T \in R^n \quad (3)$$
$$y_i = (y_{i1}, \ldots, y_{im})^T \in R^m, \quad i = 1, \ldots, N \quad (4)$$
where N is the number of samples, n is the input dimension, and m is the output dimension.
For ELM, we have the following:
$$M_{ELM}(i) = \sum_{j=1}^{u} I(v_j t_i + K_j), \quad i = 1, \ldots, N \quad (5)$$
where $v_j$ represents the weight from the input nodes to the j-th node in the hidden layer, $K_j$ represents the bias of the j-th node in the hidden layer, I denotes the sigmoid function, and u is the number of hidden nodes in the hidden layer.
For RVFL, the calculation goes a step further:
$$M_{RVFL}(i) = \mathrm{concat}(T, E) \quad (6)$$
where the input matrix is $T = (t_1, \ldots, t_N)^T$.
For SNN, we have the following.
For ELM and RVFL, the final output weights are calculated as follows:
$$r = M_{net}^{+} Y$$
where r is the final output weight, $M_{net}^{+}$ is the pseudo-inverse matrix of $M_{net}$, and the ground-truth label of the dataset is $Y = (y_1, \ldots, y_N)^T$.
For SNN, there are also biases (b) between the hidden layer and the output layer; as above, $M_{net}^{+}$ is the pseudo-inverse matrix of $M_{net}$.
Although the RNN model is simple, bad weights and biases will seriously affect the classification performance. Therefore, in this paper, we combine the results of the three RNN models by majority voting to obtain the final classification. Because the three RNN models differ from one another, combining them provides more diverse information, which further improves the performance and robustness of the system.
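The three classifiers and the majority vote described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the hidden layer is kept as a matrix with one column per hidden node (the usual ELM formulation), the RVFL direct link is modeled by concatenating the inputs with the hidden outputs, the SNN output bias is modeled by appending a column of ones, and the function names, the uniform weight range, and the toy Gaussian features are all hypothetical. In ROENet the inputs would be features from the fine-tuned ResNet-18 rather than synthetic data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def design_matrix(T, V, K, kind):
    """Build M_net for one randomized network. T: (N, n), V: (n, u), K: (u,)."""
    H = sigmoid(T @ V + K)  # hidden outputs, one column per hidden node
    if kind == "elm":
        return H
    if kind == "rvfl":
        return np.concatenate([T, H], axis=1)  # direct input-to-output link
    if kind == "snn":
        ones = np.ones((T.shape[0], 1))
        return np.concatenate([H, ones], axis=1)  # extra column learns the output bias b
    raise ValueError(f"unknown kind: {kind}")

def fit(T, Y, kind, u=400, seed=0):
    """Draw random hidden weights and biases, then solve the output weights in closed form."""
    rng = np.random.default_rng(seed)
    V = rng.uniform(-1.0, 1.0, size=(T.shape[1], u))  # random, never trained
    K = rng.uniform(-1.0, 1.0, size=u)
    r = np.linalg.pinv(design_matrix(T, V, K, kind)) @ Y  # r = M_net^+ Y
    return kind, V, K, r

def predict(model, T):
    kind, V, K, r = model
    return np.argmax(design_matrix(T, V, K, kind) @ r, axis=1)

def majority_vote(predictions):
    """Combine binary (0/1) predictions from an odd number of classifiers."""
    votes = np.stack(predictions, axis=0)
    return (votes.sum(axis=0) * 2 > votes.shape[0]).astype(int)

# Toy stand-in for extracted features: two well-separated Gaussian blobs.
rng = np.random.default_rng(42)
T = np.concatenate([rng.normal(-2.0, 1.0, (100, 5)), rng.normal(2.0, 1.0, (100, 5))])
labels = np.array([0] * 100 + [1] * 100)
Y = np.eye(2)[labels]  # one-hot ground-truth labels

models = [fit(T, Y, kind, u=50, seed=i) for i, kind in enumerate(("elm", "rvfl", "snn"))]
ensemble = majority_vote([predict(m, T) for m in models])
train_accuracy = float((ensemble == labels).mean())
```

Because only the output weights are solved, and in closed form through the pseudo-inverse, each classifier is trained without any gradient-based iteration; the vote then discards a wrong prediction whenever the other two classifiers agree.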

Evaluation
The parasitized images are defined as the positive class, and the uninfected images are defined as the negative class. We evaluate the proposed ROENet by five-fold cross-validation. The following measures are selected: sensitivity (Se), accuracy (Ac), F1 score (F1), and specificity (Sp).
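The four measures can be computed from the confusion-matrix counts as sketched below. The formulas are the standard definitions and are assumed here, since the defining equations are not reproduced in the text; the function name `binary_metrics` and the example counts are hypothetical and are not results from this paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics, with parasitized as the positive class.

    tp/fn: parasitized images classified correctly/incorrectly.
    tn/fp: uninfected images classified correctly/incorrectly.
    """
    return {
        "Se": tp / (tp + fn),                   # sensitivity (recall on parasitized)
        "Sp": tn / (tn + fp),                   # specificity (recall on uninfected)
        "Ac": (tp + tn) / (tp + fp + tn + fn),  # accuracy
        "F1": 2 * tp / (2 * tp + fp + fn),      # harmonic mean of precision and sensitivity
    }

# Hypothetical counts, for illustration only.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

The sketch does not guard against empty classes; with a balanced test fold, as in this dataset, every denominator is non-zero.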

Experiment Settings
We set the maximum number of epochs to 4 to prevent overfitting. The learning rate is set to $10^{-4}$, and the mini-batch size is 128. The dataset used in this paper is small and the batch size is large; thus, the convergence is fast. The number of hidden nodes (u) in the hidden layer is 400. Table 3 provides the hyper-parameters of the proposed ROENet.
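To give a sense of scale for these settings, the sketch below collects them in a dictionary and estimates the number of weight updates during fine-tuning. It assumes that each run trains on four of the five folds of the 27,558-image dataset and that the last partial mini-batch is kept; neither detail is stated in the text, and the dictionary keys are illustrative names.

```python
import math

# Hyper-parameters reported in this section (key names are illustrative).
hyper = {
    "max_epochs": 4,
    "learning_rate": 1e-4,
    "mini_batch_size": 128,
    "hidden_nodes": 400,
}

total_images = 27558
num_folds = 5

# Assumption: one fold (rounded up) is held out for testing, the rest is used for training.
train_images = total_images - math.ceil(total_images / num_folds)
iters_per_epoch = math.ceil(train_images / hyper["mini_batch_size"])
total_iters = iters_per_epoch * hyper["max_epochs"]
```

Under these assumptions, fine-tuning takes 173 iterations per epoch and 692 iterations in total.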

The Performance of ROENet
The classification performance of ROENet is provided in Table 4. We evaluate the proposed ROENet by five-fold cross-validation. The F1 score (F1), specificity (Sp), accuracy (Ac), and sensitivity (Se) are 95.69 ± 2.65%, 96.68 ± 3.81%, 95.73 ± 2.63%, and 94.79 ± 3.71%, respectively. The average values of all four measures are greater than 94%. These results indicate that the proposed model is a good choice for classifying malaria parasites on the blood smear.
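One way to build the five folds is a stratified split that keeps the parasitized and uninfected classes balanced in every fold. The sketch below uses only the Python standard library; the function name `stratified_folds`, the round-robin assignment, and the seed are assumptions, since the paper does not describe how the folds were drawn.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)  # deal indices round-robin across folds
    return folds

# Class sizes from the dataset description: 13,779 parasitized (1) and 13,779 uninfected (0).
labels = [1] * 13779 + [0] * 13779
folds = stratified_folds(labels, k=5)
fold_sizes = [len(fold) for fold in folds]
```

Each fold serves once as the test set while the other four are used for training, and the mean and standard deviation of each measure over the five test folds give values of the form reported above.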

Comparison of Different Backbones
We also test two alternative backbones, AlexNet and ResNet-50. The classification performances of these backbones are presented in Table 4 and compared in Figure 6. Because ResNet-18 achieves the best classification results as the backbone in these experiments, it is selected as the preferred architecture in this paper.
Our model achieves the best accuracy, specificity, and F1 score among the three backbones. AlexNet contains too many parameters, which could cause degradation problems. ResNet-50 has more layers than ResNet-18 and may therefore be more likely to encounter the vanishing gradient problem. These factors may explain why our model obtains better results than the other two.

Effects of Output Ensemble
In this paper, the results of ROENet are the ensemble outputs from three RNNs. To verify the benefit of the ensemble, the proposed model is compared with the three individual models. The classification performances of the three individual models are provided in Table 4. For a clearer comparison, the results are also presented in Figure 7. Compared with the three individual networks, our proposed network achieves the best accuracy, sensitivity, and F1 score. Although it is not the best in specificity, it is only 0.05% lower than the best.

Comparison with the Fine-Tuned ResNet-18
We compare ROENet with the fine-tuned ResNet-18. The results of the fine-tuned model are presented in Table 4. The comparison of the proposed model with the fine-tuned model is provided in Figure 8. Our proposed model achieves better results than the fine-tuned ResNet-18, which suggests that it is an effective tool for classifying malaria parasites on the blood smear.
RNNs have fewer parameters and layers than ResNet-18. Because the dataset in this paper is not very large, the RNNs are less likely to overfit. This may explain why the proposed ROENet outperforms the fine-tuned ResNet-18.

Comparison with Other State-of-the-Art Methods
ROENet is compared with other state-of-the-art (SOTA) methods: Deep-MCNN [1], Customized CNN [2], DCGAN [3], Computer-Automated CNN [4], and three-layer CNN [9]. DCGAN, Computer-Automated CNN, and three-layer CNN used the same dataset as this paper. The other SOTA methods used different datasets. The comparison is provided in Table 5 and presented in Figure 9. From the table and figure, we can see that our model obtains the best results among these SOTA methods.
There are three possible reasons why our model achieves better results than the other SOTA methods. (i) ResNet-18 is the backbone of our model, which can accurately extract features. (ii) We use RNNs as the classifier, which helps to avoid overfitting. (iii) The results of ROENet are the ensemble outputs from three RNNs, which can improve the classification performance.

Conclusions
This paper proposes a new method (ROENet) to automatically classify malaria parasites on the blood smear. The backbone of ROENet is the pre-trained ResNet-18, which has 1000 output nodes, whereas the task in this paper has only two output classes. Therefore, we perform some modifications to the backbone. We use randomized neural networks (RNNs) as the classifier in our proposed model, because the structure of an RNN is simpler than that of ResNet-18. Three RNNs are used in ROENet: random vector functional link (RVFL), extreme learning machine (ELM), and Schmidt neural network (SNN). To improve performance, the results of ROENet are the ensemble outputs from the three RNNs. We evaluate the proposed ROENet by five-fold cross-validation. The specificity (Sp), F1 score (F1), sensitivity (Se), and accuracy (Ac) are 96.68 ± 3.81%, 95.69 ± 2.65%, 94.79 ± 3.71%, and 95.73 ± 2.63%, respectively. The proposed ROENet is compared with other SOTA methods and provides the best classification performance among them, which suggests that our model is an effective tool for classifying malaria parasites on the blood smear. Even though our model obtains excellent classification performance, there are still some limitations. (i) The dataset is still small. (ii) We only tested on one public dataset.
In future work, we will collect more datasets to test our model and continue to improve its performance and robustness. Furthermore, we will try other recent methods for classifying malaria parasites on the blood smear, such as ViT.

Table 2 The pseudocode of ROENet.
Table 3 The hyper-parameters of the ROENet.
Table 4 The classification performance.
Table 5 The comparison with other SOTA methods.