Impact of Misclassification Rates on Compression Efficiency of Red Blood Cell Images of Malaria Infection Using Deep Learning

Malaria is a severe public health problem worldwide, with some developing countries being most affected. Reliable remote diagnosis of malaria infection will benefit from efficient compression of high-resolution microscopic images. This paper addresses lossless compression of malaria-infected red blood cell images using deep learning. Specifically, we investigate a practical approach where images are first classified before being compressed using stacked autoencoders. We provide a probabilistic analysis of the impact of misclassification rates on compression performance in terms of the information-theoretic measure of entropy. We then use malaria infection image datasets to evaluate the relation between misclassification rates and actually obtainable compressed bitrates using Golomb–Rice codes. Simulation results show that the joint pattern classification/compression method provides more efficient compression than several mainstream lossless compression techniques, such as JPEG2000, JPEG-LS, CALIC, and WebP, by exploiting common features extracted by deep learning on large datasets. This study provides new insight into the interplay between classification accuracy and compression bitrates. The proposed compression method can find useful telemedicine applications where efficient storage and rapid transfer of large image datasets are desirable.


Introduction
Malaria occurs in nearly 100 countries worldwide, imposing a huge toll on human health and heavy socioeconomic burdens on developing countries [1]. The agents of malaria are mosquito-transmitted Plasmodium parasites. Microscopy is the gold standard for diagnosis; however, manual blood smear evaluation depends on time-consuming, error-prone, and repetitive processes requiring skilled personnel [2]. Ongoing research has therefore focused on computer-assisted Plasmodium characterization and classification from digitized blood smear images [3][4][5][6][7]. Traditional algorithms labeled images using manually designed feature extraction, with drawbacks in both time-to-solution and accuracy [4]. Newly proposed methods aim to apply automated learning to large-size whole-slide images. Leveraging high-performance computing, deep machine learning algorithms could potentially drive true artificial intelligence in malaria research. Concurrently, the convergence of mobile computing, the Internet, and biomedical instrumentation now allows the worldwide transfer of biomedical images for telemedicine applications. Consultation or screening by specialists located in geographically different locations is now possible.

One line of related work combined classification and compression with low computational complexity. Wavelet-based features were used in [30] for classification with a Support Vector Machine (SVM), where the wavelet transform and run-length coding were used for compression. Neither of these two papers considered the interaction between the classification flow and the compression flow. Furthermore, several papers [31][32][33] address classification of hyperspectral images (HSI) or multispectral images (MSI) in order to improve compression performance. Several classification trees were constructed in [31] to study the relationship between compression rate and classification accuracy for lossy compression of HSI. The results showed that high compression rates could be achieved without degrading classification accuracy too much.
HSI were also used in [32], where several lossy compression methods were compared on how they would impact classification using pixel-based support vector machine (SVM). Compression of MSI was achieved in [33] by segmentation of image into regions of homogeneous land covers. The classification was conducted via tree-structured vector quantization, and residues were coded using transform coding techniques. The method proposed in [34] is similar to that in [32]. Pixel classification and sorting scheme in wavelet domain was used for image compression. Pixels were classified into several quantized contexts, so as to exploit the intra-band correlation in wavelet domain. Compression and classification of images were combined in [35]. The compressed image incorporated implicit classification information, which can be used directly for low-level classification. Some other researchers [36][37][38] worked with vector quantizer based classifiers to improve compression performance. On the other hand, researchers use neural network [39][40][41][42] for joint classification/compression. A classifier based on wavelet and Fourier descriptor features was employed in [39] to promote lossless image compression. The neural network in [40] was accelerated by compressing image data with an algorithm based on the discrete cosine transform. Singular Value Decomposition (SVD) was used in [41] as compression method that can reduce the size of fingerprint images, while improving the classification accuracy. Two unsupervised data reduction techniques, Autoencoder and self-organizing maps, were compared in [42] to identify malaria from blood smear images.
To the best of our knowledge, there is no in-depth study of the interplay between misclassification rate and compression ratio for lossless image compression methods, in particular for compression methods based on deep-learning pattern classification. In this work, to achieve efficient compression of red blood cell images, we use autoencoders to learn the correlations of the image pixels, as well as the correlations among similar images. We train separate autoencoders for images belonging to different classes. Autoencoders can automatically generate hierarchical feature vectors, which reflect common features shared by images from the same class. We can then recover the original images from the feature vectors. By coding the residues, we can achieve lossless compression of the images. We study how the misclassification rate affects the overall compression efficiency.

Construction of the Dataset of Malaria-Infected Red Blood Cell Images
As the result of collaborative research with a group of pathologists from the Medical School of the University of Alabama at Birmingham, we built a dataset of red blood cell (RBC) images extracted from a whole-slide image (WSI) with 100× magnification [43]. The images belong to one of the following two classes: malaria-infected cells and normal cells. Figure 1 shows the glass slide of the thin blood smear and the scanned WSI at its highest resolution. The WSI was divided into more than 80,000 image tiles, each with 284 × 284 pixels. Image morphological transforms were applied to each tile to separate cell samples from the background, as shown in Figure 1 [44]. Some overlapped cells can be separated using the Hough circle transform [45]. Finally, all samples were resized into 50 × 50 images, with some examples shown in Figure 2. The entire dataset can be found on our website [46]. For simplicity, we only used the red channel for training the neural network. In Figure 1, the rectangle delineated in green was cropped out to become the image on the right. After zooming into the area at 100× magnification, we can see the normal cells and infected cells (with the parasites in the ring form) in the leftmost image in the second row. The remaining five grayscale images are the result of step-by-step processing of that image. First, the color image is converted into a grayscale image. Then a thresholding operation removes irrelevant information and converts the image into a binary image. The next two steps fill the isolated pixels in both the foreground and the background. After filling all the holes, we finally obtain the binary mask. Applying the mask to the color image, we can extract each single-cell image as shown in Figure 2.
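The mask-building steps described above (thresholding, hole filling, cleanup) can be sketched with SciPy's binary morphology routines. This is a minimal illustration rather than the authors' exact pipeline; the function name `cell_mask` and the threshold value are hypothetical, and a morphological opening stands in for the isolated-pixel cleanup step.

```python
import numpy as np
from scipy import ndimage

def cell_mask(gray, thresh):
    """Binary mask separating (dark) cell pixels from a light background."""
    binary = gray < thresh                      # thresholding: cells darker than background
    filled = ndimage.binary_fill_holes(binary)  # fill holes inside foreground blobs
    # an opening removes isolated foreground specks left by thresholding
    cleaned = ndimage.binary_opening(filled, structure=np.ones((3, 3)))
    return cleaned
```

Applying the resulting mask to the color tile then isolates each cell sample for resizing to 50 × 50.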

Lossless Compression Using Autoencoders
An autoencoder is an artificial neural network that performs unsupervised learning [47]; it consists of an encoder and a decoder. The encoder converts the high-dimensional input data into a low-dimensional feature vector. By reversing this process, the decoder attempts to recover the original data, typically with loss. Back propagation is used when training the autoencoder to minimize the loss. A more complicated network can be built by stacking several autoencoders together, which generates more hierarchical representations of the input data. A fine-tuned autoencoder is able to perform data dimensionality reduction while extracting features shared by the input data. Thus autoencoders can be used for lossless compression, if the differences between the input data and the reconstructed version are retained and coded efficiently. The flow chart of using stacked autoencoders (SAE) on malaria-infected RBC images is shown in Figure 3. Two separate stacked autoencoders were assigned to images belonging to the normal and infected cell classes, respectively, each with 400 samples. Since cell images in the same class share more common features, higher compression efficiency can be achieved than by using one SAE for all samples. Each SAE consists of an encoder and a decoder. A cell image of 50 × 50 was reshaped into a vector of 2500 points and then fed into the encoder. The encoder consists of four layers: the input layer takes in 2500-point vectors, which are reduced by the remaining encoder layers to 1500, 500, and 30 points, respectively. Therefore, the stacked autoencoder reduces the input vector into a very low-dimensional vector of only 30 entries. Then the decoder attempts to reconstruct the original image from the 30-point vector. The training of the entire autoencoder takes many iterations in order to reduce the difference between the reconstructed image and the original image to a very small value.
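The encoder/decoder dimensions described above (2500 → 1500 → 500 → 30, mirrored for the decoder) can be sketched as a plain NumPy forward pass. This only illustrates the data flow, not the authors' trained network: the weights here are random and untrained, and the `tanh` activation and initialization scheme are assumptions for the sake of a runnable sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes of the stacked autoencoder described in the text:
# 2500 -> 1500 -> 500 -> 30 (encoder), mirrored for the decoder.
ENC_SIZES = [2500, 1500, 500, 30]

def init_layers(sizes):
    # random (untrained) weight matrices and zero biases, one pair per layer
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for w, b in layers:
        x = np.tanh(x @ w + b)
    return x

encoder = init_layers(ENC_SIZES)
decoder = init_layers(ENC_SIZES[::-1])

image = rng.random(2500)           # a 50 x 50 cell image, flattened
code = forward(encoder, image)     # the 30-point feature vector
recon = forward(decoder, code)     # lossy reconstruction of the image
residue = image - recon            # to be coded losslessly with Golomb-Rice codes
```

In the actual system, back propagation would drive the residues toward zero before they are entropy coded.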
The resulting residues, along with the 30-point vector, are coded to ensure the compression is lossless. Specifically, the residues are compressed efficiently using Golomb–Rice codes [48].
Unlike most conventional lossless image compression methods such as JPEG2000 [49], which exploit correlations within the single image to be compressed, the autoencoder-based method is able to extract common features among a group of similar images. This allows for potentially more efficient compression of the similar-looking images in a dataset.

Golomb-Rice Coding
If the autoencoder is well trained on the input dataset, the differences (residues) between the reconstructed images and the original images tend to center around zero. The residues r are converted to non-negative integers using the following mapping:

n = 2r, if r ≥ 0; n = −2r − 1, if r < 0, (1)

and the resulting non-negative values n can be approximated by the geometric distribution with the following probability mass function parameterized by p:

P(n) = (1 − p) p^n, n = 0, 1, 2, . . . ,

where p is a real number within the range (0, 1). Golomb–Rice codes are optimal for compressing a geometrically distributed source with p^m = 1/2, where m is a coding parameter. The entropy H(p) and the expected value E[n] of the n's are given below:

H(p) = −[(1 − p) log2(1 − p) + p log2 p] / (1 − p), (2)

E[n] = p / (1 − p). (3)
Using Equation (3), the parameter p can be estimated from the sample mean n̄ of the mapped residues as follows:

p̂ = n̄ / (1 + n̄). (4)

The Golomb–Rice coding procedure can be summarized by the following steps:
1. Each non-negative integer n to be coded is decomposed into two numbers, q and r, where n = mq + r, q is the quotient of n/m, and r is the remainder.
2. Unary-code q by generating q "1"s, followed by a "0".
3. The coding of r depends on whether m is a power of two:
• If m = 2^s, r can simply be represented using an s-bit binary code.
• If m is not a power of two, the following thresholds should be calculated first: with b = ⌈log2 m⌉ and T = 2^b − m, code r using b − 1 bits if r < T; otherwise, code r + T using b bits.

If m = 2^s, then s can be estimated from the sample mean of the input data, via the optimality condition p̂^m = 1/2, as

s = ⌈log2 m⌉, where m = ⌈−1 / log2 p̂⌉, (6)

and the average codeword length (ACWL) of the Golomb–Rice code is:

ACWL = E[q] + 1 + s, (7)

where E[q] is the expected value of the quotients q.
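The coding steps above can be sketched as follows for the power-of-two case m = 2^s. The function names are illustrative, and the parameter-selection rule shown is one common way to satisfy p^m ≈ 1/2 from the sample mean, rounded to a power of two.

```python
import math

def zigzag(r):
    # map signed residues to non-negative integers:
    # 0, -1, 1, -2, 2, ...  ->  0, 1, 2, 3, 4, ...
    return 2 * r if r >= 0 else -2 * r - 1

def rice_param(nbar):
    # estimate p from the sample mean (p = nbar / (1 + nbar)),
    # pick m so that p**m ~ 1/2, and round up to a power of two m = 2**s
    p = nbar / (1.0 + nbar)
    m = max(1, math.ceil(-1.0 / math.log2(p)))
    return max(0, math.ceil(math.log2(m)))

def golomb_rice_encode(n, s):
    """Bit string for non-negative integer n with Rice parameter s (m = 2**s)."""
    q, r = n >> s, n & ((1 << s) - 1)
    # unary-coded quotient, then the remainder as an s-bit binary number
    return "1" * q + "0" + (format(r, f"0{s}b") if s else "")
```

For example, with s = 2 (m = 4), the integer 9 decomposes into q = 2, r = 1 and is coded as "110" followed by "01".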

Joint Classification and Compression Framework
Previously, we used autoencoders to exploit the correlations of similar images to achieve high compression of red blood cell images [26]. To this end, two separate autoencoders were trained using images known in advance to belong to one of the two classes (either normal cells or malaria-infected cells). However, the compression performance suffers if the images fed to the autoencoders actually come from different classes, which is typically the case since classifiers are not perfect. Therefore, in this work, we study a more realistic framework, as shown in Figure 4, where the input images are first classified before being compressed using autoencoders. After classification, each class may contain some samples that are incorrectly classified. In the following, we analyze how the accuracy of the classifiers affects the overall compression ratios.

Theoretical Analysis
We employ a binary channel model as illustrated in Figure 5 to characterize the four possible cases of cell image classification, with the meanings of the symbols explained in Table 1. Since there are only two possible classes of input images, the source probabilities sum to unity:

P(S0) + P(S1) = 1.

Similarly, the misclassification rates P(C1|S0) and P(C0|S1) are related to the correct classification rates as:

P(C1|S0) = 1 − P(C0|S0), P(C0|S1) = 1 − P(C1|S1).

The source probabilities and the conditional probabilities can be estimated from the image datasets and the pattern classifiers used. We can then derive the joint probabilities of the four possible cases of image classification as listed in Table 1. For example, the joint probability of a cell being normal and correctly classified can be calculated as

P(S0, C0) = P(S0) P(C0|S0).

Figure 5. A binary state transition model for cell image classifications. The symbols "1" and "0" on the left represent input source images belonging to one of two possible classes (infected and normal cells, respectively). The symbols "1" and "0" on the right represent the class an input image is classified into. Arrows represent transitions; e.g., the transition from "1" to "1" means an infected cell is correctly classified. In contrast, the transition from "1" to "0" means an infected cell is incorrectly classified as a normal cell, where the misclassification rate is described by the conditional probability P(C0|S1) for each class. See Table 1 for the meanings of the other probabilities involved.

Table 1. Meanings of the probabilities involved in the binary channel model.

P(S0)      Source probability of a normal cell image
P(S1)      Source probability of an infected cell image
P(C0|S0)   Conditional probability of a normal cell being correctly classified
P(C1|S0)   Conditional probability of a normal cell being incorrectly classified as an infected cell
P(C0|S1)   Conditional probability of an infected cell being incorrectly classified as a normal cell
P(C1|S1)   Conditional probability of an infected cell being correctly classified
P(S0, C0)  Joint probability of a cell being normal and correctly classified
P(S0, C1)  Joint probability of a cell being normal but incorrectly classified as an infected cell
P(S1, C0)  Joint probability of a cell being infected but incorrectly classified as a normal cell
P(S1, C1)  Joint probability of a cell being infected and correctly classified

Following the joint image classification/compression framework in Figure 4, subsequent to image classification, we use stacked autoencoders to generate residues. As shown in Figure 6, corresponding to the different cases of image classification (Si, Cj), where i, j = 0, 1, we can distinguish four distinct probability distributions of the residues, Rij.

Figure 6. Image compression using stacked autoencoders (SAEs) after pattern classification. "SAE0" and "SAE1" stand for the stacked autoencoders trained for normal and infected cells, respectively. Rij, where i, j = 0, 1, denotes the probability distributions of the residues to be entropy coded using Golomb–Rice codes.
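As a quick numerical sketch of the channel model, the four joint probabilities in Table 1 follow directly from the source probabilities and the classifier's conditional probabilities. The helper name `joint_probs` is hypothetical.

```python
def joint_probs(p_s0, p_c1_given_s0, p_c0_given_s1):
    """Joint probabilities P(Si, Cj) of the binary channel model."""
    p_s1 = 1.0 - p_s0                      # source probabilities sum to unity
    return {
        ("S0", "C0"): p_s0 * (1.0 - p_c1_given_s0),  # normal, correctly classified
        ("S0", "C1"): p_s0 * p_c1_given_s0,          # normal, misclassified
        ("S1", "C0"): p_s1 * p_c0_given_s1,          # infected, misclassified
        ("S1", "C1"): p_s1 * (1.0 - p_c0_given_s1),  # infected, correctly classified
    }
```

For equally likely classes and 10% misclassification in both directions, P(S0, C0) = 0.5 × 0.9 = 0.45, and the four joint probabilities sum to one.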
Given that the input images are either normal cells or infected cells, the following two conditional entropies, H0 and H1, provide estimates of the compressed bitrates. Specifically,

H0 = P(C0|S0) H(R00) + P(C1|S0) H(R01), (13)

which is a function of the misclassification rate P(C1|S0). Similarly,

H1 = P(C0|S1) H(R10) + P(C1|S1) H(R11), (15)

which is also a function of the misclassification rate P(C0|S1). The overall bitrate (BR) in theory can be obtained by probabilistically combining the individual bitrates for the four cases, where the individual bitrates can be represented by the entropies of the residues H(Rij) since lossless compression is used:

BR = Σ_{i,j} P(Si, Cj) H(Rij), i, j = 0, 1. (16)
We can see that the overall bitrate can also be obtained by probabilistically combining the conditional entropies H0 and H1 in Equations (13) and (15) as follows:

BR = H0 · P(S0) + H1 · P(S1),

which shows that the overall bitrate is a function of the misclassification rates. In practice, the residue sources can be modeled by geometric distributions with varying parameters p_ij (corresponding to the four possible cases of image classification (Si, Cj)). That is, the probability mass functions of the residue sources are

P_ij(n) = (1 − p_ij) p_ij^n, n = 0, 1, 2, . . . , i, j = 0, 1,

where n denotes the values of the residues. Therefore, we can use Equation (2) to replace H(Rij) with the entropy of the geometric source:

H(Rij) = H(p_ij) = −[(1 − p_ij) log2(1 − p_ij) + p_ij log2 p_ij] / (1 − p_ij).

Furthermore, we can derive the following formula for estimating the average codeword length (ACWL in bits, the practically achievable bitrate) over all four cases when we employ Golomb–Rice codes to compress the residues:

ACWL = Σ_{i,j} P(Si, Cj) · ACWL(Rij), i, j = 0, 1, (21)
where ACWL(Rij) denotes the average codeword length of Golomb–Rice coding the residue source Rij, which can be estimated using Equation (7). We can see that the overall average codeword length is a function of the misclassification rates P(C1|S0) and P(C0|S1).
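The bitrate combination above can be sketched numerically: `geometric_entropy` evaluates the entropy of the geometric pmf, and `overall_bitrate` probabilistically combines the four per-case rates. Both function names are illustrative, and the same weighting applies whether the per-case values are entropies or Golomb–Rice average codeword lengths.

```python
import math

def geometric_entropy(p):
    # entropy (in bits) of the geometric pmf P(n) = (1 - p) * p**n
    return -((1 - p) * math.log2(1 - p) + p * math.log2(p)) / (1 - p)

def overall_bitrate(joint, per_case_rates):
    # BR (or ACWL) = sum over (i, j) of P(Si, Cj) * rate of residue source R_ij
    return sum(joint[k] * per_case_rates[k] for k in joint)
```

For example, p = 1/2 gives a geometric-source entropy of exactly 2 bits per residue symbol.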

Results and Discussion
For the purpose of visualizing the relation revealed by the foregoing theoretical analysis, we simply assume that the cells are equally likely to be either normal or infected, i.e., P(S0) = P(S1) = 1/2. Note that the theoretical results obtained in the previous section can handle other, more general situations, e.g., where there are more normal cells than infected cells, or where the two misclassification rates differ. However, making the above simplifying assumptions allows for 2D plotting of the relation between compression performance and a single misclassification rate.
We use two image datasets (with 400 images for each class) to estimate the compression performance. We first train two stacked autoencoders, one for normal cells and the other for infected cells. We then vary the misclassification rates from 0.01 to 0.2 with a step size of 0.01 and formulate mixed image datasets according to the misclassification rates. For example, if the misclassification rate P(C1|S0) = P(C0|S1) = 0.1, then we feed an image dataset consisting of 360 normal cells and 40 infected cells to the stacked autoencoder trained to compress normal cell images. Similarly, another image dataset consisting of 360 infected cells and 40 normal cells is fed to the stacked autoencoder trained to compress infected cell images.
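The dataset-mixing rule above (e.g., 360 normal + 40 infected cells at a 10% misclassification rate) can be sketched as a one-line helper; the name `mixed_counts` is hypothetical.

```python
def mixed_counts(n_per_class, rate):
    """Correct / wrong sample counts fed to one class's autoencoder
    at a given misclassification rate."""
    wrong = round(n_per_class * rate)
    return n_per_class - wrong, wrong
```

Sweeping `rate` from 0.01 to 0.2 in steps of 0.01 then produces the sequence of mixed datasets used in the experiments.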

Conditional Entropies Versus Misclassification Rates
We first use Equations (13) and (15) to obtain the empirical entropies of the residues (conditional upon whether the inputs are normal or infected cells) as an estimate of the compressed bitrates.
The results are plotted in Figure 7. We can see that the infected cells tend to be "easier" to compress than the normal cells. This can be attributed to the fact that infected cells share some common features, e.g., the ring form characteristic of parasite infection. While the autoencoders have been trained to effectively capture the common features of input images belonging to the same class, more and more "wrong" inputs from the other class due to misclassification lead to larger prediction residues, which translate into larger entropies, i.e., lower compression. Thus, for both classes of input images, we see the apparent trend of progressively lower compression performance with an increasing misclassification rate, as expected.

Joint Entropy Versus Misclassification Rates
Here we still assume that the cells are equally likely to be either normal or infected, i.e., P(S0) = P(S1) = 1/2, but allow the misclassification rates P(C1|S0) and P(C0|S1) to vary freely within their ranges. Based on Equation (16), we can plot a 3D surface as shown in Figure 8. We can see that the general trend remains the same as for the conditional entropies: as the misclassification rates increase, the joint entropy (the overall bitrate in theory) also increases.

Average Codeword Lengths Versus Misclassification Rates
We use Golomb–Rice codes to compress the residues and use Equation (21) to calculate the average codeword lengths (ACWL in bits, the practically achievable bitrates) over all four cases (as shown in Figure 6). Figure 9 shows the relation between the overall ACWL (bitrate) and the misclassification rate. Again, the curve clearly shows the general trend of increased bitrates (less compression) as the misclassification rate increases, which is what we expected. In the following, we compare the compression performance of the deep-learning-based method with some popular lossless image compression methods.

Comparisons with Mainstream Lossless Compression Methods
We compare with four well known lossless image compression methods. A brief introduction to these methods is given below.
• JPEG2000 [49] is an image compression standard designed to improve on the performance of the JPEG standard, albeit at the cost of increased computational complexity. Instead of the DCT used in JPEG, JPEG2000 uses the discrete wavelet transform (DWT).
• JPEG-LS is a lossless image compression standard. It improves compression by using more context pixels (pixels already encoded) to predict the current pixel [50]. We use the codec based on the LOCO-I algorithm [51].
• CALIC (context-based, adaptive, lossless image codec) uses a large number of contexts to condition a non-linear predictor, which makes it adaptive to varying source statistics [52].
• WebP [53] is an image format developed by Google. WebP is based on block prediction, and a variant of LZ77-Huffman coding is used for entropy coding.
The comparison results are shown in Figure 10. We can see that our method significantly outperforms the other four conventional compression methods, which are not sensitive to changes in the misclassification rate. This is because these standard methods are designed to be as generic as possible, without taking advantage of the correlations among images belonging to the same class, which can be captured by sufficiently trained autoencoders. Here we take into account practical scenarios where there is a mismatch between the input images and the autoencoder of the corresponding class. For example, the autoencoder pre-trained to compress infected cell images suffers degrading performance as more and more normal cell images (due to increasing misclassification rates) are mixed into the infected cells at its input. However, even at a misclassification rate as high as 20% (a level that a reasonably good pattern classifier can easily improve upon), the curve in Figure 10 shows that the deep-learning-based method still outperforms the four other methods. The result highlights the advantage of our data-specific approach of "train once and then compress many times", where deep learning is very effective in extracting common features within the dataset, thereby providing more efficient data compression. Nonetheless, in practical implementations of an end-to-end compression/decompression system, the parameters of the already-trained stacked autoencoders have to be provided as side information to the decoder to ensure lossless decompression. Fortunately, this one-time bitrate cost for the side information can be amortized over the large number of images to be compressed in the dataset. The other piece of side information is the 30-point vector produced for each image at the output of the last encoder stage.
Again, the bits needed for coding this vector are a one-time cost per image, representing a negligible increase in the average bitrate (in bits/pixel).
It should also be noted that this deep-learning-based approach has some limitations. First, the approach is better suited for achieving good compression on average over an entire dataset, where images can be grouped into different classes by a reasonably well-trained classifier. The images within the same class share common features, which can be exploited to achieve higher compression than would be possible by considering only individual image statistics. Therefore, this joint classification/compression approach is not intended for compression of individual images, for which mainstream lossless compression methods are more suitable, since they optimize their performance based on individual image statistics. Second, training stacked autoencoders on large datasets tends to be computationally expensive. The high computational cost is therefore justified only by the "train once and then compress many times" approach applied to the entire dataset. Finally, the autoencoder parameters (e.g., the weights and biases of each layer) have to be made available to the decoder as side information. Therefore, the advantage of the deep-learning-based method is more pronounced for large datasets, where the impact of the side-information overhead on the overall bitrate becomes less noticeable.
In the literature, existing work on deep learning for image compression is fairly sparse, mostly with the goal of achieving low bitrates and higher visual quality for lossy compression. For example, Toderici et al. proposed a general framework for variable-rate image compression based on convolutional and deconvolutional long short-term memory (LSTM) recurrent networks [54]. They reported better visual quality than JPEG2000 and WebP on 32 × 32 thumbnail images. Their follow-up work in [55] proposed a hybrid of Gated Recurrent Unit (GRU) and ResNet as a full-resolution lossy image compression method. Jiang et al. [56] proposed an end-to-end lossy compression framework consisting of two convolutional neural networks (CNNs) for image compaction, albeit still requiring the main compression engine to be a standard compression method such as JPEG. Li et al. proposed a CNN-based content-weighted lossy compression method, which outperforms traditional methods on low-bitrate images [57]. Generative Adversarial Networks (GANs) were used in [58] for lossy image compression, achieving good reconstructed image quality at very low bitrates (e.g., below 0.1 bit per pixel). In contrast, this work focuses on lossless compression. Our results show that autoencoders are capable of capturing inter-image correlations in large datasets, which are beneficial to efficient lossless compression of the entire dataset. A good research direction would be to study how to integrate autoencoders with other deep learning architectures such as CNNs and GANs to also exploit local image statistics, as well as recurrent neural networks (RNNs) and LSTM networks to take advantage of pixel dependence within an image.

Conclusions
In this paper, we study how the performance of lossless compression of red blood cell images is affected by an imperfect classifier in a realistic setting where images are first classified prior to being compressed using deep-learning methods based on stacked autoencoders. We provide an in-depth analysis of the impact of misclassification rates on the overall image compression performance and derive formulas for both the empirical entropy and the average codeword lengths of Golomb–Rice codes for the residues. These formulas provide new insight into how the overall compression efficiency is affected by different source probabilities and misclassification rates. We also use malaria infection image datasets to evaluate the relation between misclassification rates and actually obtainable compressed bitrates. The results show the advantage of our data-driven approach of "train the neural network once and then compress the data many times", where deep learning is very effective in extracting common features within the dataset, thereby providing more efficient data compression than conventional methods, even at elevated misclassification rates. This feature will be useful when only some important parts (regions of interest) of a large high-resolution image (e.g., a whole-slide image) require lossless compression, while the rest (e.g., the background) only needs lossy compression, or can simply be discarded. In the case of computer-assisted malaria diagnosis, pathologists are mainly interested in red blood cell images, so we can classify the infected and normal cells, which leads to more efficient compression of an entire image dataset. Thus, the proposed compression method can find useful applications in telemedicine, where efficient storage and rapid transfer of large image datasets are sought after.
As future work, we aim to study the compression performance and computational efficiency of an end-to-end classification/compression system, taking into account the overhead associated with describing the neural network structure and feature vectors.

Funding: The first and the second authors received no external funding for this research. The support for the third author is described in the Acknowledgments section, with the statement provided in its entirety by the funding agency.
Acknowledgments: Dongsheng Wu's research has been supported in part by Mission Support and Test Services, LLC, with the U.S. Department of Energy, National Nuclear Security Administration, NA-10 Office of Defense Programs, and the Site-Directed Research and Development Program. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published content of this manuscript, or allow others to do so, for United States Government purposes. The U.S. Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-publicaccess-plan). The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Conflicts of Interest:
The authors declare no conflict of interest.