Machine Learning Based Lens-Free Shadow Imaging Technique for Field-Portable Cytometry

The lens-free shadow imaging technique (LSIT) is a well-established technique for the characterization of microparticles and biological cells. Owing to its simplicity and cost-effectiveness, it has enabled various low-cost solutions, such as automatic analysis of complete blood count (CBC), cell viability, 2D cell morphology, and 3D cell tomography. The auto characterization algorithm developed so far for our custom-built LSIT cytometer was based on handcrafted features of the cell diffraction patterns, determined from our empirical findings on thousands of samples of individual cell types. This limits the system when a new cell type has to be introduced for auto classification or characterization. Further, its performance suffers when the image (cell diffraction pattern) signatures are poor because of a small signal or background noise. In this work, we address these issues with an artificial intelligence-powered signal enhancement scheme, a denoising autoencoder, and an adaptive cell characterization technique based on the transfer of learning in deep neural networks. The proposed method achieves an accuracy of >98% along with a signal enhancement of >5 dB for most of the cell types, such as red blood cell (RBC) and white blood cell (WBC). Furthermore, the model can learn a new type of sample within a few learning iterations and successfully classify the newly introduced sample along with the existing sample types.


Introduction
The lens-free shadow imaging technique (LSIT) is a well-established technique for the characterization of microparticles and biological cells [1]. It is widely popular for its simple imaging structure and cost-effectiveness. It comprises a lens-less detector, such as a complementary metal-oxide semiconductor (CMOS) image sensor; a semi-coherent light source, such as a light-emitting diode (LED); and a disposable cell chip (C-Chip). The absence of a lens or other optical arrangements allows it to fit into a very small space, thereby reducing the size of the overall system (Figure 1a shows the LSIT platform (Cellytics), built within 100 × 120 × 80 mm³). Since this arrangement consists of only a few components, most of which are easily available at a low price, the overall cost of the system is also reduced [2].
Recent advancements in machine learning, especially deep learning, have facilitated many applications concerning medical diagnostics [6][7][8][9][10][11][12], and have been widely adopted in the field of microscopy [13][14][15]. In particular, deep learning has been incorporated with the LSIT [14], where it has been used to enhance the resolution of the LSIT micrographs [16] and has enabled polarization-based holographic microscopy [17].
In our previous work, we developed the LSIT imaging system for the complete blood count using an analytical model based on handcrafted features [3]. This model automatically segments the individual cells from a whole frame LSIT micrograph and subsequently analyzes them based on the handcrafted parameters. However, the performance of the system depends on uniform illumination as well as on strong signatures of the microparticle samples. Since the diffraction signature of a microparticle depends on the size as well as the signal-to-noise ratio of the particle, any background noise can affect the overall performance of the auto characterization system. Further, the handcrafted approach of finding the features for every additional cell line is time-consuming and prone to subjective errors. To address these limitations, in this work, we have developed an artificial intelligence (AI) powered signal enhancement scheme for the LSIT micrographs that can enhance the signal quality (signal-to-noise ratio (SNR)) for various cell lines in a heterogeneous cell sample. For this, we employed an autoencoder-based denoising scheme [18]. Further, we have developed an auto characterization method based on a convolutional neural network (CNN) [19,20] architecture to classify the various cell lines from the LSIT micrograph. Here, we introduce the transfer of learning scheme in the neural network, which makes it feasible to introduce new cell types to the algorithm and to learn their characteristics within a few iterations. Thus, the LSIT platform saves the time and computational resources required to learn to classify additional cell types along with the existing ones.
In this article, we describe the methods adopted for designing the models and for optimizing their parameters to obtain better accuracy. The optimized models are simple and lightweight, and require fewer samples to learn the cell signatures effectively. The details are given in the following sections.

LSIT Imaging Setup
The schematic of our proposed setup is shown in Figure 1a. When light from a coherent or semi-coherent source passes through a micro-object, it produces a characteristic diffraction pattern, i.e., a shadow pattern, of the object [21,22], as shown in Figure 1b. These diffraction patterns are prominent just beneath the sample, typically a few hundred micrometers away from the sample plane, where they are captured using a high-density image sensor such as a CCD or CMOS sensor [21] (Figure 1c). As these signatures are strong enough to be captured by the bare image sensor, no lens arrangement is required [23]. In our proposed setup, we used a pinhole-conjugated semi-coherent LED light source with a peak wavelength of 470 ± 5 nm (HT-P318FCHU-ZZZZ, Harvatek, Hsinchu, Taiwan). The diffraction patterns were captured using a 5-megapixel CMOS image sensor (EO-5012M, Edmund Optics, Barrington, NJ, USA), and a custom-developed C-Chip (Infino, Seoul, Korea) was used to hold the cell samples [2,5]. All of these components fit within a compact volume of 100 mm × 120 mm × 80 mm. Because no lens is used, the field-of-view of this system is about 20 times that of a conventional optical microscope at 100×. This high throughput makes it possible to characterize several thousand cells within a single digital frame.

Preparation of Various Cell Lines
In this work, we used various cell lines and microparticles: red blood cells (RBC), white blood cells (WBC), the cancer cell lines HepG2 (human liver cell line) and MCF7 (human breast cancer cell line), and polystyrene microbeads of 10 µm and 20 µm. The preparations of these cell lines are as follows [2,3]. The use of human whole blood in the experiment was approved by the Institutional Review Board of Korea University Anam Hospital (Seoul, Korea; Approval No. 2021AN0040).
RBC: The RBC samples were prepared from the whole blood samples that were collected from the Korea University Anam Hospital under IRB approval. The samples were diluted about 16,000 times by using RPMI solution (Thermo Scientific, Waltham, MA, USA) [2,3].
WBC: First, Ficoll solution (Ficoll-Paque™ Plus, GE Healthcare, Chicago, IL, USA) was used to isolate mononuclear cells from the whole blood. The samples of peripheral blood mononuclear cells (PBMCs) obtained using the Ficoll solution are mixtures of lymphocytes and monocytes. To separate these two cell types, a MACS (magnetic-activated cell sorting) device and antibodies (Miltenyi Biotec, Bergisch Gladbach, Germany) were utilized. The helper-T cells in the lymphocytes were separated using the CD4 antibody (#130-090-877), and the cytotoxic-T cells using the CD8 antibody (#130-090-878). Finally, 10 µL of this solution was loaded into the unruled C-Chip cell counting chamber [2,3].
HepG2: The HepG2 cells were obtained from the American Type Culture Collection (ATCC HB-8065) and incubated in a high-glucose medium (DMEM, Merck, Darmstadt, Germany) with 10% heat-inactivated fetal bovine serum, 0.1% gentamycin, and a 1% penicillin/streptomycin solution under 95% relative humidity and 5% CO2 at 37 °C. The grown cells were then trypsinized, separated from the 24-well plate, and incubated for 2-5 min at 37 °C. These cells were then diluted with DMEM solution [2,3].

MCF7: The MCF7 cell samples were obtained from the American Type Culture Collection (ATCC HTB-22). The cells were preserved in a solution of DMEM containing 1% penicillin/streptomycin solution, 0.1% gentamycin, and 10% calf serum at 95% relative humidity and 5% CO2 at 37 °C. These cells were then trypsinized and separated from the 24-well plate. The separated cells were incubated for 2-5 min at 37 °C and then washed with DMEM solution. Finally, 10 µL of this solution was loaded into the C-Chip [2,3].

Dataset Creation
A whole frame LSIT image (of cell diffraction patterns) contains an average of 500 diffraction patterns of microparticles. Deep learning-based architectures utilize the features of each class, and typically require a minimum of a few hundred diffraction patterns of each cell type for optimal learning. Therefore, we cropped individual diffraction patterns of each cell type (verified using a traditional microscope) with a window of 66 × 66 pixels, as shown in Figure 1d. This window size included the complete sample signature along with a minimal background, so that complete cell-line information is available during the auto-feature selection process in the learning algorithms. We further augmented this base sample set by rotating the individual diffraction patterns clockwise in increments of 10 degrees. Finally, a dataset of 1980 samples for each of the six cell lines and microparticles was created, totaling 11,880 samples for all of the classes under study. The typical architecture of a CNN is illustrated in Figure 1e. As many learning algorithms are black-box models, it is difficult to ascertain the optimal cell-signal size that covers the majority of the information with minimal background. Naturally, a smaller cell size needs less computation and contains less noise. Hence, we also created datasets with 60 × 60, 56 × 56, 50 × 50, 46 × 46, 40 × 40, and 36 × 36 cell sizes as input sets, with each set further divided into training and test folds. As the data augmentation used sample rotation, the dataset has to be split into train, validation, and test folds while keeping a check on data leakage. Augmented copies of the same sample distributed across the train and test sets may bias the model and give a wrong estimate of its performance, as the test data would not consist entirely of "unseen" samples. Accounting for this, the 1980 samples of each class were carefully split into 1490 training samples, 166 validation samples, and 324 testing samples.
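The leakage concern above can be illustrated with a short sketch. It assumes, hypothetically, that each class consists of 55 base crops with 36 rotated copies each (0 to 350 degrees in 10 degree steps), which is consistent with 1980 samples per class but is not stated in the text. The function name, the index layout, and the random train/validation split are illustrative choices, not the procedure actually used.

```python
import numpy as np

def leakage_safe_split(n_base, n_rot, n_test_base, n_val, seed=0):
    """Split augmented sample indices so that no base crop feeds both train and test.

    Assumed layout: augmented index = base_id * n_rot + rotation_index.
    """
    rng = np.random.default_rng(seed)
    base_ids = rng.permutation(n_base)
    test_base, dev_base = base_ids[:n_test_base], base_ids[n_test_base:]
    rot = np.arange(n_rot)
    # All rotated copies of a held-out base crop go to the test fold together.
    test_idx = (test_base[:, None] * n_rot + rot).ravel()
    # Remaining copies are shuffled and divided into validation and training.
    # In this sketch validation may still share base crops with training.
    dev_idx = rng.permutation((dev_base[:, None] * n_rot + rot).ravel())
    val_idx, train_idx = dev_idx[:n_val], dev_idx[n_val:]
    return train_idx, val_idx, test_idx

# Hypothetical numbers: 55 base crops x 36 rotations = 1980 samples per class.
train_idx, val_idx, test_idx = leakage_safe_split(
    n_base=55, n_rot=36, n_test_base=9, n_val=166)
print(len(train_idx), len(val_idx), len(test_idx))
```

Under these assumed numbers, holding out 9 base crops yields 324 test samples, and the remaining 1656 samples divide into 1490 training and 166 validation samples.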
Though the cell-lines may seem visually similar, there are significant differences in the statistical distributions of the pixel illumination intensity in the cell diffraction pattern as revealed in our exploratory data analysis. The 2D contour plots (in Figure 2) show the observed variances. Hence, it is possible for intelligent algorithms to automatically identify and utilize the descriptive features in signal enhancement as well as classification.

Denoising Modality
For denoising of the LSIT micrographs, we adopted the concept of the autoencoder [24]. An autoencoder is an unsupervised scheme that attempts to reproduce its input at its output. It consists of an input layer (x), an output layer (r), and a hidden layer (h). The hidden layer h, termed the code layer, represents the input in a changed dimension. The whole network can be divided into two parts. The first part is an encoder, which encodes the input as h = f(x), and the second part is a decoder, which tries to reconstruct the input from the reduced code layer as r = g(h), where r is the reconstruction of the input x. In essence, the network tries to attain r = g(f(x)). However, this is not a trivial copy of the input, since the model is forced to learn the significant features of the input in order to encode it into the code layer.
In this work, we specifically used the denoising version of the autoencoder. A traditional autoencoder minimizes the loss L(x, g(f(x))), whereas the denoising autoencoder minimizes the loss L(x, g(f(x'))), where x' is a noisy version of the input x. We tried two different methods to design the denoising architectures, namely, extreme learning machine (ELM) and convolutional neural network (CNN).
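The denoising objective L(x, g(f(x'))) can be sketched in a few lines. The example below is only illustrative: the zero-mean Gaussian corruption and the mean squared error follow choices described later in the text, while the helper names, the constant 50 × 50 test image, and the noise variance of 400 are assumptions made for the demonstration. The encoder f and decoder g are passed in as plain functions so that any model can be plugged in.

```python
import numpy as np

def make_denoising_pair(x, variance, rng):
    """Return (x_noisy, x): a corrupted input x' and its clean target x."""
    noise = rng.normal(0.0, np.sqrt(variance), size=x.shape)
    return x + noise, x

def denoising_loss(x, x_noisy, encode, decode):
    """Mean squared error form of L(x, g(f(x')))."""
    r = decode(encode(x_noisy))
    return float(np.mean((x - r) ** 2))

rng = np.random.default_rng(0)
x = np.full((50, 50), 128.0)          # hypothetical clean crop
x_noisy, target = make_denoising_pair(x, variance=400.0, rng=rng)

identity = lambda v: v                 # a "model" that merely copies its input
loss_identity = denoising_loss(target, x_noisy, identity, identity)
print(loss_identity)                   # close to the injected noise variance
```

A model that only copies its input is left with a loss close to the noise variance, which is why the denoising objective pushes the network to learn the structure of the clean signal instead.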

ELM
This is a single hidden layer, fully connected architecture [25]. In this method, the input weights are initialized randomly and kept fixed. Only the output weights take part in the learning process, through a straightforward learning method [25][26][27]. For N arbitrary input samples $x_i \in \mathbb{R}^n$ and their corresponding targets $t_i \in \mathbb{R}^m$, the ELM achieves this mapping using the relation $H\beta = T$, as shown in Equation (1).
Here, $H$ is the hidden layer output matrix, $\beta$ is the output weight matrix (i.e., between the hidden layer and the output layer), and $T$ is the target matrix, i.e., the matrix of desired outputs [26]. From Equation (1), we can obtain $\beta$ using the Moore-Penrose pseudoinverse $H^{\dagger}$ of $H$ [25], i.e., $\beta = H^{\dagger}T$, as shown in Equation (2).
In the extended sequential learning form of ELM, $\beta$ can be updated sequentially. This provides the added advantage of updating the learning whenever a new type of sample becomes available, thus providing the flexibility of transfer of learning. The $\beta$ update mechanism [28,29] is as shown in Equation (3), where $H_0$ is the hidden layer output for the first sample or first batch of samples [30].
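The batch and sequential ELM solutions can be compared with a small sketch. It assumes the standard recursive least-squares form of the online sequential ELM from the literature, P_0 = (H_0^T H_0)^{-1}, beta_0 = P_0 H_0^T T_0, then P_{k+1} = P_k - P_k H^T (I + H P_k H^T)^{-1} H P_k and beta_{k+1} = beta_k + P_{k+1} H^T (T - H beta_k). This may differ in notation from the update equations referenced above. The tanh activation, the layer sizes, and all function names are assumptions for illustration.

```python
import numpy as np

def hidden(X, W, b):
    """Hidden layer output H for fixed random input weights W and biases b."""
    return np.tanh(X @ W + b)          # tanh is an assumed activation

def elm_fit(H, T):
    """Batch solution beta = pinv(H) @ T."""
    return np.linalg.pinv(H) @ T

def oselm_init(H0, T0):
    """Initial state from the first batch (needs at least as many rows as hidden units)."""
    P = np.linalg.inv(H0.T @ H0)
    return P, P @ H0.T @ T0

def oselm_update(P, beta, H, T):
    """Fold a new batch (H, T) into the running solution."""
    K = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P
    beta = beta + P @ H.T @ (T - H @ beta)
    return P, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))          # synthetic inputs
T = rng.normal(size=(200, 3))          # synthetic targets
W = rng.normal(size=(8, 20))           # random input weights, never trained
b = rng.normal(size=20)
H = hidden(X, W, b)

beta_batch = elm_fit(H, T)                            # all samples at once
P, beta_seq = oselm_init(H[:100], T[:100])            # first batch
P, beta_seq = oselm_update(P, beta_seq, H[100:], T[100:])  # later batch
print(np.max(np.abs(beta_batch - beta_seq)))
```

In this sketch the sequential solution agrees with the batch solution up to numerical precision, which is what allows new samples to be incorporated without retraining from scratch.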

CNN
Convolutional neural networks (CNN) [6,19,20] are a type of neural network widely used in the analysis of spatial data, for tasks such as image classification and object segmentation. In this network, two-dimensional kernels are used to extract the spatial features from the input patterns, using a convolution operation between the kernel and the input. The typical architecture of a CNN is shown in Figure 1e. Here, the kernel is shared spatially by the input or by the feature map. The feature at the location (i, j) in the kth feature map of the lth layer can be evaluated as shown in Equation (6).
Here, $W_k^l$ and $b_k^l$ are the weights and the bias vector of the kth filter in the lth layer. The weights are shared spatially, which reduces the complexity. $X_{i,j}^l$ is the value of the input at location (i, j) of the lth layer. The nonlinearity in this network is obtained by introducing the activation function, denoted here as g(·). The activated output can be represented as shown in Equation (7).
Additionally, there are pooling layers that introduce shift-invariance by reducing the resolution of the activated feature maps. Each pooling layer connects the feature map to the preceding convolutional layer. The expression for pooling is as shown in Equation (8).
Here, P(·) is a pooling operation over the local neighborhood $R_{ij}$ around the location (i, j). In this work, we used CNNs for both denoising and classification. The details of their architectures and their impacts are discussed in Section 3.
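The three operations described above (convolution, activation, and pooling) can be sketched for a single channel and a single kernel. The sketch assumes the commonly used forms z = W·x + b for the feature, a = g(z) for the activation, and a maximum over the neighborhood R_ij for pooling. The output symbols z and a, the choice of ReLU for g, and the function names are assumptions rather than notation taken from the text.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Feature map z[i, j] = sum(w * patch at (i, j)) + b, without padding."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    z = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel w is reused at every location (weight sharing).
            z[i, j] = np.sum(w * x[i:i + kh, j:j + kw]) + b
    return z

def relu(z):
    """Activation g(.), here assumed to be the rectified linear unit."""
    return np.maximum(z, 0.0)

def max_pool(a, size):
    """Non-overlapping max pooling; rows and columns that do not fit are dropped."""
    h, w = a.shape[0] // size, a.shape[1] // size
    a = a[:h * size, :w * size]
    return a.reshape(h, size, w, size).max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)   # toy 6 x 6 input
w = np.ones((3, 3)) / 9.0                      # 3 x 3 averaging kernel
z = conv2d_valid(x, w, b=0.0)                  # 4 x 4 feature map
y = max_pool(relu(z), size=2)                  # 2 x 2 pooled map
print(y)
```

Pooling halves the resolution here, so a small shift of the input changes the pooled map less than it changes the raw feature map.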

Performance of Denoising Algorithms
For efficient and adaptive denoising, we analyzed various autoencoder schemes, starting with the fully connected autoencoder. In our first iteration, we experimented with a fully connected network having three hidden layers with 512, 256, and 512 neurons, respectively. The input layer is the 1D vectorized array of the input cell diffraction pattern, e.g., of 66 × 66 pixels. The input to the model is the noisy version of the cell diffraction pattern, and the expected target output is the original cell diffraction pattern. The noisy cell diffraction patterns were created by adding zero-mean Gaussian noise with variance ranging from 100 to 600 (refer to the supplementary section for details). Further, we experimented with an increased network size having five hidden layers with 256, 128, 64, 128, and 256 neurons, respectively. In all of these networks, the rectified linear unit (ReLU) was used as the activation function, while the mean squared error (MSE) [31][32][33][34] was used to calculate the loss. The Adam optimizer [35,36] was found to deliver better convergence and was hence used to optimize the weights and biases. The denoising performance was quantified in terms of the improvement in SNR, measured in dB and denoted here by $\mathrm{SNR}_{\mathrm{imp}}$, as given by Equation (9) [37].
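$\mathrm{SNR}_{\mathrm{imp}} = \mathrm{SNR}_{\mathrm{out}} - \mathrm{SNR}_{\mathrm{in}}$ (9)
where $\mathrm{SNR}_{\mathrm{in}} = 10\log_{10}\left(\frac{\sum_{i=1}^{N} x_i^2}{\sum_{i=1}^{N} (\tilde{x}_i - x_i)^2}\right)$ and $\mathrm{SNR}_{\mathrm{out}} = 10\log_{10}\left(\frac{\sum_{i=1}^{N} x_i^2}{\sum_{i=1}^{N} (\hat{x}_i - x_i)^2}\right)$. Here, $x_i$ is the value of sampling point i in the original LSIT signal, $\tilde{x}_i$ is the value of sampling point i in the noisy LSIT signal, and $\hat{x}_i$ is the value of sampling point i in the denoised version of the same cell diffraction pattern. N is the total number of sampling points in that LSIT image (cell diffraction pattern).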
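The SNR improvement can be computed directly from the three images. The sketch below assumes the common definition in which the improvement is the output SNR minus the input SNR, each taken as 10 log10 of the clean signal power over the residual error power. The function names and the synthetic test pattern are illustrative.

```python
import numpy as np

def snr_db(x, y):
    """SNR in dB of estimate y against the clean reference x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((y - x) ** 2))

def snr_improvement(x, x_noisy, x_denoised):
    """SNR_imp = SNR_out - SNR_in, in dB."""
    return snr_db(x, x_denoised) - snr_db(x, x_noisy)

rng = np.random.default_rng(0)
x = rng.uniform(50.0, 200.0, size=(50, 50))      # synthetic clean pattern
e = rng.normal(0.0, 20.0, size=x.shape)          # zero-mean noise, variance 400
x_noisy = x + e
x_denoised = x + e / 10.0                        # toy denoiser: error amplitude cut tenfold

print(snr_improvement(x, x_noisy, x_denoised))   # 20 dB up to rounding
```

Because the clean signal power cancels in the difference, the improvement depends only on the ratio of the residual error powers before and after denoising.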
The fully connected network, in both of the above configurations, showed no significant improvement in $\mathrm{SNR}_{\mathrm{imp}}$ after reaching saturation at around −10.08 dB. For further improvement, we experimented with CNN architectures using various models with different numbers of convolution layers and distinct kernel sizes. The model that achieved the best results has kernel sizes of 3 × 3, 3 × 3, 5 × 5, 5 × 5, 7 × 7, 7 × 7, and 1 × 1, with 32 filters in each layer except the last. The last layer consists of a single 1 × 1 filter that is used to condense the output across all of the 32 filters. Here, the input and output sizes are the same. Padding was used to maintain the original size at the output of each convolutional layer. The Adam optimizer was used to train the network by minimizing the mean squared error loss. The CNN results show a better reconstruction, as shown in Figure 2.
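A rough sense of the size of this denoising CNN can be obtained from the kernel sizes and filter counts alone. The sketch below assumes a single-channel (grayscale) input, stride 1, no dilation, and a bias term in every layer. None of these details are stated in the text, so the resulting numbers are only indicative.

```python
def conv_stack_summary(kernel_sizes, filters, in_channels=1):
    """Return (trainable parameter count, receptive field width) of a conv stack."""
    params, rf, c_in = 0, 1, in_channels
    for k, c_out in zip(kernel_sizes, filters):
        params += k * k * c_in * c_out + c_out   # kernel weights + biases
        rf += k - 1                              # growth for stride 1, no dilation
        c_in = c_out
    return params, rf

kernels = [3, 3, 5, 5, 7, 7, 1]       # kernel sizes listed in the text
filters = [32] * 6 + [1]              # 32 filters per layer, single 1 x 1 output filter
params, rf = conv_stack_summary(kernels, filters)
print(params, rf)                     # 161281 25
```

Under these assumptions the stack has about 161 thousand trainable parameters, and each output pixel depends on a 25 × 25 input neighborhood, i.e., half the width of a 50 × 50 crop.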
The CNN has been optimized for various parameters. First, the network was optimized for various design parameters, such as the number of convolution layers and the kernel sizes. The results in Figure 3a show that the architecture with kernel sizes 3 × 3, 3 × 3, 5 × 5, 5 × 5, 7 × 7, 7 × 7, 1 × 1 has a better performance in terms of $\mathrm{SNR}_{\mathrm{imp}}$. The performance of the optimized network for various noise parameters, as shown in Figure 3b, indicates that the network performs better reconstruction with increasing noise variance in the image (cell diffraction pattern). An increase in the variance results in a noisier image (cell diffraction pattern), which warrants a more detailed reconstruction to restore it to the original form, and hence a larger value of $\mathrm{SNR}_{\mathrm{imp}}$. Therefore, a higher $\mathrm{SNR}_{\mathrm{imp}}$ implies that the network has learned the optimal representational features for the cell types, which enables it to perform a better qualitative reconstruction. Figure 3c compares the reconstruction performance of the model on different sizes of the input image (cell diffraction pattern). Due to the black-box nature of deep learning methods, we had to create datasets with multiple cell-signature dimensions, such that the smallest size just covered the central signature of the cell, and then increased the window size until it covered a significant background portion as well. The models were evaluated across varying cell sizes to determine the optimal signal-to-background ratio and the spatial extent up to which the models covered the features, and to study the effect on the model performance. This analysis is critical for understanding the model explainability and interpretability: a size larger than the optimum includes more background artifacts, which affect denoising and overpower the cell signal, while a smaller size could exclude important deterministic features of the cell signature.
The convergence in the training phase of the network is as shown in Figure 3d. The results depict that the loss across the first epoch, with a high variation in the initial phase, gets smoother towards the end of the first iteration. The advantage of this system is that it generalizes well for all of the types of cell lines using the same model.
Further, we tried the ELM architecture, which is well known for its fast convergence [25]. The results in Figure 3e-h show the performance of the ELM architecture with a varying number of neurons in the hidden layer. As can be concluded from Figure 3e, the model with 2000 neurons provides better performance in terms of $\mathrm{SNR}_{\mathrm{imp}}$. Further, the optimized model was used to test the performance across various noise levels, as shown in Figure 3f. It is observed that the model maintains the $\mathrm{SNR}_{\mathrm{imp}}$ value on increasing the noise in the input image (cell diffraction pattern), i.e., the image (cell diffraction pattern) quality of the output relative to the input remains the same. The results of the model performance across different sizes of the input image (cell diffraction pattern), as shown in Figure 3g, indicate that the 40 × 40 size has a higher value of SNR. However, its variation is 2, compared with 1.5 for the 50 × 50 size, which is the lowest among all of the sizes. Since the CNN shows substantial performance with lower variance for the 50 × 50 input size, we fixed it as the optimal size for all further studies and comparisons.
Figure 3h shows the loss across the first epoch for ELM which is remarkably high initially but converges faster, as compared to CNN, after training with only a few thousand samples. This faster convergence may help save time and resources during incremental training phases for newer cell types. The performances of these optimized models have been compared with the traditional denoising methods as shown in

Performance of Classification Algorithm
Since the diffraction patterns of cells and microparticles in an LSIT micrograph depend upon their physical and optical properties, they carry the unique signatures of each of the cell types, as shown in the 2D contour plot in Figure 2. These unique signatures can be utilized for the classification of these cell types. Since our previous analysis concluded that the CNN works better for denoising, we experimented with the same modality for classification as well. To determine the optimal CNN architecture for cell-line recognition, we first determined the optimal depth of the network by studying the classification performance of the model while increasing the depth, by adding convolutional and pooling layers, as well as by varying the number of kernels and the kernel size, until we reached performance saturation. We experimented with and evaluated various shallow and deep CNN models to classify the cell lines. The details of the model architectures are described in Figure 4. The Deep Model starts with a convolutional (Conv2D) layer having 512 kernels of size 3 × 3, followed by a max-pooling layer of the same kernel size. The output from this is further convolved with 128 kernels of size 3 × 3 with a dropout rate of 0.5, followed by a max pool with a 2 × 2 kernel. This output goes to a Conv2D layer with 64 kernels of size 3 × 3 and a dropout rate of 0.2.
We further reduce the dimension using a 2 × 2 max pool kernel. The output of this layer is further convolved with 32 kernels of size 3 × 3, followed by a dropout of 0.2. The output dimension from this convolutional layer is further reduced by a max pool with a 3 × 3 kernel. This is again convolved with 16 kernels of size 3 × 3, followed by a max pool layer with a 3 × 3 kernel, the output of which is again convolved with 8 kernels of size 3 × 3, followed by a 3 × 3 max pool. This output is then vectorized and fed to a fully connected (FC) layer having 256 nodes, and then to another FC layer with 128 nodes and a dropout of 0.2. The final layer is a SoftMax function with six output nodes. The model architectures used for studying the impact of the network depth and breadth on performance are described in Figure 4. Once the approximate optimal depth and breadth had been determined, we proceeded to fine-tune the hyper-parameters, such as the number of kernels, kernel size, and dropouts, to reach the best performance of the models across varying cell sizes (i.e., input dimension of cells). In all of the models, Adam [38] provided better convergence than other optimizers and has been used as the model optimizer, with categorical cross-entropy [39,40] as the loss function.
From Figure 5a, it is inferred that Model 3 shows the best classification performance, on the validation fold, of all of the models. The results show that the performance is consistent for the input sizes 40 × 40 to 66 × 66, with a very small variance in the accuracy. The performance of this optimized model is further evaluated over the test dataset containing 324 samples of each cell type. The per-class performance of this model is shown in the confusion matrix of Figure 5b. The results show that the model can classify RBC, WBC, and the 10 µm and 20 µm beads with over 99% accuracy.
However, the comparatively poor performance of about 90% for the cancer cells, HepG2 and MCF7, can be attributed to the non-homogeneity in their signature characteristics as well as the lack of sufficient original samples which further complicates the issue. This is well depicted in our previous work [4] (see Figure 2 of the reference). From the receiver operating characteristic (ROC) curve for all of the cell lines shown in Figure 5c, the area under the curve (AUC) for all of the cell lines is >0.99, except MCF7 (~0.95) and HepG2 (~0.96). From these results, it can be inferred that the classifier is working well, especially for RBC, WBC, 10 µm, and 20 µm beads. The visualization of the internal activation maps, as shown in the Supplementary Information, implies that the network is learning core descriptive features in the diffraction signatures rather than using some random features.
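Per-class figures of the kind read from a confusion matrix can be derived in a one-vs-rest manner. The following sketch uses a made-up three-class matrix, not data from this work, and assumes that rows hold the true classes and columns the predicted classes. Sensitivity and recall are the same quantity in this formulation.

```python
import numpy as np

def per_class_metrics(cm):
    """One-vs-rest metrics from a confusion matrix (rows: true, columns: predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # true members of the class that were missed
    fp = cm.sum(axis=0) - tp          # other classes predicted as this class
    tn = cm.sum() - tp - fn - fp
    recall = tp / (tp + fn)           # also called sensitivity
    ppv = tp / (tp + fp)              # positive predictive value (precision)
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "accuracy": (tp + tn) / cm.sum(),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "PPV": ppv,
        "NPV": tn / (tn + fn),
        "F1": 2 * ppv * recall / (ppv + recall),
    }

# Hypothetical 3-class confusion matrix, for illustration only.
cm = [[9, 1, 0],
      [2, 8, 0],
      [0, 0, 10]]
metrics = per_class_metrics(cm)
print(metrics["recall"], metrics["PPV"])
```

A class that is never predicted would give a zero denominator for PPV in this sketch, so such cases need separate handling.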
The performance evaluation of the proposed AI model with various metrics, such as true positive (TP), true negative (TN), false positive (FP), false negative (FN), accuracy, recall, specificity, sensitivity, F1 score, positive predictive value (PPV), and negative predictive value (NPV), is summarized in Table 2. Additionally, we investigated the transfer of learning to gauge the ability of the trained network to adapt to newer cell types (Figure 6). For this scenario, the CNN was initially trained with all of the cell lines except RBC. From the per-class test accuracy shown in Figure 6c, it is observed that this initial model misclassified all of the RBC samples as WBC. In the transfer learning phase, the initially trained network is frozen except for the last layer, which is modified to accommodate the newer class and kept trainable. The network is then re-trained with a mix of RBC samples. From the epoch vs. accuracy graph in Figure 6a, the transfer training achieved higher accuracy with the same number of epochs compared to the initial training. This is also validated by the epoch vs. loss graph in Figure 6b. The per-class test accuracy of the re-trained model is shown in the confusion matrix of Figure 6d, where it is inferred that the re-trained network can classify RBC correctly with substantial accuracy. From these results, it can be inferred that the network can be effectively adapted to newer cell lines with very little training.
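The idea of freezing the network and re-training only an enlarged last layer can be sketched on synthetic data. In the example below, the "frozen features" are simulated clusters rather than outputs of the actual CNN, and the head is a plain SoftMax layer trained by gradient descent on categorical cross-entropy. Keeping the old output columns when adding the new class is an assumption of this sketch, and all names and hyper-parameters are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expand_head(W, b, rng, scale=0.01):
    """Add one output column for a new class; existing columns are kept."""
    W_new = np.hstack([W, rng.normal(0.0, scale, size=(W.shape[0], 1))])
    return W_new, np.append(b, 0.0)

def train_head(feats, labels, W, b, lr=0.1, epochs=200):
    """Gradient descent on categorical cross-entropy; only W and b change."""
    onehot = np.eye(W.shape[1])[labels]
    for _ in range(epochs):
        grad = (softmax(feats @ W + b) - onehot) / len(feats)
        W = W - lr * feats.T @ grad
        b = b - lr * grad.sum(axis=0)
    return W, b

rng = np.random.default_rng(0)
centers = 3.0 * np.eye(3, 4)                  # hypothetical frozen-feature clusters
labels = np.repeat(np.arange(3), 60)
feats = centers[labels] + rng.normal(0.0, 0.3, size=(180, 4))

old = labels < 2                              # initial training without class 2
W, b = train_head(feats[old], labels[old], np.zeros((4, 2)), np.zeros(2))
W, b = expand_head(W, b, rng)                 # make room for the new class
W, b = train_head(feats, labels, W, b)        # re-train the head only
accuracy = np.mean(np.argmax(feats @ W + b, axis=1) == labels)
print(accuracy)
```

Since only the small head is updated, each re-training step is cheap compared with training the full network, which is the practical appeal of this scheme.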
The comparison between the proposed AI method and the manual method for counting various cell types from a heterogeneous sample is shown in Figure 7. This comparison shows the robustness of the model.

Conclusions
In conclusion, we have explored the advantages of using neural networks in the characterization of LSIT micrographs. Here, we have optimized neural networks that can automatically improve the signal quality and classify the cell types. We find that the network can classify RBC and WBC with high accuracy (i.e., over 98%), and the cancer cells with an accuracy of about 90%. The network is also flexible in adapting to newer cell lines by retraining the trained network with very few samples. Together with this algorithm, the lightweight and cost-effective LSIT setup can be utilized as a point-of-care system for the diagnosis of pathological disorders in resource-limited settings. In our future work, we aim to combine the denoising and classification modalities, given the significant overlap in their operation. This will remove the dual training times and reduce computation costs. We also aim to improve their performance and to deploy the system in real scenarios.