Classification of Red Blood Cells Using Time-Distributed Convolutional Neural Networks from Simulated Videos

Abstract: The elasticity of red blood cells (RBCs) plays a vital role in their efficient movement through blood vessels, facilitating the transportation of oxygen within the bloodstream. However, various diseases significantly impact RBC elasticity, making it an important parameter for diagnosing and monitoring health conditions. In this study, we propose a novel approach to determine RBC elasticity by analyzing video recordings and using a convolutional neural network (CNN) for classification. Due to the scarcity of available blood flow recordings, computer simulations based on a numerical model are employed to generate a substantial amount of training data. The simulation model incorporates the representation of RBCs as elastic objects within a fluid flow, allowing for a detailed understanding of their behavior. We compare the performance of different CNN architectures, including ResNet and EfficientNet, for video classification of RBC elasticity. Our results demonstrate the potential of using CNNs and simulation-based data for the accurate classification of RBC elasticity.


Introduction
Red blood cells (RBCs) play a crucial role in the transportation of oxygen within the blood. The health of RBCs is vital to their ability to efficiently flow through blood vessels, facilitated by their deformable membrane [1]. Elasticity is a key characteristic of healthy RBCs and is influenced by various natural factors, including the age of the cells. However, the impact of different diseases such as malaria [2], leukemia [3], diabetes [4], or sickle cell disease [5][6][7][8] significantly affects RBC elasticity.
Classifying RBCs can provide valuable insights and benefits in various scientific and medical contexts. Firstly, the accurate classification of RBCs enables the identification and characterization of different cellular types, such as normal RBCs, sickle cells, and other abnormal cell morphologies [9]. This classification can aid in the diagnosis and monitoring of various blood disorders, including sickle cell disease, thalassemia [10], and hereditary spherocytosis [11]. By distinguishing between different types of RBCs, healthcare professionals can better understand disease progression, assess the severity of conditions, and tailor treatment strategies accordingly. A more general overview of the techniques and importance of capturing cells of different elasticity can be found in [12,13].
Furthermore, RBC classification can contribute to the understanding of physiological and pathological processes within the human body. For instance, variations in RBC morphology may be indicative of certain underlying health conditions or physiological changes [14]. By analyzing and classifying RBCs, researchers can investigate the impact of factors such as nutrition, disease states, genetic variations, and environmental exposures on RBC characteristics. This knowledge can provide valuable insights into the mechanisms underlying various diseases and conditions, as well as facilitate the development of novel therapeutic approaches.

Materials and Methods
Video classification involves the assignment of one or more labels to a video based on its content. The utilization of neural networks (NNs) for video classification has gained popularity due to their ability to learn intricate patterns within the data. CNNs are particularly suitable for video classification tasks, as they can effectively capture spatial features within frames and, when extended across frames, temporal features as well. CNNs consist of multiple layers of neurons that are trained to extract relevant features from images or video frames.
Typically, NNs used for video classification are trained on a large dataset of labeled videos. These videos are partitioned into training and validation sets to assess the performance of the model. To increase the diversity of the training data, various data augmentation techniques, such as flipping, rotating, and scaling, can be employed. Additionally, transfer learning can expedite the training process by utilizing pre-trained models that were previously trained on different datasets.
However, video classification poses challenges due to variations in lighting conditions, camera angles, and object appearances. Domain adaptation techniques can address these challenges by transferring knowledge from a source domain to a target domain.
The application of video classification using NNs spans various fields, including surveillance, entertainment, and healthcare. In healthcare, video classification can aid in the analysis of medical videos such as endoscopic videos, facilitating the detection of abnormalities and assisting in diagnosis. The evaluation of video classification performance using NNs can be measured using metrics such as accuracy, precision, and recall. The choice of evaluation metric depends on the specific application and the relative importance of different types of errors. Overall, video classification using NNs is an actively evolving field with numerous promising applications and associated challenges.
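The metrics named above can be computed directly from predicted and true labels. The following is a minimal sketch for a single positive class; the function name `binary_metrics` and the toy labels are illustrative assumptions, not part of the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels (hypothetical): 1 = reduced elasticity, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
```

Precision penalizes false alarms, while recall penalizes missed positives, which is why the preferred metric depends on which error type is more costly in the application.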

ResNet
The Residual Network (ResNet) [21] is a popular type of CNN architecture that was originally designed for image classification; see Figure 1. However, ResNet can also be used for video classification by adapting it to process multiple frames of a video sequence.
In a traditional CNN, each layer processes the output of the previous layer to extract increasingly complex features from the input image. However, as the network becomes deeper, it can become harder to train and can suffer from the vanishing gradient problem. This problem occurs when the gradients used to update the weights of the network become very small, making it difficult to learn from the data.
ResNet solves this problem by introducing residual connections between layers. These connections allow the network to "skip" over certain layers, allowing for information to pass through unchanged. This helps to prevent the gradients from becoming too small, allowing the network to learn more effectively.
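The idea of a residual connection can be sketched in a few lines: the block output is the layer output plus the unchanged input. This is an illustrative toy on plain lists, not the actual ResNet implementation; `residual_block`, `zero_layer`, and `double_layer` are hypothetical names.

```python
def residual_block(x, layer):
    """Return layer(x) + x elementwise; the identity path lets the input skip the layer."""
    return [fx + xi for fx, xi in zip(layer(x), x)]


def zero_layer(x):
    """A layer that has learned nothing: it outputs zeros."""
    return [0.0 for _ in x]


def double_layer(x):
    """A toy layer that doubles its input."""
    return [2.0 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_layer)   # input passes through unchanged
tripled_out = residual_block(x, double_layer)  # 2x from the layer plus x from the skip
```

Even when the layer contributes nothing, the block still passes its input forward, which is the property that keeps gradients from vanishing along the skip path.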
To use ResNet for video classification, we can apply it to each frame of the video sequence and then combine the outputs from each frame to obtain a final classification. This can be achieved by either averaging the outputs or using an attention mechanism to focus on the most relevant frames.
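The averaging variant of this combination step might look as follows. This is a sketch under the assumption that each frame has already been mapped to a vector of class scores; the scores and function names are made up for illustration.

```python
def average_frame_scores(frame_scores):
    """Average per-frame class scores over the time axis."""
    n_frames = len(frame_scores)
    n_classes = len(frame_scores[0])
    return [sum(frame[c] for frame in frame_scores) / n_frames
            for c in range(n_classes)]


def predict_video(frame_scores):
    """Return the index of the class with the highest averaged score."""
    mean_scores = average_frame_scores(frame_scores)
    return max(range(len(mean_scores)), key=mean_scores.__getitem__)


# Hypothetical scores for a three-frame video and four classes.
frame_scores = [
    [0.6, 0.2, 0.1, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.5, 0.3, 0.1, 0.1],
]
video_class = predict_video(frame_scores)
```

An attention mechanism would replace the uniform weights of the average with learned, frame-dependent weights.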
One popular implementation of ResNet for video classification is the Two-Stream ResNet, which consists of two ResNet networks: one for processing spatial information (the appearance of the objects in the video) and one for processing temporal information (the motion of the objects in the video). The spatial network processes each frame of the video independently, while the temporal network processes pairs of frames to capture motion information. The outputs from both networks are then combined to obtain a final classification.
Overall, using ResNet for video classification can be an effective approach due to its ability to learn complex features and its use of residual connections to prevent the vanishing gradient problem.

EfficientNet
EfficientNet [23,24] is a family of CNNs that were specifically designed to be more efficient in terms of computation and parameter usage than existing CNNs. EfficientNet achieves this by using a novel scaling method that optimizes the network architecture (Table 1) based on the available computational resources.
EfficientNet can also be used for video classification by adapting it to process multiple frames of a video sequence. One way to do this is to use a 3D CNN architecture, which can capture both spatial and temporal information from the video frames.
To use EfficientNet for video classification, we can apply the network to each frame of the video sequence (or, in the 3D variant, to short clips of consecutive frames) and then combine the outputs to obtain a final classification. This can be achieved by either averaging the outputs or using an attention mechanism to focus on the most relevant frames.
One advantage of using EfficientNet for video classification is that it is highly efficient in terms of computation and parameter usage. This can be important for applications where resources are limited, such as on mobile devices or in real-time video analysis. EfficientNet can also achieve high accuracy on a variety of image classification tasks, which suggests that it may also be well-suited for video classification.
However, there are some challenges associated with using EfficientNet for video classification. One challenge is that video classification typically requires processing a large number of frames, which can be computationally intensive. Another challenge is that EfficientNet may not be as effective at capturing temporal information as other CNN architectures that were specifically designed for video classification, such as the Two-Stream CNN or the 3D ResNet.
Overall, using EfficientNet for video classification can be an effective approach due to its efficiency and high accuracy in image classification tasks. However, it may require additional optimization and tuning to achieve an optimal performance on video classification tasks.

Data Preparation
The source data used to classify the health of RBCs were obtained from multiple simulation experiments. These experiments were conducted using the open-source software ESPResSo [25], incorporating its lattice-Boltzmann and object-in-fluid modules [26].
Numerical simulations consisted of a model of an elastic object representing an RBC embedded in a flowing fluid and interactions between individual objects (cell-cell and cell-wall/obstacle). The model utilized the lattice-Boltzmann method to represent the fluid, a spring network model to simulate the cell membrane, and a dissipative version of the Immersed Boundary Method (IBM) to connect them. In addition to the fluid force, elastic forces are exerted on the cell mesh points that are evaluated from the deformation of the cell. The resultant force $\mathbf{F}_{\mathrm{tot}}$ is the driving force according to which the mesh points are propagated in space following Newton's equation
$$m \frac{\mathrm{d}^2 \mathbf{x}}{\mathrm{d}t^2} = \mathbf{F}_{\mathrm{tot}},$$
where $m$ is the mass of a mesh point and $\mathbf{x}$ is its position. The sources of $\mathbf{F}_{\mathrm{tot}}$ are the elasto-mechanical properties of the cell membrane, the fluid-cell interaction, or possibly other external stimuli.
All simulations were performed under consistent channel and fluid flow parameters. The channel had a cuboid shape with dimensions of 104 × 60 × 40 µm, and the fluid was discretized into a three-dimensional grid with a spatial step of 1 µm. The elasticity of RBCs becomes most apparent when they come into contact with other objects. Therefore, we designed a simulated channel topology where RBCs flow through a space with obstacles. The simulated channel featured five cylinders, acting as obstacles that restricted the area of blood flow and induced the manifestation of RBC elasticity (see Figure 2). This channel design aimed to replicate a realistic laboratory environment. The kinematic viscosity of the fluid was $1.3 \times 10^{-6}$ m²/s, and the density was $1.025 \times 10^{3}$ kg/m³. To initiate fluid flow, external forces were applied, with values chosen to achieve a maximum velocity of approximately 0.03 m/s. Cell-cell interactions were simulated using the membrane_collision potential, while interactions between cells and the channel walls were modeled using the soft_sphere potential.
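To make the role of Newton's equation concrete, the sketch below advances a single mesh point by one time step under a given total force. It uses a simple semi-implicit Euler update chosen purely for illustration; it is not the integrator used by ESPResSo, and the function name and numerical values are assumptions.

```python
def euler_step(position, velocity, force, mass, dt):
    """Advance one mesh point by one semi-implicit Euler step of m * x'' = F_tot."""
    acceleration = [f / mass for f in force]
    # Update the velocity first, then move the point with the new velocity.
    new_velocity = [v + a * dt for v, a in zip(velocity, acceleration)]
    new_position = [x + v * dt for x, v in zip(position, new_velocity)]
    return new_position, new_velocity


# Hypothetical values in arbitrary simulation units.
position, velocity = euler_step(
    position=[0.0, 0.0, 0.0],
    velocity=[1.0, 0.0, 0.0],
    force=[0.0, 2.0, 0.0],
    mass=2.0,
    dt=0.5,
)
```

In the full model, the force passed in would be the sum of the elastic, fluid-coupling, and interaction forces acting on that mesh point.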
RBCs were represented by a surface network consisting of 374 nodes. The elastic properties of the cells were simulated using five types of elastic forces (shown in Table 2), each corresponding to a different elastic modulus.

Table 2. Overview of the used simulation parameters (columns: Coefficient, Value).

In this study, we focused on four levels of RBC elasticity. The two extreme levels were healthy RBCs, the most elastic, with a stiffness coefficient ($k_s$) of 0.005, and malaria-infected cells at stage 3 of the disease, the least elastic, with a $k_s$ value of 0.03. (The value of $k_s$ = 0.03 was chosen based on the reduced elasticity observed in malaria-infected cells at stage 3, as determined by an optical tweezers stretching experiment [27].) The remaining two levels of RBC elasticity were evenly distributed between the healthy and malaria-infected RBCs, with $k_s$ values of 0.0133 and 0.0216, respectively.
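The intermediate stiffness values follow from spacing four levels evenly between the two extremes. A short check, with the helper name `evenly_spaced` being an illustrative assumption:

```python
def evenly_spaced(start, stop, count):
    """Return `count` values evenly spaced from start to stop, both inclusive."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


# Four stiffness levels between healthy (0.005) and stage-3 malaria (0.03).
levels = evenly_spaced(0.005, 0.03, 4)
```

The two interior values come out as approximately 0.01333 and 0.02167, which agree with the reported 0.0133 and 0.0216 up to truncation of the last digit.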
The size of the training dataset plays a critical role in the training of machine learning (ML) models. When the dataset is small, the model may struggle to capture the intricate patterns and nuances present in the data, resulting in a poor generalization performance on unseen data. Conversely, a larger dataset provides the model with more examples to learn from, leading to better generalization and reduced overfitting where the model becomes too specialized to the training data. Therefore, it is crucial to have a sufficiently large training dataset to develop accurate and reliable ML models.
However, it is important to consider the computational resources required to train models on large datasets. Larger datasets demand more computational power and longer training times. Thus, striking a balance between dataset size and available computational resources is necessary for successful model training.
Each simulation in our study involved 36 RBCs, with nine cells representing each level of elasticity. The simulated channel has a periodic structure, meaning that when an RBC exits the channel, it reappears at the beginning. We remove the initial and final passes of RBCs due to their incompleteness. As a result, each RBC completes approximately 20-21 passes through the simulated channel (Figure 3). Consequently, each part of the train/validation datasets contains approximately 4 types of RBC × 9 from each type × 21 passes = 756 samples.
Given the small number of training examples, we have two options to address this limitation. First, we can utilize data augmentation techniques to augment our training set and increase its size. Data augmentation involves applying transformations to the existing examples to create new variations. In our case, we can perform vertical flips (to preserve the direction of blood flow) and rotations (while considering the preservation of blood flow direction). By applying these augmentations, we can generate additional training examples and enhance the diversity of the dataset.
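The sample count and the flip augmentation can be sketched as follows. The sketch assumes that a video is a list of frames, that each frame is a list of pixel rows, and that the flow runs along the row (width) direction, so mirroring top to bottom leaves the flow direction unchanged; the function names are illustrative.

```python
N_LEVELS = 4          # elasticity levels
CELLS_PER_LEVEL = 9   # cells of each level per simulation
PASSES_PER_CELL = 21  # approximate complete passes per cell

n_samples = N_LEVELS * CELLS_PER_LEVEL * PASSES_PER_CELL


def vertical_flip(frame):
    """Mirror a frame top to bottom; the order of pixels within each row is kept."""
    return frame[::-1]


def flip_video(video):
    """Apply the same vertical flip to every frame of a video."""
    return [vertical_flip(frame) for frame in video]


# A one-frame toy video with two rows of two pixels.
toy_video = [[[1, 2], [3, 4]]]
flipped = flip_video(toy_video)
```

Applying the same transformation to every frame of a video keeps the motion of the cell consistent across the augmented sample.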
The second option is to leverage pretrained models, which do not require training from scratch and hence need fewer training examples. Such models have already been trained on large-scale datasets and have learned general features. We chose two pretrained models, EfficientNet v2 B0 and ResNet50, which have demonstrated successful results in various image and video tasks [28][29][30][31]. These models only require a sufficient number of training examples to fine-tune the last few layers of the NN, making them well-suited to our small training dataset.
To prepare the dataset, we generated video recordings of individual RBCs from the simulation data. These videos serve the purpose of training and validation. Since the simulations performed in ESPResSo provide three-dimensional information about the flow of simulated RBCs, we projected these data onto a two-dimensional plane. We achieved this by generating a 2D video that captures the width and length of the channel while disregarding differences in depth. This transformation enabled us to effectively analyze and classify the RBC behavior in the videos using image and video classification techniques.
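In its simplest form, the projection onto a two-dimensional plane amounts to dropping the depth coordinate of every mesh node. The sketch below assumes nodes are stored as (x, y, z) tuples with z as the depth axis; the actual rendering of the videos in the study may involve further steps, and the node values are made up.

```python
def project_to_plane(nodes):
    """Drop the depth (z) coordinate of each (x, y, z) mesh node."""
    return [(x, y) for x, y, _z in nodes]


# Three hypothetical mesh nodes of one cell, in micrometres.
nodes_3d = [(10.0, 30.0, 18.5), (11.2, 30.4, 20.0), (10.6, 29.1, 21.5)]
nodes_2d = project_to_plane(nodes_3d)
```

Nodes that differ only in depth map to the same 2D point, which is the information loss the classifier has to cope with.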

Results and Discussion
Our network used video samples in the shape of N × T × H × W × C, where N represents the batch size, T is the number of frames in the video, H is the height, W is the width, and C is the number of channels, which is initially 3 (red, green, blue). We then converted the videos to a single-channel grayscale format, which reduced the number of channels from 3 to 1.
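The reduction from three channels to one can be illustrated on a single video of shape T × H × W × C stored as nested lists. The luminance weights below (0.299, 0.587, 0.114) are a common convention and an assumption here; the source does not state which conversion was used.

```python
# Assumed luminance weights for red, green, and blue.
LUMA = (0.299, 0.587, 0.114)


def frame_to_gray(frame):
    """Convert an H x W x 3 frame to H x W x 1 by a weighted channel sum."""
    return [[[sum(w * c for w, c in zip(LUMA, pixel))] for pixel in row]
            for row in frame]


def video_to_gray(video):
    """Convert every frame of a T x H x W x 3 video to a single channel."""
    return [frame_to_gray(frame) for frame in video]


# One frame, one row, two pixels: white and black.
rgb_video = [[[[255, 255, 255], [0, 0, 0]]]]
gray_video = video_to_gray(rgb_video)
```

The time, height, and width dimensions are unchanged; only the last axis shrinks from 3 to 1.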
For the base of our models, we used EfficientNet v2 B0 and ResNet50, which are pretrained models. Since the two pretrained models we used are intended for image classification, we used a time-distributed layer, which enables the model to classify video recordings. The pretrained model is not trainable. This was followed by 3D Average Pooling. We finished our network with a sequence of Flatten, Dropout, and Dense layers (Figure 4), which led to an output with the number of neurons equal to the number of classes; in our case, four neurons. We used Dropout and regularization for the last dense layer to avoid overfitting. The network's optimizer is either Adam or SGD. We optimized the hyperparameters of our network using the Hyperband class from the keras_tuner module [32]. The optimized hyperparameters differ slightly based on the network's optimizer, and the differences can be seen in Table 3.
The result of the hyperparameter optimization for each type of network and each type of optimizer is shown in Table 4. The best accuracy is achieved with the EfficientNet v2 B0 model using the Adam optimizer for the hyperparameters shown in Table 5. The validation results for each type of model are poor while the training accuracies are high, which indicates two things. First, the networks are overfitting for each architecture and optimizer, despite the regularization method used to avoid this effect. Second, the presence of overfitting suggests there is some information in the data that can be learnt.
We observe from the confusion matrix for the validation set (Figure 5) that the main difficulty lies in distinguishing between RBCs with reduced elasticity, while the network's ability to distinguish between healthy and sick RBCs is more precise. Therefore, we decided to train a binary classification NN, where the first class contains the healthy RBCs ($k_s$ = 0.005) and the second class consists of all the cells with reduced elasticity. We ran the hyperparameter optimization for 2 types of NNs with 2 different optimizer options.
From the confusion matrix shown in Figure 6 we discovered that our hypothesis is not true. After changing the classification problem from 4 classes to 2 classes, the final accuracies increased, but not as significantly as the confusion matrix from Figure 5 suggests. We also tried a weighted classification for a combination of EfficientNet v2 B0 and the Adam optimizer where the final accuracy insignificantly increased to 61.74%, showing only a 0.02% increase.
There is an option to separate healthy and sick RBCs after they are classified into 4 classes, where class 0 is healthy and classes 1, 2 and 3 are sick. In this manner, we achieved a classification accuracy of 93.54%. We elaborate on this in the conclusion.
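The conversion from four-class predictions to a healthy/sick decision can be sketched as a simple label mapping applied before scoring. The labels below are toy values for illustration, not results from the study.

```python
def to_binary(label):
    """Map four-class labels to two: class 0 stays healthy (0), classes 1-3 become sick (1)."""
    return 0 if label == 0 else 1


def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and truth agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical four-class ground truth and predictions.
true_labels = [0, 0, 1, 2, 3, 3]
pred_labels = [0, 1, 2, 1, 3, 0]

four_class_acc = accuracy(true_labels, pred_labels)
two_class_acc = accuracy([to_binary(t) for t in true_labels],
                         [to_binary(p) for p in pred_labels])
```

Confusions among classes 1, 2, and 3 count as errors in the four-class score but as correct decisions after the mapping, which is why the reduced accuracy can be much higher than the four-class accuracy.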

Adding Physical Information
In order to enhance the performance of the network, we added information about the underlying physics, specifically about the velocity of the fluid flowing in the channel. Our hypothesis was that by adding the physical information, the NN would be able to learn to classify the RBCs better than without it.
First, using ESPResSo, we calculated the velocities at each point of the fluid (which is represented as a mesh; more information is available in [33]) in the empty channel, that is, the channel with no RBCs in it. We obtained 3D velocity data, since the channel is a 3D object. Then, we took the H × W layer at depth D/2, which corresponds to the velocities in the middle of the channel along the depth axis. This information can be used in two ways: we can either use it as it is, meaning 3D data with dimensions W × H × 3, where 3 represents the x, y, and z components of a velocity vector, or we can create a heatmap where the colors represent the velocity of the flow.
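Both representations start from the same middle slice of the velocity field. The sketch below assumes the field is stored as a nested list indexed as field[h][w][d] holding (vx, vy, vz) tuples, and it takes the velocity magnitude as the scalar behind the heatmap; the storage layout, the scalar choice, and the function names are assumptions.

```python
import math


def middle_layer(field):
    """Return the H x W layer of velocity vectors at the middle depth index."""
    depth = len(field[0][0])
    mid = depth // 2
    return [[depth_line[mid] for depth_line in row] for row in field]


def speed_map(layer):
    """Reduce each velocity vector to its magnitude, the scalar shown in a heatmap."""
    return [[math.sqrt(vx * vx + vy * vy + vz * vz) for vx, vy, vz in row]
            for row in layer]


# Toy field with H = 2, W = 3, D = 4; the x-velocity equals the depth index.
toy_field = [[[(float(d), 0.0, 0.0) for d in range(4)] for _w in range(3)]
             for _h in range(2)]
vectors = middle_layer(toy_field)  # "Values" variant: H x W x 3
speeds = speed_map(vectors)        # basis for the "Heatmap" variant: H x W
```

The vector variant keeps the direction of the flow, whereas the magnitude map discards it.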
In both options, we added the physical information by creating a new branch of our NN that has the physical information about the flow as input. This information is passed through a Rescale layer, three 2D convolution layers with a number of filters 32, 16, 8, followed by a max pooling layer, a flattening layer, and a dense layer with 1024 hidden units. Then, it is concatenated with the penultimate output of the main branch of our NN, which is passed to the last dense layer.
Generally, it is not effective to add the same information to each training example for a NN. The reason for this is that NNs learn patterns and relationships in the data through the variations and differences between the examples. When all examples contain the same information, the network is unable to distinguish between them and may not learn the relevant patterns that are necessary for accurate predictions. However, there may be some cases where adding the same information to each training example can be helpful. For example, if the added information provides some contextual or background information that is relevant to all examples, it may help the network learn more effectively. In general, it is important to carefully consider the information that is added to each training example and how it may affect the network's ability to learn and generalize.
We trained both versions of the physics-informed neural network (PINN) using the optimized hyperparameters described in Table 5, with the best performing model being EfficientNet v2 B0 with the Adam optimizer. The final results are presented in Table 6.

Table 6. The validation accuracies were evaluated for six distinct classes of models using optimized hyperparameters. The names of these neural network (NN) models consist of two parts. The first part represents the number of classes used during training, while the second part indicates the type of physical information employed for the physics-informed neural network (PINN). The abbreviations "4c", "2c", and "2cw" denote four classes, two classes, and two weighted classes, respectively. Additionally, "Heatmap" and "Values" signify the utilization of heatmap information and velocity vectors from the middle flow layer, respectively.

The obtained results highlight several important observations regarding the performance of different classification approaches.
Firstly, when comparing the two-class classification (whether weighted or unweighted) to the four-class classification converted to two-class classification, it is evident that the latter achieves significantly better results. This suggests that the additional information present in the four-class classification helps to improve the overall accuracy of the model. Furthermore, the inclusion of physics-related information in the form of a heatmap appears to be less effective than the use of velocity vectors as input. This implies that the velocity vectors carry more meaningful and discriminative information for accurate classification. The use of velocity vectors as input likely allows for the model to capture dynamic patterns and better understand the motion characteristics of the analyzed data.

However, it is interesting to note that even though velocity vector information is incorporated, it does not lead to a substantial improvement in the final accuracy of the network. Specifically, the NN that solely relies on the provided input without any additional physical information achieves an accuracy of 93.56%. In contrast, the network utilizing velocity vectors achieves a slightly lower accuracy of 91.48%. This suggests that while the inclusion of velocity vectors may provide some useful information, it does not necessarily translate into a significant boost in classification performance.

Up-Scaling of the Healthy Examples
Our simulations include four types of RBCs, of which three types have reduced elasticity and are considered sick. Although the ratio of healthy to sick RBCs is 1:3, this does not correspond to the observed ratio in reality [34]. Therefore, we used data augmentation to upscale the minority class and balance the class sizes. As described in Section 3, we used horizontal flip and rotation to add two new training examples for each original example. This augmentation, together with the original healthy examples, makes the healthy class three times larger.
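The effect of this upscaling on the class sizes can be checked with a small calculation. The per-class count of 189 assumes 9 cells × 21 passes per elasticity level, and the class names and helper function are illustrative.

```python
def upscale_class(counts, name, copies_per_example):
    """Return new class counts after adding augmented copies to one class."""
    new_counts = dict(counts)
    new_counts[name] = counts[name] * (1 + copies_per_example)
    return new_counts


# Assumed per-class sample counts: 9 cells x 21 passes for each level.
counts = {"healthy": 189, "sick_1": 189, "sick_2": 189, "sick_3": 189}

# Two augmented copies (flip and rotation) per original healthy example.
balanced = upscale_class(counts, "healthy", copies_per_example=2)
n_sick = balanced["sick_1"] + balanced["sick_2"] + balanced["sick_3"]
```

After upscaling, the four classes stand in a 3:1:1:1 ratio, so the healthy class is as large as the three sick classes combined.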
We trained the best-performing model from our previous experiments, a four-class classification NN with no additional information, whose output is then reduced to a two-class classification. We augmented both the training dataset and the validation dataset to maintain a consistent class distribution between them. After training and validating the model on datasets with the same distribution, we cross-validated the model on the dataset with a different class ratio. Table 7 provides a comparison of three models: the original model (O_4x1), which is the best-performing model from previous experiments, and two models trained on datasets with equal ratios of healthy and unhealthy RBCs. The first model (A_3_1_1_1) uses unmodified class weights, resulting in equal weights for each class. The second model (AW_3_1_1_1), on the other hand, uses class weights proportional to the sizes of the classes. We observed that the first model outperforms the second model in terms of classification accuracy.
The results indicate that the O_4x1 model achieved the highest accuracy on the original dataset among the three models, namely 93.54%, while the A_3_1_1_1 and AW_3_1_1_1 models achieved lower accuracies of 88.88% and 81.55%, respectively.
When comparing the accuracies on the augmented dataset, it can be observed that the A_3_1_1_1 model performed best, with an accuracy of 93.91%, compared to the 91.01% accuracy of the O_4x1 model. The AW_3_1_1_1 model achieved an accuracy of 92.21%, which was slightly lower than both the accuracy of the O_4x1 model on the original dataset and that of the A_3_1_1_1 model on the augmented dataset.
Based on these findings, it can be concluded that augmenting the dataset through upsampling had a mixed impact on the model performance. While the A_3_1_1_1 model showed a slight improvement, the addition of class weights in the AW_3_1_1_1 model did not yield significant improvements and even resulted in a slightly decreased accuracy compared to the original dataset.

Four-Class to Two-Class Classification
The obtained results reveal important insights into the performance of various classification approaches. One notable observation is the comparison between the two-class classification and the four-class classification converted to two-class classification. The two-class classification refers to the classification task where the model distinguishes between two specific classes (e.g., healthy and sick), while the four-class classification converted to two-class classification involves combining multiple classes into two broader categories.
The results indicate that the four-class classification converted to two-class classification yields a significantly better performance than two-class classification. This suggests that the inclusion of additional classes in the training process provides valuable information that enhances the overall accuracy of the model. By training the model on a dataset consisting of multiple classes, it can capture a wider range of patterns, variances, and characteristics present in the data. This broader understanding and increased complexity of the model contribute to an improved classification accuracy when distinguishing between the two broader categories in the four-class classification converted to two-class classification scenario.
The enhanced performance achieved with the four-class classification converted to two-class classification demonstrates the importance of considering a more comprehensive representation of the data during model training. By incorporating additional classes, the model can learn more nuanced and discriminative features, leading to better differentiation between the target categories. This finding highlights the value of leveraging the full range of available classes and their associated information when constructing classification models.
The obtained results highlight the importance of carefully designing the classification task and selecting the appropriate representation of classes to achieve optimal performance. By leveraging the additional information provided by the four-class classification, the model gains a deeper understanding of the underlying data, resulting in improved accuracy when distinguishing between broader categories. This underscores the significance of considering the relevance and inclusion of additional information in classification tasks.
Furthermore, the findings also underscore the importance of selecting appropriate input features in classification models. In this case, the inclusion of velocity vectors as input features was found to be more effective compared to the use of a heatmap representation. This suggests that the choice of input features plays a critical role in capturing relevant patterns and characteristics for accurate classification.
Moreover, the study highlights that the network's architecture and optimization techniques are crucial factors influencing the final performance. It is important to consider the interplay between the model's architecture, training algorithms, and the specific classification task at hand. Fine-tuning these components and exploring alternative approaches may further improve the accuracy and performance of the classification model.
Overall, the results emphasize the need for the careful consideration of various factors, including the classification task design, representation of classes, selection of input features, network architecture, and optimization techniques, to achieve optimal classification accuracy. Further analysis, experimentation, and refinement of methods are warranted to advance our understanding and improve the accuracy of classification tasks.

Discussion
Accurate physical and numerical measurements of the RBC deformation rate are very complicated and difficult to implement in blood-sample testing that is meant not to be technologically and financially demanding. Therefore, in medical experiments, there is a prevailing tendency to replace the measurement of elastic properties with the observation of other, more easily detectable and correlated properties of RBC behavior in the flow of a blood sample in a suitable microfluidic device [15,35].
The use of CNNs and ML in the field of research and simulation of RBC properties and behavior is not the ultimate method. Our intention is to gradually verify the potential of this approach, which we addressed in [17,19]. Similarly, in this article, our goal was to find out whether NNs can compensate for the loss of accuracy and dimension of the image data when processing video recordings with computational complexity. A known issue with using NNs in biology and medicine is the lack of training and verification data, and the need for manual labeling. In our case, this difficulty is circumvented by the use of a verified simulation model that produces a large amount of data with known parameters of the used RBCs. After verifying the correctness of the predictions of the CNN trained in this way, the next step will be to determine whether the data obtained exclusively from image recordings of real experiments are equally sufficient.

Conclusions
In this study, we investigated the classification of RBC elasticity using video recordings and CNNs. We addressed the scarcity of available blood flow recordings by employing computer simulations based on a numerical model. Our simulation model successfully captured the behavior of RBCs as elastic objects within a fluid flow, generating a large amount of training data for CNN-based classification. By analyzing the geometric characteristics captured in video recordings, we classified RBC elasticity using different CNN architectures, including ResNet and EfficientNet.
Our results demonstrate that CNNs can effectively classify RBC elasticity, providing a potential diagnostic tool for various diseases impacting RBC health. The integration of computer simulations and CNN-based video analysis offers a valuable approach to understanding the behavior of RBCs and developing new diagnostic techniques. The use of simulation-generated data provides a rich source of training examples, enabling the training of CNNs even with limited video recordings.
Furthermore, our study compared the performance of different CNN architectures, and the results showed that the EfficientNet model achieved the best accuracy for RBC elasticity classification. However, we observed that overfitting was a challenge for both ResNet and EfficientNet architectures, indicating the need for further regularization techniques to improve the model's generalization performance.
We have observed that training the network in a higher target space (four classes) and subsequently mapping the results to a lower target space (two classes) yields the best classification performance for our problem. Another important finding is that cross-validation, where a model trained on a dataset with one ratio of class examples is validated on a dataset with a different ratio, results in a significantly lower performance compared to validation on the same type of dataset. This observation highlights the importance of training the model on a dataset with a class ratio that matches the ratio observed in real-world scenarios. In our experiments, the RBC health is determined only by elasticity, but in reality, other RBC properties can also change, such as the volume or the shape of the blood cell. For this reason, further research is necessary.
In conclusion, our study highlights the potential of combining computer simulations, video analysis, and CNNs for the classification of RBC elasticity. Further research and optimization of CNN architectures and regularization techniques are warranted to enhance the accuracy and robustness of the classification model. The development of accurate and reliable diagnostic tools for assessing RBC health can have significant implications for disease diagnosis and monitoring, ultimately contributing to improved healthcare outcomes.

Data Availability Statement:
The data used in this study are freely available in the GIT repository: https://github.com/molcan23/RBC_NN (accessed on 13 July 2022).

Conflicts of Interest:
The authors declare no conflict of interest.
