Semantic Segmentation of Conjunctiva Region for Non-Invasive Anemia Detection Applications

Abstract: Technology is changing the future of healthcare, and technology-supported non-invasive procedures are increasingly preferred in medical diagnosis. Anemia is one of the most widespread diseases affecting the wellbeing of individuals around the world, especially women of childbearing age and children, and addressing it with advanced technology could substantially reduce its prevalence. The objective of this work is to perform segmentation of the conjunctiva region for non-invasive anemia detection applications using deep learning. The proposed U-Net Based Conjunctiva Segmentation Model (UNBCSM) uses a fine-tuned U-Net architecture for effective semantic segmentation of the conjunctiva from digital eye images captured by consumer-grade cameras in an uncontrolled environment. The ground truth for this supervised learning was given as PASCAL masks obtained by manual selection of conjunctiva pixels. Image augmentation and pre-processing were performed to increase the data size and the performance of the model. UNBCSM showed good segmentation results, with an Intersection over Union (IoU) score between the ground truth and the segmented mask of 96% and 85.7% for training and validation, respectively.


Introduction
Anemia is a highly prevalent blood-related disorder caused by a decrease in the number of red blood cells or in the hemoglobin level, or by a lowered oxygen-carrying capacity of the blood [1-3]. Severe anemia can damage vital organs and can ultimately lead to death [4][5][6][7]; it is also responsible for increased morbidity in pre-school children and pregnant women. Iron deficiency anemia is considered to be among the most important contributing factors to the global burden of disease [8][9][10][11][12][13]. In fact, one-third of anemia cases in adults are attributable to deficiencies of iron, folate, and vitamin B12. Iron deficiency anemia accounts for approximately 50% of nutrient-deficiency anemia cases, of which bleeding caused by gastrointestinal lesions is the leading cause.
It is a subtle disease: it evolves slowly and does not show manifest symptoms until it becomes severe, because the human body compensates for the lack of oxygen. The symptomatology varies, but recurrent symptoms include fatigue, dizziness or light-headedness, headache, pallor, chest pain, weakness, irregular heartbeat, shortness of breath, and cold hands and feet. Prevention or early detection of anemia reduces serious complications and helps patients lead a healthy life.
To carry out a preliminary and immediate diagnosis, anamnesis and physical examination are often used in clinical practice [14][15][16][17][18][19][20][21][22]. Health professionals physically examine anatomical sites such as the conjunctiva, tongue, palmar crease, and nail beds, since they are indicators that raise suspicion of anemia. There are, however, less optimistic findings. For example, Da Silva et al. [21] evaluated the correlation between the opinions expressed by different doctors and pointed out that the ideal condition for examining skin color is natural daylight, or no direct light on the skin; different ambient lighting can therefore influence the diagnosis. In addition, the assessment may differ from one examiner to another depending on their level of expertise.
The gold standard test is a blood cell count, which is invasive or at least minimally invasive. This test is often not recommended, for example in the case of infants, the elderly, and pregnant women. Furthermore, frequent sampling is uncomfortable and expensive. The techniques described in this paper are therefore considered extremely important, particularly for patients who need frequent blood tests or who have difficulty travelling frequently to test labs. More generally, this type of analysis is suitable for the increasingly frequent cases in which patients are managed directly in their homes, through appropriate diagnostic and therapeutic care pathways and services such as medical records [23,24]. For this reason, it is of great interest to study methods and design tools that allow the hemoglobin concentration to be monitored non-invasively and at reduced cost, both in the laboratory and at home. This is why many authors [25,26] have shown interest in the pallor of the exposed tissues of the human body as a way to estimate anemia. Pallor is characterized by a lack of color in the skin and mucous membranes due to a low level of circulating hemoglobin. It may be evident on the entire body but, as noted above, it is most easily observed in areas where blood vessels are close to the surface, such as the palm, the nail bed, and mucous membranes such as the tongue or conjunctivae.
Technology-assisted diagnosis has grown in recent years relative to traditional diagnosis, owing to its higher reliability and unbiased results. Over the same period, speech and image analysis and computer vision, powered by deep learning and its ability to rapidly extract the required information from digital images or videos, have enabled the development of interesting medical applications and diagnostic tools to support specialists [27][28][29][30][31][32][33][34].
Deep learning models can perform classification, classification with localization, semantic segmentation, instance segmentation, and object detection tasks with a high level of understanding.
Image segmentation, the process of identifying or partitioning the needed information in digital images, is nowadays applied to everything from microscopic images to satellite images. Thresholding, clustering, histogram-based methods, edge detection, region growing, and the watershed transformation are a few traditional image processing techniques used for segmentation. Specifically, semantic segmentation is the process of labeling each pixel of an image with a predefined class. In recent years, researchers have focused on deep learning-based image segmentation due to the availability of online resource materials, easy access to high computational power, the availability of computer vision and other supporting libraries, and the potential of Convolutional Neural Network architectures to obtain effective segmentation results [35][36][37][38][39]. In the literature, an interesting application involving neural networks and, more broadly, computer vision techniques is microscopy image analysis [40,41]. The analogy with our domain lies in the various environmental factors that can lead human professionals to a false interpretation of the results.
As reported by earlier works, a pale conjunctiva is considered an accurate sign of anemia [42][43][44]. Hence, it is treated as the Region of Interest (ROI), and typically manual cropping is performed on eye images to extract its color features, while the automatic segmentation attempted for this application with image processing techniques still needs improvement. To fully automate the diagnostic support and automatically detect suspected anemia, for example on a mobile device or in a web application, automatic segmentation of the ROI is required to avoid a subjective choice of the ROI by the patients.
In this paper, transfer learning on the U-Net architecture is applied to semantic image segmentation, that is, to labeling each pixel of an image with a predefined class. Section 2 describes the materials and methods used for the proposed U-Net based segmentation model. Section 3 explains the structure and organization of the layer modules of the proposed U-Net architecture. Section 4 discusses the performance of the model during training and validation and the segmentation results, and Section 5 concludes the work.

Electronics 2020, 9, 1309

U-Net Based Conjunctiva Segmentation Model (UNBCSM)
Researchers in related works used hundreds of samples for conducting and validating their studies [45][46][47][48][49][50] and, unlike retinal fundus images, most conjunctiva images are not publicly available. The acquisition system consists of a macro lens assembled into a specially designed, 3D-printed lightened spacer and a smartphone. The lens (Aukey PL-M1 25 mm 10× macro lens) takes high-resolution images when attached to a smartphone. This device makes it possible to obtain high-resolution images of the eye that are insensitive to the ambient lighting conditions, in conformity with diagnostic physical examination procedures. The model uses 135 eye images with a clearly visible lower eyelid for training and validation. The semantic segmentation of the conjunctiva is challenging due to the presence of fluids, nerves, folds, light reflections, and shadows in the eye region, which motivated us to apply deep learning techniques. For this application we chose an architecture based on U-Net, a popular convolutional neural network model that has shown fast and accurate segmentation with modest amounts of data and promising results for biomedical image segmentation tasks. Figure 1 shows the steps involved in developing the proposed U-Net Based Conjunctiva Segmentation Model (UNBCSM), which are further explained in Section 3.

Segmentation Mask Creation
The UNBCSM architecture takes an RGB image in JPEG format, provided to the 'input image' block, and a manually segmented mask in NumPy array format as the ground truth for training. Each digital image underwent a manual segmentation process, isolating and cropping regions of the palpebral and forniceal conjunctiva, which provided the labels for the supervised deep learning model using PASCAL masking. The mask selection was supported by an interactive tool that we developed using the CV2 library. Pixels are labeled by choosing the eyelid region manually through mouse input. When an image is labeled, the mouse strokes are saved as a binary NumPy array. We call this two-dimensional NumPy array a 'mask'. The original segmentation masks, which are non-binary RGB triplets, were converted to PASCAL masks by grey-scale conversion and thresholding. We stored the segmented mask as a binary matrix with zeros referring to the background and ones to the conjunctiva, the meaningful ROI. The right and left corner areas of the conjunctiva were excluded from the selection when affected by light reflections or shadows, since these portions are more likely to be affected by flaws caused by fluids and color changes. In a few images the lower portion of the bulbar conjunctiva is similar in color to the palpebral conjunctiva; hence these portions were omitted. All bright and dark spots were omitted during manual segmentation, allowing the network to focus on meaningful details. Mask creation for all 135 images was performed manually. For augmentation, masks were produced by a specific program, and each resulting mask was overlaid on the corresponding augmented image and verified.
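The grey-scale conversion and thresholding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool: the function name `rgb_mask_to_binary`, the luma weights, and the threshold of 127 are all hypothetical choices, since the paper does not report them.

```python
import numpy as np


def rgb_mask_to_binary(rgb_mask: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Convert an H x W x 3 uint8 mask into an H x W array of zeros and ones."""
    # Assumed grey-scale weighting (ITU-R BT.601 luma); the paper does not state one.
    weights = np.array([0.299, 0.587, 0.114])
    grey = rgb_mask.astype(np.float64) @ weights
    # Ones mark the conjunctiva (ROI), zeros mark the background.
    return (grey > threshold).astype(np.uint8)


# Toy example: a white rectangle painted on a black RGB canvas.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[1:3, 2:5] = (255, 255, 255)
mask = rgb_mask_to_binary(rgb)
print(mask)
```

The result is a two-dimensional array that can be stored directly as the 'mask' for the corresponding image.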


Image Augmentation and Pre-Processing
To increase the data size for training the segmentation model, image augmentation was performed. The augmentation process used the CV2 and NumPy libraries to create slightly different versions of the original images using the following rotational and non-rotational techniques:

• Angular rotation (between −45 and 45 degrees) at angle increments of 5 degrees

These techniques are randomly chosen and applied to randomly selected original images of the training data (Figure 2). Each augmentation technique was applied pixel-wise to both the feature image and the corresponding label NumPy array. Image augmentation is applied only to the training set images and their corresponding masks.
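The key constraint in this step is that an image and its mask must receive exactly the same transformation. The sketch below illustrates this for the angular rotation only. It is an assumption-laden illustration: the authors used CV2, whereas this version uses a plain NumPy nearest-neighbour rotation so that it is self-contained, and the names `rotate_nearest` and `augment_pair` are hypothetical.

```python
import numpy as np

# Candidate angles: -45, -40, ..., 45 degrees (5-degree increments).
ANGLES = np.arange(-45, 50, 5)


def rotate_nearest(arr: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate a 2-D or 3-D array about its centre with nearest-neighbour sampling."""
    h, w = arr.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Inverse mapping: for every output pixel, look up its source pixel.
    src_x = np.rint(cx + dx * np.cos(theta) - dy * np.sin(theta)).astype(int)
    src_y = np.rint(cy + dx * np.sin(theta) + dy * np.cos(theta)).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(arr)  # pixels that map outside the source stay zero
    out[inside] = arr[src_y[inside], src_x[inside]]
    return out


def augment_pair(image, mask, rng):
    """Apply one randomly chosen rotation to an image and its mask together."""
    angle = int(rng.choice(ANGLES))
    return rotate_nearest(image, angle), rotate_nearest(mask, angle), angle


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)
mask = np.zeros((32, 48), dtype=np.uint8)
mask[12:20, 16:32] = 1
aug_image, aug_mask, angle = augment_pair(image, mask, rng)
```

Nearest-neighbour sampling keeps the rotated mask strictly binary, which interpolating methods would not guarantee without a further thresholding step.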

To shorten the training time and reduce memory use, the image resolution was reduced to 512 × 384 by resizing. The segmentation masks were resized to the same resolution to keep them compatible with the resized images. Input data are normalized before being fed to the model. Vertical flipping, contrast enhancement, and warp shifting transformations are also used for image augmentation. Since the images are captured with the specially designed spacer, vertical flipping, warp shifting, and angular rotations with larger angle increments did not show a significant improvement in results. These transformation techniques can nevertheless be helpful in segmenting the conjunctiva region from unmodified smartphone-captured eye images.
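A minimal sketch of the resizing and normalization steps is shown below. Several details are assumptions rather than facts from the paper: 512 × 384 is read as width × height, resizing uses nearest-neighbour index sampling in NumPy instead of a CV2 call, normalization is taken to mean scaling to the range [0, 1], and the names `resize_nearest` and `preprocess` are hypothetical.

```python
import numpy as np

TARGET_H, TARGET_W = 384, 512  # assumed to mean 512 (width) x 384 (height)


def resize_nearest(arr: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2-D or 3-D array by nearest-neighbour index sampling."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows[:, None], cols]


def preprocess(image: np.ndarray, mask: np.ndarray):
    """Resize an image/mask pair to the same resolution and scale the image to [0, 1]."""
    x = resize_nearest(image, TARGET_H, TARGET_W).astype(np.float32) / 255.0
    y = resize_nearest(mask, TARGET_H, TARGET_W)
    return x, y


rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(768, 1024, 3), dtype=np.uint8)
mask = np.zeros((768, 1024), dtype=np.uint8)
mask[300:500, 200:800] = 1
x, y = preprocess(image, mask)
print(x.shape, y.shape)
```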

U-Net Architecture and Fine-Tuning
Traditional U-Net architectures use a dual-path approach for retrieving and localizing the overall image context. A contracting path captures the context and a symmetric expanding path enables precise localization [35]. The model discussed in this paper differs from the traditional U-Net model in the following ways:

• Max-pooling layers usually appear at each stage, whereas in this model max pooling is applied only at the first stage, in the first layer group. Convolutional layers with stride two replace the remaining max-pooling layers.

• This model introduces more activation and normalization layers than the traditional model.

• Dropout layers have been removed in favor of making this model specific to the eye segmentation application.
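To see why a stride-two convolution can stand in for a max-pooling layer, it helps to compare the spatial output sizes using the standard formula out = floor((n + 2p − k) / s) + 1. The sketch below is illustrative only: the 2 × 2 pooling window and the 3 × 3 kernel with padding 1 are assumed values, not taken from the paper.

```python
def conv_output_size(n: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


for n in (384, 512):
    pooled = conv_output_size(n, kernel=2, stride=2, padding=0)   # 2x2 max pooling (assumed)
    strided = conv_output_size(n, kernel=3, stride=2, padding=1)  # 3x3 conv, stride 2 (assumed)
    print(n, pooled, strided)
```

Under these assumptions both operations halve each spatial dimension, so the contracting path keeps the same resolution schedule while the down-sampling itself becomes a learned operation.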
The basic U-Net architecture consists of a contracting path on the left side, which performs down-sampling, and an expanding path on the right side, which is responsible for the transposed convolution. The details of the layers are shown in Figure 3.

The layers of the UNBCSM architecture shown in Figure 4 can be grouped into six Layer Modules (LM). Apart from these, some individual layers are used as couplings between two Layer Modules. Layer Module 1 consists of normalization, padding, convolution, activation, and max-pooling layers, and it is the only module with a max-pooling layer. This reduces the model's sensitivity to location while keeping the overall context. As this application detects a single class of object by segmentation (eyelid/lower palpebral conjunctiva), one max-pooling layer is sufficient. Layer Module 2 has two subtypes, 2a and 2b, which differ in stride value. Layer Module 2a, also known as the Identity Block, has a Conv2D layer with stride 1, while Layer Module 2b has a Conv2D layer with stride 2 for dimensionality reduction. We employed ResNet-34 as the backbone because of its efficiency in the performance analysis.
As the model progresses downwards on the left side, the down-sampling and convolutional layers perform the segmentation task. This information is inherently saved in the model weights, and a schema arises that is remapped back to the original size as the model progresses upwards on the right side. Layer Modules 4, 5, and 6 are considerably simpler than their left-side counterparts and comprise up-sampling, normalization, and activation layers. The output layer consists of a final convolutional layer followed by a sigmoid activation function used to assign the labels for the binary classification task.
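The role of the sigmoid output can be illustrated with a small numerical example. This is a sketch, not the model's actual output stage: the logits are made-up values, and the 0.5 decision threshold is an assumption, since the paper does not state the threshold used to binarize the predictions.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Map raw per-pixel scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


# Made-up per-pixel scores from a final convolutional layer.
logits = np.array([[-3.0, 0.2],
                   [4.0, -0.1]])
probs = sigmoid(logits)
# Assumed threshold of 0.5: probability >= 0.5 is labeled conjunctiva (1).
pred_mask = (probs >= 0.5).astype(np.uint8)
print(pred_mask)
```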
A Google Colab runtime with GPU was used to train, validate, and assess the performance of the segmentation model. UNBCSM was trained with 581 images, comprising 108 original images and 473 images obtained by image augmentation. Split validation was used instead of N-fold cross-validation, since augmented versions of validation images appearing in the training set would over-fit the segmentation model. To avoid this issue, 20% of the original images (27 images) were randomly selected and kept exclusively for validation. The number of images used at each stage is shown in Figure 5.
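The arithmetic of this split can be checked with a short sketch. It assumes the hold-out is drawn before any augmentation is applied, which is what keeps augmented copies of validation images out of the training set; the function name `split_originals` and the seed are hypothetical.

```python
import random


def split_originals(n_originals: int = 135, val_fraction: float = 0.2, seed: int = 0):
    """Randomly hold out a fraction of the original images for validation."""
    ids = list(range(n_originals))
    random.Random(seed).shuffle(ids)
    n_val = round(n_originals * val_fraction)
    return ids[n_val:], ids[:n_val]  # (training ids, validation ids)


train_ids, val_ids = split_originals()
n_augmented = 473  # augmented images, generated from training originals only
n_training_total = len(train_ids) + n_augmented
print(len(train_ids), len(val_ids), n_training_total)
```

With 135 originals this yields 108 training originals and 27 validation images, and adding the 473 augmented images gives the 581 training images reported above.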

Results and Discussion
The performance of the UNBCSM on this segmentation task can be visualized with the graphs shown in Figure 6. The learning curve shows that the model suffers from neither underfitting nor overfitting.
For this critical biomedical application, the Intersection over Union (IoU) score was chosen over pixel accuracy and the Dice Index (F1 score) because the IoU metric quantitatively penalizes every single instance of bad classification more than the F1 score does, even though both metrics recognize a bad classification in a given pixel in the same way. The IoU score also deals with the class imbalance issue better than pixel accuracy.
The Intersection over Union (IoU) score is the ratio of the area of overlap between the predicted segmentation and the ground truth to the area of their union. Mathematically, IoU is calculated using Equation (1):

$$\mathrm{IoU} = \frac{|G \cap P|}{|G \cup P|} \qquad (1)$$

where G represents the pixels in the ground-truth segmentation mask, P represents the pixels in the segmentation mask predicted by the model, and |·| denotes the number of pixels. IoU is an overlap measure ranging from 0 to 1 and gives a useful perspective on the quality of the segmentation.
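As a worked illustration of the definition above, the following sketch computes IoU for two small binary masks and, for comparison, the Dice index on the same pair. The helper names `iou_score` and `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for this example.

```python
import numpy as np


def iou_score(gt: np.ndarray, pred: np.ndarray) -> float:
    """Intersection over Union of two binary masks."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    union = np.logical_or(gt, pred).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(gt, pred).sum() / union)


def dice_score(gt: np.ndarray, pred: np.ndarray) -> float:
    """Dice index (F1 score) of two binary masks."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    total = gt.sum() + pred.sum()
    if total == 0:
        return 1.0
    return float(2 * np.logical_and(gt, pred).sum() / total)


gt = np.zeros((4, 4), dtype=np.uint8)
pred = np.zeros((4, 4), dtype=np.uint8)
gt[0:2, :] = 1    # 8 ground-truth pixels
pred[1:3, :] = 1  # 8 predicted pixels, 4 of which overlap
iou = iou_score(gt, pred)    # 4 / 12
dice = dice_score(gt, pred)  # 8 / 16
print(iou, dice)
```

For the same imperfect prediction the IoU (one third) is lower than the Dice index (one half), which is the sense in which IoU penalizes misclassified pixels more heavily.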
The model proposed in this paper produced an average IoU score of 85.7% with a standard deviation of ±5.3% on unseen samples, as shown in Table 2. Except for a single image, all samples have an IoU score above 0.77. Usually, a model producing an average IoU above 50% is considered a good segmentation model. Since it achieved a mean IoU score of 96% and 85.7% for training and validation, respectively, this model is better suited to anemia detection applications than manual cropping. To estimate the performance of the trained model, we compared, for each sample, the proposed segmentation result with the manually selected ROI according to the IoU score. This process is summarized in Figure 7 using a sample from the test set. Figure 7a displays the original eye image; Figure 7b shows the overlay of the original sample and the manually selected ROI that provides the label, or ground truth, for training. Figure 7c shows the cropped manually segmented region and its corresponding mask. The segmentation output from the trained model is shown in Figure 7d, and Figure 7e is the overlay of Figure 7c,d for a better understanding of the segmentation results.
Figure 8 shows the segmentation capability of the model for the validation samples. Row I shows the manually segmented conjunctiva region of the images A, B, C, and D, respectively. Similarly, row II shows the manually segmented conjunctiva region, and the Row III images are the overlay of both the ground truth and the segmentation results. The Column A and B images show very good performance, while the Column C and D images show average and mediocre performance of the model. Table A1 reports the values of the above-mentioned metrics in full for each sample included in the test set.

Conclusions
Non-invasive anemia detection applications require manual segmentation or manual cropping of the region of interest as a preliminary step. Because pixel parameters of the conjunctiva region are correlated with the hemoglobin value, the segmentation capability of this model will help in the accurate diagnosis of anemia. The proposed model can also be serialized and compressed for installation in a mobile application that works offline, which can be useful in low-end medical facilities in poor areas. Most existing conjunctiva-based hemoglobin prediction algorithms are trained using a manually cropped portion of the conjunctiva region. By introducing an automatic conjunctiva segmentation step, we pave the way for its use as a pre-processing step for the existing works. A poor selection of the ROI leads to inaccurate predictions, affecting the automatic diagnostic tool or the examination carried out by medical personnel. This paper discussed automatic segmentation as a viable solution to the problem of conjunctiva segmentation, to be used in conjunction with other diagnosis methods. The proposed work identified the suitability of a fine-tuned U-Net model (UNBCSM) for conjunctiva image segmentation, with a mean IoU score of 84.5% for the validation set. Since this model does not drop any information that may be correlated with the hemoglobin level (for example, the nerve pattern or the color of the nerves in the conjunctiva region), more accurate models can be derived from the existing research work by using it as a pre-processing step. This research work opens a range of opportunities for using a wider spectrum of image features for hemoglobin level detection, rather than just color averages in color spaces (RGB, YCrCb, etc.).

... acquired with all the required authorizations. Furthermore, each patient signed a form to provide consent for this study, and each acquired image has been treated anonymously.

Figure 1. Flow graph of training the segmentation model.

Figure 3. Layer details of the layer modules of UNBCSM.

Figure 5. Training and validation data.

For a fixed batch size, number of epochs, and learning rate, the model was trained and validated with different backbones (ResNet-18, ResNet-34, ResNet-50, and ResNet-101) initialized with 'imagenet' weights. The model trained with ResNet-34 showed better performance, with a competitive execution time compared with ResNet-50 and ResNet-101. We performed an extensive model selection phase in order to select the architecture that best minimizes the estimated risk, based on validation results. The model was trained with different batch sizes (4, 8, 16, 32, and 64). As expected, lower batch sizes tend to increase the training time with no significant improvement in performance, and batch size 16 shows comparable performance with less execution time. Balanced Cross Entropy_Jaccard Loss (bce_jaccard_loss) is used for training since it addresses the class imbalance issue during training. The model with a 'ResNet-34' backbone and pre-initialized weights from 'imagenet' was trained with different learning rates. A lower learning rate reduces the gap between the training and validation scores but yields a lower overall score. A higher learning rate (>0.001) creates spikes in the learning curve. The learning rate is assigned by a Learning Rate (LR) scheduling technique that trains the model with four different learning rates: LR = $10^{-3}$ for epochs 1 to 10, LR = $10^{-4}$ for epochs 11 to 20, LR = $10^{-5}$ for epochs 21 to 80, and LR = $10^{-6}$ for epochs 81 to 100. This schedule showed better learning curves, as shown in Figure 6. The selected parameters are shown in
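The four-stage learning rate schedule described above can be sketched as a plain step function. This is a minimal illustration, not the authors' implementation: the function name `scheduled_learning_rate` and the 1-indexed epoch convention are assumptions made here; only the epoch ranges and rate values come from the text.

```python
def scheduled_learning_rate(epoch: int) -> float:
    """Return the learning rate for a 1-indexed epoch in the range 1..100.

    Hypothetical helper: the epoch boundaries and rates follow the schedule
    stated in the text; the name and indexing convention are assumptions.
    """
    if not 1 <= epoch <= 100:
        raise ValueError("epoch must be between 1 and 100")
    if epoch <= 10:   # epochs 1 to 10
        return 1e-3
    if epoch <= 20:   # epochs 11 to 20
        return 1e-4
    if epoch <= 80:   # epochs 21 to 80
        return 1e-5
    return 1e-6       # epochs 81 to 100


if __name__ == "__main__":
    # Print the rate at each boundary of the schedule.
    for e in (1, 10, 11, 20, 21, 80, 81, 100):
        print(e, scheduled_learning_rate(e))
```

If the model is trained with a framework whose scheduler callback passes a 0-indexed epoch (Keras's `LearningRateScheduler` does this), the function would need to be called with `epoch + 1`.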

Figure 7. Segmentation results. (a) Original eye image, (b) overlay of original image and ground truth, (c) manually segmented region of interest (ROI) and its mask, (d) segmentation of ROI by the model and its mask, (e) overlay of 'c' and 'd' (IoU score = 0.8864).

Figure 8. Four samples (A-D) of the segmentation results. Row I: the manually segmented conjunctiva region, Row II: the conjunctiva region segmented by the model, and Row III: overlay of both masks and the IoU.

Table 1. The model requires 345 ms to run a single batch step and 14 ms for an epoch.

Table 2. Performance of the model for validation set (27 images).

Table A1. Each sample from the test set underwent a calculation of the segmentation metrics proposed in this research with respect to the ground truth manually segmented images.
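As an illustration of the per-sample evaluation, the Intersection over Union (IoU) between a ground-truth mask and a predicted mask can be computed as below. This is a minimal sketch under stated assumptions: masks are binary arrays of equal shape, the names `iou_score` and `mean_iou` are hypothetical, and treating two empty masks as a perfect match is a convention chosen here, not one stated in the paper.

```python
import numpy as np


def iou_score(ground_truth, prediction) -> float:
    """IoU between two binary masks of the same shape (nonzero = conjunctiva)."""
    gt = np.asarray(ground_truth).astype(bool)
    pred = np.asarray(prediction).astype(bool)
    if gt.shape != pred.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(gt, pred).sum()
    if union == 0:
        # Assumption: two empty masks are counted as perfect agreement.
        return 1.0
    intersection = np.logical_and(gt, pred).sum()
    return float(intersection / union)


def mean_iou(mask_pairs) -> float:
    """Average the per-sample IoU over (ground_truth, prediction) pairs."""
    scores = [iou_score(gt, pred) for gt, pred in mask_pairs]
    return float(np.mean(scores))


if __name__ == "__main__":
    # Toy 2x3 masks: 2 overlapping pixels, 4 pixels in the union.
    gt = np.array([[1, 1, 0], [0, 1, 0]])
    pred = np.array([[1, 0, 0], [0, 1, 1]])
    print(iou_score(gt, pred))  # 0.5
```

Averaging `iou_score` over all samples, as `mean_iou` does, gives a set-level mean IoU of the kind reported for the validation set.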