Using Slit-Lamp Images for Deep Learning-Based Identification of Bacterial and Fungal Keratitis: Model Development and Validation with Different Convolutional Neural Networks

In this study, we aimed to develop a deep learning model for identifying bacterial keratitis (BK) and fungal keratitis (FK) by using slit-lamp images. We retrospectively collected slit-lamp images of patients with culture-proven microbial keratitis between 1 January 2010 and 31 December 2019 from two medical centers in Taiwan. We constructed a deep learning algorithm consisting of a segmentation model for cropping cornea images and a classification model that applies different convolutional neural networks (CNNs) to differentiate between FK and BK. The CNNs included DenseNet121, DenseNet161, DenseNet169, DenseNet201, EfficientNetB3, InceptionV3, ResNet101, and ResNet50. The model performance was evaluated and presented as the area under the curve (AUC) of the receiver operating characteristic curves. A gradient-weighted class activation mapping technique was used to plot the heat map of the model. By using 1330 images from 580 patients, the deep learning algorithm achieved the highest average accuracy of 80.0%. Using different CNNs, the diagnostic accuracy for BK ranged from 79.6% to 95.9%, and that for FK ranged from 26.3% to 65.8%. The CNN of DenseNet161 showed the best model performance, with an AUC of 0.85 for both BK and FK. The heat maps revealed that the model was able to identify the corneal infiltrations. The model showed a better diagnostic accuracy than the previously reported diagnostic performance of both general ophthalmologists and corneal specialists.


Introduction
Microbial keratitis (MK) is a serious corneal disease that can lead to reduced vision and even blindness [1,2]. Worldwide, MK causes an estimated 1.5 to 2 million cases of monocular blindness annually [3]. It is considered an epidemic particularly within South Asia, Southeast Asia, and East Asia, regions where fungal keratitis (FK) accounts for more than 50% of all MK cases [4]. The management of FK is challenging and may require surgical intervention, and FK has been reported to have poor visual outcomes [5,6]. Hence, early diagnosis is essential for avoiding devastating vision-threatening outcomes.
However, the early diagnosis of FK is also challenging. Although some predisposing factors and clinical features could lead ophthalmologists to suspect fungal infection [7], culture-based methods, which are laborious and time consuming, remain the cornerstone of diagnosis [8]. The time gap between patient presentation and diagnosis may lead the patient to miss the optimal time for treatment initiation, resulting in deep fungal invasion. Therefore, ophthalmologists may start antifungal medication based on distinctive lesions of each pathogenic microorganism identified on the cornea. Previous studies have reported that with an image-only diagnosis, general ophthalmologists are only able to correctly distinguish FK from bacterial keratitis (BK) 49.3-67.1% of the time [9,10], and this percentage ranges from 66.0% to 75.9% among corneal specialists [10,11].
Deep learning algorithms with artificial intelligence (AI) have demonstrated an exceptional performance at detecting various ocular conditions through different image modalities, such as fundus photographs for diabetic retinopathy [12] and age-related macular degeneration [13], retinal nerve fiber layer thickness and visual fields for glaucoma [14], and corneal topography for keratoconus [15]. AI appears to be a promising tool for first-line medical care, especially in scenarios in which ophthalmologists are not readily available. However, only limited studies have applied AI to the diagnosis of FK using slit-lamp images [9,10], and the accuracy was approximately 69% [10]. In this study, we aimed to develop a deep learning model that uses cropped slit-lamp images to improve performance in differentiating BK from FK.

Identification of Microbial Keratitis
Patients with culture-proven MK who presented to Chang Gung Memorial Hospital, Linkou Medical Center, and Kaohsiung Medical Center in Taiwan between 1 January 2010 and 31 December 2019 were recruited. MK diagnosis was corroborated by the clinical manifestations of corneal infection and pathogen identification of the sample from the infection site. Corneal scrapings obtained from patients with presumed MK underwent smear and culture examinations for the detection of bacteria, mycobacteria, and fungi through the use of standard microbiological culture techniques [16,17], including blood and chocolate agar (Nippon Becton Dickinson Co. LTD, Akasaka Garden City, Japan), inhibitory mold agar (IMA) and IMA supplemented with chloramphenicol and gentamicin (Creative CMP®, New Taipei City, Taiwan), Lowenstein-Jensen agar slants (Creative CMP®, New Taipei City, Taiwan), and thioglycolate broth (BioStar, Taichung City, Taiwan). Cultures were defined as positive if microbial growth was observed on two media, microbial elements were observed in smears and growth on one medium, or confluent microbial growth was observed on one medium.
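The culture-positivity definition above is a three-way disjunction, which can be made explicit as a small decision rule. This is an illustrative sketch only; the function name and parameters are ours, not part of the study's protocol:

```python
def culture_positive(n_media_with_growth: int,
                     smear_positive: bool,
                     confluent_on_one: bool) -> bool:
    """Return True if a culture meets the study's positivity definition:
    growth on two media, OR microbial elements on smear plus growth on
    one medium, OR confluent growth on a single medium."""
    return (n_media_with_growth >= 2
            or (smear_positive and n_media_with_growth >= 1)
            or confluent_on_one)
```

For example, growth on a single medium without smear confirmation or confluence would be classified as negative under this rule.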

Exclusion Criteria
Patients were excluded if they had mixed bacterial and fungal infections; corneal perforation; no documented slit-lamp images; poor-quality or fluorescein-staining images; a history of ocular surface surgery, such as penetrating keratoplasty and amniotic membrane transplantation; or the presence of other corneal diseases, such as viral keratitis, Acanthamoeba keratitis, marginal keratitis, corneal opacity, chemical burn, Stevens-Johnson syndrome, mucous membrane cicatricial pemphigoid, or bullous keratopathy.

Image Collection
We obtained slit-lamp images from the two centers by using the same standard procedure performed by certified ophthalmic technicians. Images from Linkou Medical Center were captured with a Canon EOS 7D camera mounted on a Haag-Streit BX900 slit-lamp biomicroscope, and images from Kaohsiung Medical Center were captured with a Nikon D100 camera mounted on a Topcon SL-D8 slit-lamp biomicroscope (before May 2015) and a Canon EOS 7D camera mounted on a Haag-Streit BX900 slit-lamp biomicroscope (after May 2015) [10]. Images with white light illumination without slit-beam enhancement from each patient were used for image classification.

Algorithm Architecture
The algorithm architecture is illustrated in Figure 1. The algorithm was divided into two main parts, namely the segmentation model and the classification model. We trained the segmentation model by using U²-Net to crop the image of the cornea (sample in Figure 2). The U²-Net model performed better than the U-Net and U-Net++ models because it could access more information and preserve the complete features of the cornea [18]. All images were resized to a resolution of 512 × 512 × 3 before being input into the U²-Net model. We also normalized each image into the range (0, 1) and then standardized it by subtracting 0.485 and dividing by 0.299, which enabled the model to converge more quickly and steadily. A total of 100 patients were randomly selected and divided into training, validation, and testing sets, which consisted of 70 patients (183 images), 20 patients (45 images), and 10 patients (15 images), respectively. The U²-Net model achieved Intersection over Union accuracies of 93% on the validation set and 95% on the independent testing set. The trained U²-Net model was then applied to the full set of 580 patients (1330 images).
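The image preparation described above (scaling into (0, 1), then standardizing with the reported constants) can be sketched as follows. This is a minimal NumPy illustration under our own assumptions about the array format; resizing and the U²-Net model itself are not shown:

```python
import numpy as np

def preprocess(image: np.ndarray, mean: float = 0.485, std: float = 0.299) -> np.ndarray:
    """Scale an 8-bit RGB image into (0, 1), then standardize with the
    constants reported in the text (subtract 0.485, divide by 0.299).

    `image` is an H x W x 3 uint8 array; resizing to 512 x 512 x 3 is
    assumed to have been done beforehand (e.g. with PIL or OpenCV).
    """
    x = image.astype(np.float32) / 255.0   # normalize into (0, 1)
    return (x - mean) / std                # standardize before feeding the network
```

Standardizing inputs this way keeps activations in a well-scaled range, which is the usual reason such a step speeds and stabilizes convergence.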


In the classification model, we used five-fold cross-validation to divide the cropped images from the aforementioned U²-Net segmentation model. For data randomization, we set fixed seeds so that the same sequence of random numbers was reproduced in each run. We categorized the images in the data set according to the patient. The whole data set was then split into six parts; one-sixth of the data set was reserved for testing, and the remaining five parts were used for cross-validation training. For each patient, the data were separated into training, validation, and testing data sets at a ratio of 4:1:1. We also used image augmentation approaches including random brightness adjustment, saturation adjustment, contrast adjustment, horizontal flipping, rotation, and normalization, as shown in Supplementary Figure S1.
For the classification model, we applied various convolutional neural networks (CNNs), including ResNet50, ResNet101, DenseNet121, DenseNet161, DenseNet169, DenseNet201, InceptionV3, and EfficientNetB3. DenseNet161 was used as the basis of our classification model, as shown in Figure 1, with pretrained weights from the ImageNet Large Scale Visual Recognition Challenge [19,20]. Age and sex were also input into fully connected layers and yielded two output vectors; these vectors were then concatenated with the vectors produced from the global average pooling layer [21]. The model was trained for up to 100 epochs and selected on the basis of maximum accuracy and minimum loss in the validation set.
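The patient-level splitting described above, in which all images from one patient land in the same subset, can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name, the (patient_id, image) record format, and the seeded shuffle are our assumptions:

```python
import random
from collections import defaultdict

def split_by_patient(image_records, seed=0, ratio=(4, 1, 1)):
    """Split (patient_id, image) records into train/val/test subsets at the
    patient level, so no patient's images straddle two subsets."""
    by_patient = defaultdict(list)
    for pid, img in image_records:
        by_patient[pid].append(img)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)        # fixed seed for reproducibility
    total = sum(ratio)
    n = len(patients)
    n_train = n * ratio[0] // total
    n_val = n * ratio[1] // total
    train_p = patients[:n_train]
    val_p = patients[n_train:n_train + n_val]
    test_p = patients[n_train + n_val:]
    pick = lambda ps: [img for p in ps for img in by_patient[p]]
    return pick(train_p), pick(val_p), pick(test_p)
```

Grouping by patient before splitting avoids leakage: near-identical images of the same eye never appear in both the training and testing sets.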

Performance Interpretation and Statistics
For visualizing heat maps, the gradient-weighted class activation mapping (Grad-CAM) technique [22], in which the model's attention scores are computed from the gradients of the model's output with respect to the last convolutional layer, was used to plot the heat map of the model. Receiver operating characteristic (ROC) curves were plotted for the discrimination between BK and FK, and the area under the curve (AUC) was measured. From the ROC curve, Youden's index was used to obtain the sensitivity and specificity. The accuracy of the model was also calculated. Statistical analysis was performed with IBM SPSS Statistics Version 23 (SPSS, Inc., Chicago, IL, USA).
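Selecting an operating point with Youden's index amounts to maximizing J = sensitivity + specificity - 1 over candidate thresholds on the ROC curve. A minimal pure-Python sketch follows (the paper's analysis used SPSS; the function name, interface, and assumption that both classes are present are ours):

```python
def youden_threshold(scores, labels):
    """Pick the threshold maximizing Youden's J = sensitivity + specificity - 1.

    `scores` are predicted probabilities for the positive class; `labels`
    are 0/1 ground truth. Both classes must be present.
    Returns (J, threshold, sensitivity, specificity).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = (-1.0, None, None, None)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens, spec = tp / pos, tn / neg
        j = sens + spec - 1
        if j > best[0]:
            best = (j, t, sens, spec)
    return best
```

Each unique score is tried as a threshold, which is sufficient because J only changes at observed score values.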

Patient Classification and Characteristics
A total of 580 patients (420 male and 160 female) with 1330 images (with only one eye involved) were included. The average patient age was 55.4 years. According to the culture results, 346 patients (824 images) were classified as having BK and 234 patients (506 images) were classified as having FK. The final data set consisted of 388 patients (904 images) for training, 96 patients (212 images) for validation, and 96 patients (214 images) for testing. The distribution and characteristics of the patients are shown in Table 1.

Performance of Different Models
We evaluated the performance of the models by using the validation and testing data sets; the average accuracy was approximately 80%. Details regarding the accuracy, sensitivity, and specificity of all models are presented in Table 2. The diagnostic accuracy for BK ranged from 79.6% to 95.9%, and that for FK ranged from 26.3% to 65.8%. The DenseNets, EfficientNetB3, and InceptionV3 exhibited similar performance; their average accuracy ranged from approximately 76% to 79%, and their diagnostic rates for FK were all between 56% and 66%. By contrast, the ResNets achieved diagnostic rates of over 90% for BK but below 50% for FK. (Table 2 abbreviations: BK, bacterial keratitis; FK, fungal keratitis; PPV, positive predictive value; NPV, negative predictive value.)
DenseNet161 achieved the best performance in the prediction of BK and FK, with an AUC of the ROC curve of 0.85 for both BK and FK (Figure 3). The diagnostic accuracy for BK was 87.3%, and that for FK was 65.8%. A sample heat map generated with Grad-CAM for model visualization is presented in Figure 4. With the cropped corneal images, the model was able to identify the corneal infiltrations and focus on the pathology of MK while ignoring most of the reflected light on the cornea.


Discussion
In this study, we developed a deep learning model to differentiate MK into BK and FK. The model, which uses cropped slit-lamp images of the cornea with white light illumination, achieved an AUC of 0.85. Its accuracy approached 80%, exceeding that of general ophthalmologists and comparable to that of corneal specialists [9,10].
Early diagnosis of FK is important but challenging. No pathognomonic features can wholly support a physician's diagnosis, and delayed diagnosis can increase the difficulty of FK management, often necessitating surgical intervention and resulting in poor visual outcomes [5]. Culture-based methods are the current cornerstone for FK diagnosis; however, a time lag exists between patient presentation and diagnosis. Dalmon et al. reported that the clinical signs of BK and FK can be used to identify their causative organisms. In their study, 15 corneal specialists assessed 80 slit-lamp images and were able to correctly differentiate between BK and FK 66% of the time [11].
Few studies have applied AI and deep learning models for FK diagnosis by using slit-lamp images. Kuo et al. developed a slit-lamp image-based deep learning model for FK diagnosis. Although their work yielded promising results, the average reported accuracy was 69.4%, and the AUC of the ROC curve was only 0.65; moreover, the diagnostic rate was lower than the rate reported by corneal specialists in their own study [10]. According to their findings, the model's incorrect predictions were due to incorrect focusing on the eyelid, eyelash, and sclera [10]. The use of fluorescein-staining images was reported in only one study; however, the researchers aimed to identify early corneal ulcers by recognizing point-like patterns, which could not differentiate fungal ulcers that were difficult to manage from other corneal ulcers [23]. Xu et al. reported diagnostic rates of 80% for MK, 53.3% for BK, and 83.3% for FK by using a deep sequential-level learning model with slit-beam slit-lamp images. The accuracy of their model exceeded that of ophthalmologists (49.3% ± 11.9%) for over 120 test images [9]. However, the model required sophisticated patch sampling over the cornea and the lesion, as well as the application of an additional sequential model, constituting a relatively complicated approach.
In the present study, our model achieved an average accuracy of approximately 80% and diagnostic accuracies of approximately 80% and 60% for BK and FK, respectively (Table 2), and we used approximately 1.5-times more training images for BK than for FK. To alleviate the incorrect focusing reported in the previous study [10], we used cropped corneal images to train the model. We also evaluated the performance using slit-lamp images without cropping to train and test the model, but the average accuracy decreased to approximately 70% (data not shown), which was comparable with the previous study [10]. The decreased performance may be due to inappropriate focusing on areas without clinically relevant features (Figure 5).
The other previous model, developed by Xu et al., achieved a higher model accuracy and a better diagnostic rate for FK than our model did; their overall accuracy was enhanced by the diagnostic accuracy for herpes simplex keratitis and other corneal diseases, which reached 93.3% and 90.0%, respectively. Furthermore, approximately 1.3-times more training images were used for FK than for BK. These results also indicate that model accuracy is strongly influenced by the number of training images used, which supports our expectation that the integration of more images can help deep learning models become a robust tool for assisting early FK diagnosis [24].
We also tested various models by using different CNNs in our study. All of the models had an average accuracy between 76% and 80%. Because we applied five-fold cross-validation, both the validation and test data sets were independent of the training data set; thus, we also considered the validation accuracy and selected the best model based on the average validation and test performance. Among all of the models, DenseNet161 achieved the highest FK diagnostic accuracy (65.8%) and a relatively high average accuracy (78.6%). DenseNet outperformed the other CNNs because of its unique architectural design, which efficiently takes the features of every layer and reuses them in all subsequent layers, thereby enhancing the model [20]. In comparison, ResNet101 had the highest average diagnostic accuracy (80.0%) for BK and FK; however, its diagnostic rate for FK was relatively low (49.1%). The diagnostic accuracy of ResNet101 also had the largest standard deviation, indicating that the results varied considerably between cross-validation folds.
From the heat maps generated with Grad-CAM, the model could identify the pathology of infection and ignore artifacts from reflected light by using cropped images of the cornea alone. We noticed that when the slit-lamp images were not cropped, the model focused on regions outside the cornea, such as the eyelid or conjunctiva. Therefore, we cropped the cornea by training a U²-Net segmentation model. To differentiate FK from BK, the model focused on the feature of corneal infiltration.
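Once the last convolutional layer's activations and the class-score gradients have been captured (e.g. via framework hooks), the Grad-CAM heat map itself reduces to a few lines of array arithmetic. The following is a minimal NumPy sketch of that arithmetic only, under our own assumptions about array shapes; it is not the study's implementation:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heat map from a layer's activations (C x H x W)
    and the gradients of the target class score w.r.t. them (same shape)."""
    weights = gradients.mean(axis=(1, 2))             # global-average-pool the gradients per channel
    cam = np.tensordot(weights, activations, axes=1)  # channel-weighted sum -> H x W
    cam = np.maximum(cam, 0.0)                        # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # scale to [0, 1] for display
    return cam
```

The resulting H x W map is then upsampled to the input resolution and overlaid on the cropped corneal image for inspection.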
In this study, we demonstrated the promising role of AI in the diagnosis of infectious keratitis through the use of images only. Although microbiological culture remains the gold standard for FK diagnosis, early clinical detection of potential FK could aid the subsequent initiation of empirical treatment or referral management. Because most ophthalmologists and general practitioners may not have extensive experience in FK diagnosis, the deep learning-based model may help clinicians improve their diagnostic accuracy and subsequently initiate early and appropriate treatment. Moreover, AI can provide disease screening or telemedicine in places where prompt medical or ophthalmologic evaluation is infeasible.
This study has some limitations. First, we excluded patients with poor-quality slit-lamp images before training the model. However, poor-quality images are encountered in daily clinical practice, and the model performance may thus be affected by factors such as patient cooperation, light reflection in the images, and photographer experience. Second, although corneal images are relatively easy to capture, environmental factors may render real-world application challenging [25]. Third, fewer FK than BK cases and images were documented, which caused an imbalance in the data set used for training the model to differentiate between BK and FK and subsequently affected the model's accuracy [15]. Fourth, selection bias could occur in a referral medical center, where a proportion of patients have received medical treatment before presentation; treatment could alter lesion features and affect the model's learning process and performance. Fifth, we did not perform patient matching between the training, validation, and testing groups; thus, differences in clinical characteristics might also have affected the performance of the model. Sixth, we did not integrate pertinent clinical information such as risk factors for infectious keratitis, which are indispensable for clinical diagnosis. Finally, the model's function lies in assisting in the differentiation of FK from BK; we did not subclassify the data set into different pathogens, which may have different clinical characteristics, and viral and parasitic keratitis were not included in this study. In clinical practice, cultures remain crucial for final species identification.

Conclusions
In conclusion, we developed a deep learning model for differentiating between FK and BK. The accuracy of the model was better than that previously reported for both general ophthalmologists and corneal specialists. Owing to the high accessibility of corneal images, we anticipate that the inclusion of more images would help deep learning models become a robust tool for aiding in early FK diagnosis.

Informed Consent Statement:
The requirement of informed consent from patients was waived because all data were deidentified.

Data Availability Statement:
The data presented in this study are available upon request. The data are not publicly available due to the data security policy of Chang Gung Memorial Hospital, Taiwan.