Separate Detection of Stromal and Epithelial Corneal Edema on Optical Coherence Tomography Using a Deep Learning Pipeline and Transfer Learning

The accurate detection of corneal edema has become a topic of growing interest with the generalization of endothelial keratoplasty. Despite recent advances in deep learning for corneal edema detection, the problem of minimal edema remains challenging. Using transfer learning and a limited training set of 11 images, we built a model to segment the corneal epithelium, which is part of a three-model pipeline to detect corneal edema. A second and a third model are used to detect edema on the stroma alone and on the epithelium. A validation set of 233 images from 30 patients consisting of three groups (Normal, Minimal Edema and Important Edema) was used to compare the results of our new pipeline to our previous model. The mean edema fraction (EF), defined as the number of pixels detected as edema divided by the total number of pixels of the cornea, was calculated for each image. With our previous model, the mean EF was not statistically different between the Normal and Minimal Edema groups (p = 0.24). With the current pipeline, the mean EF was higher in the Minimal Edema group compared to the Normal group (p < 0.01). The described pipeline constitutes an adjustable framework for the detection of corneal edema based on optical coherence tomography and yields better performance in cases of minimal or localized edema.


Introduction
Corneal edema is a feature shared by several ophthalmic conditions. It induces a loss of visual acuity and quality and, in advanced cases, pain and photophobia. Early diagnosis usually allows better patient care through faster treatment. Its most common cause is endothelial failure induced by Fuchs endothelial corneal dystrophy (FECD) [1]. Today, no specific tool exists to detect corneal edema. Its evaluation is mainly clinical through slit lamp examination [2], helped by the measurement of central and peripheral corneal thickness [3,4]. These methods are insufficient in cases of subclinical edema or naturally thin or thick corneas. Recently, corneal densitometry [4] has been described as an alternative, showing increased densitometry at different depths of the cornea in case of edema. However, it currently lacks standardization and is not specific to corneal edema. Scheimpflug backscatter images can also be used to enhance the visualization of areas of accentuated loss of corneal endothelial cells in FECD [5]. Subclinical edema in cases of FECD can be detected with Scheimpflug tomography using a classification described by Sun et al. [6][7][8] involving the displacement of the thinnest point, a loss of parallel isopachs and a focal depression of the posterior surface. This classification is adequate when the edema is caused by an endothelial dysfunction, but it has not been tested in other situations and is probably not optimal in cases of anterior corneal edema. Its interpretation is also quite subjective and requires training to obtain accurate results. A recent study showed the potential of combining densitometry and Scheimpflug tomographic features to predict corneal edema resolution [9]. Deep learning applied to corneal optical coherence tomography (OCT) was recently suggested as a new approach toward corneal edema detection.
The rationale for using OCT images for corneal edema detection lies in the quasi-histological images obtained by OCT. Indeed, specific signs of chronic edema have been described histologically [10], namely intra- and sub-epithelial bullae, sub-epithelial fibrosis and Descemet membrane thickening, all of which are visible on OCT images. Measuring the three-dimensional endothelium/Descemet membrane complex thickness has recently been described as a promising technique to diagnose early and late FECD [11] and graft rejection [12] using OCT. Enlarged spaces between collagen lamellae and their disorganization are also visible in cases of corneal edema. OCT densitometry studies also found hyperreflectivity of the epithelium and stroma in corneal edema [13]. Thus, applying deep learning models to OCT images has the potential to improve corneal edema detection. Eleiwa et al. [14] described a classifier based on OCT images which is able to detect clinical corneal edema in cases of advanced FECD. This approach provides no information about the location of edema on the images and is not built to be generalized to other causes of edema. We recently described a different approach [15] to detect edema at the pixel level on OCT images using a segmentation algorithm. Although this approach has better potential for generalizability, error analysis revealed certain limitations of our model. Specifically, we believe that the simultaneous analysis of the epithelium and the stroma might hamper its diagnostic capabilities in cases of minimal stromal edema. In this study, we address this problem and focus on improving the deep learning methodology for corneal edema detection based on OCT images.

Pipeline Description
We developed a deep learning pipeline as described in Figure 1, using three models, each having a U-Net architecture.

1. A first model (Epi Detector) is used to detect the epithelium in the original image.
2. This segmentation is then used to create a modified version of the original image in which all pixels of the detected epithelium are set to 0 (black).
3. A second model (Epi Edema Detector) is used to detect edema on the epithelium.
4. The modified image obtained in step 2 goes through a third model (Stroma Edema Detector), which detects edema on the whole stroma.
5. Finally, the results from steps 3 and 4 are combined to obtain the edema detection results on the whole image.
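
The masking in step 2 and the final combination in step 5 can be illustrated with a minimal, dependency-free sketch. The function names and the combination rule (epithelial pixels take the output of the epithelial detector, all other pixels take the output of the stromal detector) are assumptions made for illustration; the text does not specify how the two outputs are merged.

```python
from typing import List

Image = List[List[float]]  # grayscale OCT image as rows of pixel intensities
Mask = List[List[bool]]    # True where a pixel belongs to the class of interest


def mask_epithelium(image: Image, epi_mask: Mask) -> Image:
    """Step 2: set every pixel of the detected epithelium to 0 (black)."""
    return [
        [0.0 if is_epi else pixel for pixel, is_epi in zip(row, mask_row)]
        for row, mask_row in zip(image, epi_mask)
    ]


def combine_edema(epi_edema: Mask, stroma_edema: Mask, epi_mask: Mask) -> Mask:
    """Step 5 (assumed rule): inside the epithelium keep the epithelial
    detector's output, elsewhere keep the stromal detector's output."""
    return [
        [e if is_epi else s for e, s, is_epi in zip(e_row, s_row, m_row)]
        for e_row, s_row, m_row in zip(epi_edema, stroma_edema, epi_mask)
    ]
```

In this sketch the masks stand in for the thresholded outputs of the three U-Net models, which are not reproduced here.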

Model Architectures
Epi Edema Detector and Stroma Edema Detector have the same U-Net architecture as our previous model.
Epi Detector has the same architecture, except that it only has two output channels instead of three to match the two classes encountered during training.
Full model architectures are described in Supplementary Table S1.

Development Sets
For the models Epi Edema Detector and Stroma Edema Detector, all 199 images from our previously described [15] development set were used. Briefly, the development set was composed of OCT (Avanti XR, Optovue, Fremont, CA, USA) images from normal corneas (n = 88) and images from patients who had undergone Descemet membrane endothelial keratoplasty (DMEK) surgery the day before (n = 111). At this point, we expected edema of the whole cornea on all images. All pixels representing the cornea were segmented manually and labeled either as "normal" or "edema". Pixels not belonging to the cornea were labeled as "background". It should be noted that Avanti provides an axial resolution of 5 µm.
The training and test sets were also identical to those used for our previous model. For Stroma Edema Detector, the images were first modified using Epi Detector to remove the epithelium.
For the Epi Detector model, the development set was a subset of 11 images (6 normal and 5 edema corneas) selected from our previous development set. Each image came from a different patient. The training and test sets for Epi Detector were composed of the images selected from the original training set and test set, respectively. The training set had 9 images (5 normal and 4 edema corneas) and the test set had 2 images (1 normal and 1 edema cornea).
New ground truth masks were segmented manually on these images, delineating the epithelium only, and the rest of the image was considered as the background. Both normal and edematous epithelium were grouped in a single class, "epithelium".
No pre-processing technique was applied to any image in the development set.

Model Training and Transfer Learning
Epi Detector used our previous model [15] as a starting point for training. Only the last layer was replaced with two output channels and re-initialized weights to account for the two classes instead of three. Epi Edema Detector was trained from scratch (with no pre-training). Stroma Edema Detector was trained using Epi Edema Detector as a starting point. No weight of any model was frozen during training.
For all models, we used a cross-entropy loss function, a fixed learning rate of 0.1, a stochastic gradient descent optimizer and a batch size of 2. Epi Detector and Stroma Edema Detector were trained using whole images, whereas Epi Edema Detector was trained using random crops of 120 × 120 pixels in an attempt to make it more sensitive to small, localized edema. Data augmentation was performed in the same way for each model with random horizontal flips and random rotation within a range of −15° to +15°. Each image was presented 3 times in each epoch, with its brightness set to 0.5, 1 and 1.5 times its original intensity. Individual Dice coefficients for each class were used during training for evaluation.
All code was written in Python 3.6 using the PyTorch library for neural network construction and training.
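
As a rough illustration of two of the augmentation steps described above, the following sketch generates the three brightness variants of an image and extracts a random 120 × 120 crop. It uses plain Python lists so that it runs without dependencies; the function names, the clipping of intensities to 255 and the seeded random generator are assumptions, and the random flips and rotations are omitted.

```python
import random
from typing import List, Optional

Image = List[List[float]]

BRIGHTNESS_FACTORS = (0.5, 1.0, 1.5)  # factors stated in the text
CROP_SIZE = 120                        # crop size used for Epi Edema Detector


def brightness_variants(image: Image, max_value: float = 255.0) -> List[Image]:
    """Return one copy of the image per brightness factor.

    Clipping to max_value is an assumption; the text does not say how
    out-of-range intensities are handled.
    """
    return [
        [[min(pixel * factor, max_value) for pixel in row] for row in image]
        for factor in BRIGHTNESS_FACTORS
    ]


def random_crop(image: Image, size: int = CROP_SIZE,
                rng: Optional[random.Random] = None) -> Image:
    """Extract a size x size crop at a uniformly random position."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop")
    top = rng.randint(0, height - size)    # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]
```

Presenting all three variants of every image in each epoch matches the description of each image being shown 3 times per epoch.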

Validation Set
The validation set included 3 groups of patients to test our hypothesis that our new model performed better than our previous one in cases of minimal edema. All images and patients' data were collected retrospectively from our database. For each patient, we used all available scans from the 8 radial scans constituting the "Pachymetry" or "Pachymetry Wide" examination. The first group (Normal) contained 75 images from 10 patients with clinically normal corneas and no known corneal condition. The second and third groups (Minimal Edema and Important Edema, respectively) contained 80 and 78 images, each from 10 FECD patients scheduled for DMEK surgery and exhibiting various amounts of edema. To reduce the subjectivity in preoperative edema grading, we did not rely on any clinical parameter; rather, edema was quantified by calculating the differential central corneal thickness (DCCT) between the preoperative measurement and the measurement 6 months after surgery. Inclusion criteria for the Minimal Edema group were a DCCT below 100 microns and failure to detect corneal edema by our previous model, whereas for the Important Edema group, patients had a DCCT value above 150 microns.
The first and third groups served as control groups to ensure that the new model does not perform worse than the previous one in easier cases (no edema or important edema).
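
The inclusion criteria for the two edema groups can be written as a small sketch. The function names are hypothetical, and the sign convention (preoperative thickness minus thickness at 6 months) is an assumption consistent with edema making the cornea thicker before surgery.

```python
from typing import Optional


def differential_cct(preop_um: float, postop_6m_um: float) -> float:
    """DCCT in microns: preoperative minus 6-month postoperative central
    corneal thickness (assumed sign convention)."""
    return preop_um - postop_6m_um


def edema_group(dcct_um: float, detected_by_previous_model: bool) -> Optional[str]:
    """Assign an FECD patient to a validation group, or None if excluded."""
    if dcct_um > 150:
        return "Important Edema"
    if dcct_um < 100 and not detected_by_previous_model:
        return "Minimal Edema"
    return None  # e.g. DCCT between 100 and 150 microns
```

For example, a patient whose central thickness decreased from 620 µm to 540 µm (DCCT of 80 µm) and whose edema was missed by the previous model would fall in the Minimal Edema group.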

Image Processing
All "Pachymetry Wide" images were cropped laterally to their central 1020 pixels to match the width of the 6 mm "Pachymetry" scans. Thus, all images were 1020 × 640 pixels in size. No other pre-processing or post-processing was performed on the validation set images.
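
A lateral central crop of this kind could be sketched as follows. The function name is hypothetical and the image is represented as a list of pixel rows to keep the example dependency-free.

```python
from typing import List

Image = List[List[float]]


def center_crop_width(image: Image, target_width: int = 1020) -> Image:
    """Keep only the central target_width columns of every row."""
    width = len(image[0])
    if width < target_width:
        raise ValueError("image is narrower than the target width")
    left = (width - target_width) // 2
    return [row[left:left + target_width] for row in image]
```

An image that is already 1020 pixels wide is returned unchanged in content, so the same function can be applied to both scan types.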

Evaluation Metrics
For each patient, each available radial scan was processed separately by the model. The results were then combined to calculate the mean edema fraction (EF). For each scan, EF was defined as the number of pixels detected as edema divided by the number of pixels detected as cornea. The mean EF of a patient was obtained by averaging the EF values over all scans of that patient.
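
The EF calculation can be sketched as below, assuming the model output has been converted to a per-pixel label map. The integer label encoding, the function names and the handling of scans without any corneal pixel are assumptions made for illustration.

```python
from typing import List

LabelMap = List[List[int]]

BACKGROUND, NORMAL, EDEMA = 0, 1, 2  # assumed label encoding


def edema_fraction(label_map: LabelMap) -> float:
    """EF of one scan: edema pixels divided by all corneal pixels."""
    edema = sum(row.count(EDEMA) for row in label_map)
    normal = sum(row.count(NORMAL) for row in label_map)
    cornea = edema + normal
    # Returning 0.0 when no cornea is detected is an assumption.
    return edema / cornea if cornea else 0.0


def mean_edema_fraction(scans: List[LabelMap]) -> float:
    """Mean EF over all available radial scans of one patient."""
    fractions = [edema_fraction(scan) for scan in scans]
    return sum(fractions) / len(fractions)
```

With one scan containing 2 edema pixels out of 4 corneal pixels (EF = 0.5) and another containing none (EF = 0), the mean EF of the patient is 0.25.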

Visual Representation of the Results
As in our previous work [15], color heat maps were used to visualize the output probabilities of the model on the OCT scans. Hot colors indicate higher probabilities of edema.

Statistical Analysis
Pairwise comparisons of EF values of each group were conducted using a Mann-Whitney U test. Bonferroni adjustment was applied to account for multiple comparisons. Results were considered statistically significant for p values < 0.05. All statistics, calculations and figures were done in Python 3.6.
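
To make the two statistical ingredients concrete, the sketch below computes the Mann-Whitney U statistic directly from its definition and applies a Bonferroni adjustment to a list of p values. It is a dependency-free illustration, not the code used in the study; in practice the p value of the test would typically come from a statistics library such as SciPy (`scipy.stats.mannwhitneyu`).

```python
from typing import List, Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic of sample x: number of pairs (x_i, y_j) with x_i > y_j,
    with ties counted as one half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With three groups there are three pairwise comparisons, so each raw p value is multiplied by 3 before being compared with the 0.05 threshold.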

Results on the Development Sets
Individual Dice coefficients for both the training and test sets during training are shown in Figure 2 for Epi Detector. For both Epi Edema Detector and Stroma Edema Detector, individual Dice coefficients during training are shown in Figure 3 for the training set and in Figure 4 for the test set.
Epi Detector was trained for 12 epochs and the final Dice coefficients for the "Epithelium" class were 0.891 and 0.896 in the training and test set, respectively.
Epi Edema Detector was trained for 158 epochs. The final Dice coefficients for the "Edema", "Normal" and "Background" classes were, respectively, 0.908, 0.989 and 0.989 in the training set and 0.982, 0.985 and 0.990 in the test set.
Stroma Edema Detector was trained for 12 epochs. Dice coefficients for the "Edema", "Normal" and "Background" classes were, respectively, 0.986, 0.991 and 0.994 in the training set and 0.989, 0.985 and 0.994 in the test set.
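
For reference, a per-class Dice coefficient of the kind reported here can be computed from a predicted and a ground truth label map as in the following sketch. The function name and the convention of returning 1.0 when a class is absent from both maps are assumptions.

```python
from typing import List

LabelMap = List[List[int]]


def dice_coefficient(pred: LabelMap, target: LabelMap, cls: int) -> float:
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for the pixels of one class."""
    pred_count = target_count = overlap = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_count += p == cls
            target_count += t == cls
            overlap += p == cls and t == cls
    denominator = pred_count + target_count
    # Class absent from both maps: treated as perfect agreement (assumption).
    return 2 * overlap / denominator if denominator else 1.0
```

Computing the coefficient separately for each class, as done in this study, keeps a large and easy class such as the background from masking errors on a smaller class.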

Results on the Validation Set
Patients' characteristics and main EF results in each group are summarized in Table 1, and the EF results are shown in Figure 5. In Figure 7, the subepithelial fibrosis (hyperreflectivity) is not detected as edema and no epithelial edema is detected over that region.

Discussion
We presented an improved version of our deep learning model for the detection of corneal edema at the pixel level on OCT scans.
Apart from our previous work, only one other study [14] tackled the problem of corneal edema detection on OCT scans using deep learning. They used a convolutional network to perform classification based on OCT images of normal cases, mild FECD (with no clinical edema) and severe FECD (with clinical corneal edema). This approach does not give any information on the location of edema on the images. It is also specific to FECD and is not designed to detect subclinical edema. Their results are difficult to compare to ours as there is no objective quantification of edema. Nonetheless, they achieved good discriminative performance between all groups (AUCs above 0.997). Only one other study quantified edema in the same manner as we did, using differential central corneal thickness [9]. The authors described a statistical model to predict edema resolution (in µm) after DMEK surgery with Scheimpflug-based tomographic features and corneal densitometry. They achieved an AUC of 0.97 to differentiate patients having less than 50 µm or more than 50 µm of edema resolution. In our work, Figure 5 shows there is no overlap between EF values of the Important Edema group and the other groups. Additionally, there is little overlap between the Minimal Edema group (<100 µm) and the Normal group, suggesting good discriminative performance of our approach.
We used the exact same development set as for our previous model to show that we could improve its performance by better engineering the data rather than adding more training samples. Nonetheless, a larger training set would certainly improve its performance.
Our previous model showed good performance in cases of full thickness corneal edema, but had some limitations in cases of minimal or localized edema. This was caused by the nature of the training data and modalities. Indeed, we used full images (no crop) of completely normal corneas and completely edematous corneas. Thus, our first model was not exposed to cases of localized edema during training and was more likely to fail in such cases. As it is challenging even for corneal experts to find a precise limit between normal and edematous cornea when both coexist in the same image, we deliberately chose to include images of full corneal edema and to label all pixels of the cornea as edema. This ensured a good quality of training data and avoided the so-called problem of noisy labels. Nevertheless, we suspected that a strongly normal appearance could suppress the signal of nearby mild edema, or vice versa. Specifically, a normal epithelial appearance could induce false negatives for mild underlying stromal edema, and conversely, a false appearance of epithelial edema could induce false positives in the underlying stroma.
Edema can be localized to the deep stroma in cases of mild endothelial failure or to the epithelium and the anterior stroma in cases of advanced endothelial failure or elevated intraocular pressure (IOP). A model dedicated to edema detection should be able to perform well in all possible scenarios and not only in cases of full-thickness edema. Therefore, our proposed solution to the observed limitations of our previous model was to delete the epithelium on OCT images and train a new model to detect edema on the stroma alone on the one hand and on the epithelium on the other hand. To do so, we first created a model using transfer learning, capable of detecting the epithelium. Transfer learning is the process of taking a model trained for a certain task and retraining it to perform a different task. It usually uses the features learned for the first task to speed up the training process for the second task. In our case, our previous model had been trained on a larger dataset of OCT scans to label pixels as "normal", "edema" or "background". Conceptually, we can imagine that our previous model already knew how to encode corneal OCT scans and was familiar with their specific features, and that it only needed adjustments to reassign the epithelial pixels to the "epithelium" class rather than either the "normal" or "edema" class. This allowed us to use a very small dataset of 11 images, which were quickly labeled manually, and to train the model for only 12 epochs. Epithelium detection is classically performed using image processing [16]. It is, however, a much longer process to design an efficient image processing algorithm than to label a few examples and retrain an existing model. The resulting model yielded sufficient performance for its intended use. This is an example of how transfer learning can be used to easily create specific tools to improve existing models.
Regarding the Epi Edema Detector model, we first tried to train it with modified images containing only the epithelium (deleting all stromal pixels), but this approach failed to produce a satisfactory result. This could be explained by the appearance of epithelial edema on OCT. Important epithelial edema can easily be distinguished from the normal epithelium as it exhibits specific features such as overall hyperreflectivity and intraepithelial bullae. However, mild epithelial edema looks very similar to normal epithelium, only slightly more hyperreflective. As OCT image intensity is not normalized, it can be globally increased or decreased depending on the examination setting. Local variation in intensity is also possible due to the position of the incident beam and possible shadows (from eyelashes, for example). Therefore, a difference in epithelium intensity alone is not specific enough to differentiate normal epithelium from mild epithelial edema. It appears that some context relating to the underlying stroma helps in achieving a good detection performance for epithelial edema. Using random crops of the original training examples successfully improved the detection of localized edema. In the selected examples (Figures 6-8), no epithelial edema could be detected. It is hard to know if these are true negatives or false negatives for epithelial edema. Indeed, in cases of mild edema caused by endothelial failure, or in cases of subepithelial fibrosis, it is possible that the epithelium does not exhibit any edema. Nevertheless, in the specific case of endothelial failure, the detection of epithelial edema is of limited interest as it only appears in late stages and can be detected clinically. However, the proposed pipeline constitutes a complete framework for the detection of corneal edema in all clinical scenarios. Components of the pipeline can now be adjusted independently through error analysis and by improving the training data and process. It is certainly easier to optimize independent models designed for simpler problems.
We chose to train Stroma Edema Detector using Epi Edema Detector as a starting point, as the features learned on the original training set could probably be re-used for the detection of stromal edema alone. Indeed, only 12 more epochs were needed to achieve good results on the modified images with the epithelium removed.
Regarding our training data, it should be noted that corneal opacities visible in edematous corneas were also labeled as edema. Since this version of the model can only classify pixels as "normal" or "edema", when opacities exist in an image, their class will usually be inferred from the adjacent pixels. Hence, they could be labeled as "normal" when located in a non-edematous cornea and as "edema" when surrounded by edematous pixels. Future versions of the model could include a separate class for corneal opacities. Currently, the model has been developed only with the Avanti images. It is therefore not usable with any other device's images as is. Re-training it with images from other devices could certainly improve the generalizability of the results. However, lower image resolutions might reduce its performance in cases of very minimal edema.
We chose to use DCCT as an objective quantitative measurement of edema instead of clinical classifications, which are usually very subjective and tend to overlook minimal edema. This approach has some limitations. First, it quantifies edema only in the three central millimeters of the cornea and is not suitable for peripheral localized edema. However, in this study, we included patients with FECD for which edema usually begins in the central cornea. Therefore, it is an acceptable approximation for this use case. Point-by-point differential corneal thickness of the whole cornea would be more adequate to evaluate our model's precision in the peripheral cornea and in other clinical situations such as endothelitis or corneal graft rejection. Second, the Descemet membrane (DM) can be thickened in FECD. This would overestimate the true quantity of edema. It would be interesting to subtract the preoperative DM thickness from the DCCT to obtain a more precise quantification of edema.
Future studies should determine the optimal post-processing methodology to provide the most clinically valuable information from the model's results on each individual radial scan. Indeed, an en-face map showing the location and importance of edema would be an interesting addition for clinical use. Simultaneous use of all scans of a given eye or of a 3D volume could also improve the results.
The small size of our validation set does not ensure that the current version of the model is robust enough to be usable in clinical practice. The goal of this study was to describe a new methodology rather than a final model. Further studies should be conducted to assess our model's performance in different clinical situations. Specifically, it tended to produce more false positive results in the normal cases. This should be verified on a larger dataset. Additionally, the proposed pipeline is quite computationally expensive as it is composed of three large convolutional networks. Once optimized, it could nonetheless be significantly simplified for production using techniques such as model pruning or knowledge distillation.

Conclusions
This work highlights the bias induced by simultaneously analyzing the epithelium and the stroma. Recognizing this bias is an important step toward accurate corneal edema detection based on OCT images using deep learning. We believe our suggested pipeline addresses this problem and constitutes an adjustable framework whose components can be tuned separately through error analysis. In future work, it would be interesting to add whole corneal thickness and epithelial thickness mapping measurements to the model to further increase its performance. Finally, our work also underlines the possible use of transfer learning to easily engineer smaller parts of more complicated deep learning solutions.