Clinical-Evolutionary Staging System of Primary Open-Angle Glaucoma Using Optical Coherence Tomography

Background: Primary open-angle glaucoma (POAG) is considered one of the main causes of blindness. Detection of POAG at early stages and classification into evolutionary stages is crucial to blindness prevention. Methods: 1001 patients were enrolled, of whom 766 were healthy subjects and 235 were ocular hypertensive or glaucomatous patients in different stages of the disease. Spectral domain optical coherence tomography (SD-OCT) was used to determine Bruch’s membrane opening-minimum rim width (BMO-MRW) and the thicknesses of peripapillary retinal nerve fibre layer (RNFL) rings with diameters of 3.0, 4.1 and 4.7 mm centred on the optic nerve. The BMO-MRW rim and RNFL rings were divided into seven sectors (G-T-TS-TI-N-NS-NI). The k-means algorithm and linear discriminant analysis were used to classify patients into disease stages. Results: We defined four glaucoma stages and provided a new model for classifying eyes into these stages, with an overall accuracy greater than 92% (88% when including healthy eyes). An online application was also implemented to predict the probability of glaucoma stage for any given eye. Conclusions: We propose a new objective algorithm for classifying POAG into clinical-evolutionary stages using SD-OCT.


Introduction
A correct classification is based on the data to which, of a set of categories, a new observation belongs. The main difficulty encountered in medical science is identifying a criterion that can frame all forms of a disease such as glaucoma in a coherent system to offer direct therapeutic suggestions [1]. The objective of this study was to identify an essential element that allows us to develop a clinical-evolutionary classification system for primary open-angle glaucoma (POAG).
Glaucoma is considered one of the main causes of blindness, and based on this relationship, the WHO specifically recommends early diagnosis and treatment for blindness prevention [2].
The currently accepted classification in Europe was proposed by the European Glaucoma Society in 2017 [1].
The visual field (VF) has provided information about the evolution of glaucoma and its diagnosis, but its subjective nature and late significance in relation to the onset of this disease, which requires loss of 40% of retinal ganglion cells, limit its use as an early diagnosis element [3].

Spectral Domain Optical Coherence Tomography
SD-OCT was performed on all patients on one side to avoid problems related to a correlation between the eyes of the same patient. OCT was performed with the Heidelberg Spectralis device (Heidelberg Engineering, GmbH, Heidelberg, Germany, Software Heidelberg Eye Explorer ver. 6.7c).
The glaucoma program provided by the manufacturer was used, which has a patented anatomical positioning system (APS) with a series of patterns for scanning of the optic nerve head, the RNFL and the macular ganglion cell layer. The program compares the eyes of patients with normalised baseline values for normal eyes. All the participants underwent an examination of the thicknesses of peripapillary RNFL rings with diameters of 3.0, 4.1, and 4.7 mm centred on the optic nerve and BMO-MRW.
No segmentations were performed manually for the RNFL values. Only small corrections were made by the same experienced operator (A.P.B.) to readjust Bruchs membrane endings during the BMO-MRW examination.
The absolute values of healthy, glaucomatous and hypertensive eyes prior to normalisation indicated by Heidelberg Engineering and obtained from the seven sectors of the BMO-MRW and the three peripapillary rings (G-TS-T-TI-NS-N-NI), were exported to a Microsoft Excel (version 2016; Microsoft Corporation, Redmon, WA, USA) table. These absolute values were adjusted for age and the area of the papillary cup, and were normalised following the Heidelberg Engineering indications using the following formula provided by Heidelberg Engineering: where: x i is the value of individual i in variable x.
x is the mean of variable x. s x is the standard deviation of variable x. e i is the age of individual i. e is the mean age of healthy individuals at baseline. b xe is the slope of the regression line of variable x on age. a i is the area of the BMO-MRW of individual i. a is the mean area of the BMO-MRW of healthy individuals at baseline. b xa is the slope of the regression line of variable x over the area of the BMO-MRW. z i is the standardized value of individual i in variable x.
A descriptive summary of the main variables in the study was given by groups (healthy and glaucomatous eyes), including the mean and standard deviation.
The normality of RNFL variables was proven with the Kolmogorov-Smirnov test, and the homogeneity of variances was tested with Box's M test using an α level of 0.05.
Pearson's correlation coefficient was used for the correlation analysis of the variables. A strong correlation was considered for (r > 0.75) and a very strong correlation for (r > 0.9). The principal components analysis (PCA) was performed to reduce data dimensionality and to determine which combination of variables explained most of the variability of data. The PCA is a statistical method used to reduce data dimensionality by transforming the original variables (usually correlated) into a new set of linearly uncorrelated variables. The first principal component is a linear combination of variables that explain the widest variability in the data (maximum variance). The second principal component is a linear combination of variables with maximum variance, which is orthogonal to the first component [14].
The k-means method was followed to cluster eyes into glaucoma stages [14]. This algorithm groups eyes into k clusters or classes according to the Euclidean distance from the eye to the cluster centroids in the n-dimensional space of the variables. Every eye is assigned to the cluster with the closest centroid. The initial choice of cluster centroids was random. The number of clusters k was decided according to elbow criteria [15] by observing the reduction in intra-group variability when increasing the number of clusters. This criterion chooses the number of clusters where reduction in intra-group variability significantly decreases.
A linear discriminant analysis (LDA) [14] was used to classify eyes into the clusters, previously defined by the k-means method. LDA is a statistical method used to identify a set of discriminant functions, all of which are uncorrelated linear combinations of the variables, that maximizes the difference between classes and separates the classes the best [14]. The number of discriminant functions created is the number of classes minus one. The classification power of classifiers was assessed by cross-validation, and by computing sensitivity, specificity and the overall accuracy measures from the confusion matrix.
The data and all the results of this study are publicly available at https://github.com/asalber/ glaucoma-staging and http://aprendeconalf.es/glaucoma-staging/.

The Sample and Variables
For the study sample, 1001 individuals were selected (235 with glaucoma; 766 without glaucoma). To avoid a natural correlation between the eyes of the same individuals, only left eyes were considered. For each eye, the main variables measured through OCT were the thickness of the BMO-MRW at the temporal inferior (BMO-MRW.  Table 1 shows the mean and the standard deviation of the variables herein considered.

Correlation Analysis
The correlation analysis ( Figure 1) showed a clear correlation pattern by sectors. A very strong Pearson correlation (r > 0.9) was observed among the thickness of the BMO-MRW and RNFL 3.5, 4.1 and 4.7 by sectors. A strong correlation (r > 0.75) was also found among the sectors of the BMO-MRW, except for the BMO-MRW.G, which very strongly correlated with the other sectors. A very weak correlation (r < 0.2) was found between the T sector and the N, NI and NS sectors.

Principal Component Analysis
As most of the variables were highly correlated, a PCA was performed to reduce the dimensionality and to determine which combination of variables explained most of the variability of the data. Figure 2 shows that the first principal component explained almost 60% of the total variability of the data, while the second and third principal components explained only 11% and 9% of the total variability.
Plotting the subjects' eyes on the first two principal components ( Figure 3) revealed that healthy eyes could easily be discriminated from glaucoma eyes along the first principal component (horizontal dimension), with a small overlapping area, which was not possible at all along the second principal component (vertical dimension).
According to Figure 4, the variables that more contributed to the first principal component were sectors G and TI of BMO-MRW and RNFL.

Cluster Analysis
To define the severity glaucoma stages, the k-means cluster algorithm was applied to the BMO-MRW and RNFL variables of eyes with glaucoma. Based on Figure 5, which shows the reduction in intra-group variability when increasing the number of clusters, we decided to create four clusters as a reduction in intra-group variability was not important for more than four groups (elbow criteria). We named these clusters I, II, III, and IV, which were in the order of increasing severity.   Figure 6a shows the clusters on the axes corresponding to the first two principal components. As we can see in this chart, the four clusters can be separated almost perfectly along the x-axis of the first principal component.  To determine whether the detected classes were related to glaucoma severity, that is, whether a gradient of severity existed from cluster I to cluster IV, we added healthy eyes to the previous plot. In Figure 6b, we see that healthy eyes were plotted to the right of the x-axis (corresponding to the first principal component), overlapping the eyes of cluster I and partially overlapping the eyes of cluster II. This finding is logical as cluster I contained eyes in the early glaucoma stages, which do not differ that much from healthy eyes. Clusters III and IV are located at the other end of the x-axis far from the healthy eyes, which correspond to eyes with severer glaucoma. We can see that cluster IV do not overlap with healthy eyes and the overlap of cluster III is minimal, which comes over more clearly in Figure 7 where the distribution of the first principal component for every glaucoma stage and healthy eyes is depicted.

Linear Discriminant Analysis
After defining clusters, we applied LDA to determine whether the four defined stages can be easily predicted. The LDA was applied to the principal components of the BMO-MRW rims and RNFL sectors as the original variables were highly correlated. Previously, we assessed the normality and homogeneity of variances of the principal components.
As shown in Tables 2a and 3a, the four glaucoma classes could be predicted with very high sensitivity (>0.9) and specificity (>0.95), and 92.9% of the eyes were correctly classified. We repeated the same analysis including healthy eyes. As expected, the classification power by LDA decreased, although this decrease was slight as we were still able to correctly classify 89% of eyes . Tables 2b and 3b show the sensitivity, specificity, and balanced accuracy of the LDA classification. While specificity remained very high (except for healthy eyes), sensitivity decreased substantially for classes I and II because these two classes overlapped mostly healthy eyes. When two classes overlap, the LDA tends to classify an individual into the class with more instances, that is, the most likely class, which is the class of healthy eyes in this study.

Simplifying the Model
After defining the glaucoma classes and assessing whether these classes could be predicted fairly well, we examined whether these classes could be defined without using all the BMO-MRW rims and RNFL sectors, but using only the most important variables because we observed a strong correlation among most of them. So we repeated the LDA for different combinations of the BMO-MRW rims and RNFL sectors.
As the potential number of variable combinations was computationally impracticable, we focused on the variables that more contributed to the first principal component, which was the direction along which the separation of classes was clearer. These variables were sectors G and TI of BMO-MRW and RNFL, as shown in Figure 4a. Table 4 summarizes a comparison of the different assessed models. The model with the highest overall accuracy was the model with rims G and TI of the BMO-MRW and all the RNFL sector variables (with and without including healthy eyes). However, the model that employed only the G and TI rims of BMO-MRW and the RNFL 3.5 sector achieved almost the same overall accuracy both with and without including healthy eyes. Regarding the sensitivity, specificity, and overall accuracy of this model, Tables 5 and 6 shows that no big changes appeared in the model's overall accuracy compared to Table 3 when all the variables were included. Thus, we decided to use only the G and TI rims of the BMO-MRW and RNFL 3.5 sectors to classify eyes into glaucoma stages. Table 7 shows the coefficients of the model's linear discriminant functions with these variables. Table 8 provides the main statistics for the distribution of these variables into glaucoma stages. Figure 8 graphically depicts the 95% confidence intervals for the means of these variables by glaucoma stages. As shown in this chart, classes are statistically well defined because for each G or TI sector variable, significant differences (α < 0.05) were found among the means of all the stages, except for the means of the healthy and stage I eyes for the TI sector of the RNFL3.5. Another interesting observation was that the mean of the G sector of the RNFL3.5 for stage I glaucoma was higher than the mean of healthy eyes, while the mean for the G and TI rims of the BMO-MRW was lower. As stage I glaucoma overlapped mostly healthy eyes, this result reveals that the BMO-MRW is a better indicator when distinguishing stage I glaucoma eyes from healthy eyes. Table 6. The sensitivity, specificity and balanced accuracy of classification into glaucoma stages using LDA with the G and TI sectors of the BMO-MRW and RNFL 3.5.   Finally, in order to render this staging system operational, we developed an online application (app) that implemented this classification model. Figure 9 shows a screenshot of the app. In this simple app, the user must provide a subject's age and the BMO-MRW area, which are required for standardization, and the measurements for rims G and TI of the BMO-MRW and RNFL 3.5 sectors so that the app returns the glaucoma stage prediction for the eye and the probability for each stage. The app also plots the position of the eye on the two first principal components' planes in relation to the clusters of the different stages. The app is currently available at https://asalberapps.shinyapps.io/ Glaucoma-Staging-System/.

Discussion
Even after OCT emergence, one constant in the attempt made to classify POAG was the VF. Our study did not consider the VF to be a classification element given its subjective nature because of arbitrary limits of exclusion according to fixed losses, false-positives, cognitive impairment and difficult collaborations. VF-based classifications have been proposed, and many reviews of these classifications have been published [16][17][18]. A summary of all of these reviews may yield an inevitable conclusion. The presence of glaucomatous optic neuropathy increases with staging severity for all systems. However, different systems led to different severity stages [18].
Other proposals for the diagnosis and classification of POAG have recently been presented based on OCT angiography [19][20][21][22][23]. All of these proposals, and regardless of them being based on retrospective reviews of articles [23] or personal work, the potential of this technique is revealed, but they also introduce uncertainty due to inconsistent criteria in glaucoma diagnosis termes, especially the glaucoma classification. We believe that OCT angiography introduce elements such as anatomical anomalies, exudates, haemorrhages, and the distorting effect of the vessels on the nasal side of the papilla, which can lead to misinterpretation. Therefore, we considered the actual anatomical shape of the papilla limits, BMO-MRW [24], and the normalized values of the BMO-MRW and RNFL to be objective, accessible, and reproducible variables to propose a clinical-evolutionary glaucoma classification system.
The diagnostic accuracy of the sectoral and total analysis of RNFL and/or BMO-MRW has also been determined in glaucoma. Danthurebandara et al. [25] used an independent OCT normative database and classified eyes as glaucomatous if their BMO-MRW or RNFL values went below the normative limits of 1%, 5% or 10% of the normative database by using all the measurements (total analysis) or the sectoral means (sectoral analysis). They reported that at a normative limit of 1%, the sectoral analysis of BMO-MRW gave 87% sensitivity and 92% specificity, while the total analysis yielded 88% sensitivity at the same specificity (92%). For RNFLT, the sectoral analysis yielded 85% sensitivity and 95% specificity, while the total analysis gave 83% sensitivity at the same specificity (95%). The results for the 5% and 10% normative limits yielded lower specificity, but higher sensitivity. The authors concluded that both analyses, the sectoral and total, had similar diagnostic accuracies.
Fan et al. [26] compared the diagnostic ability between three-dimensional (3D) neuroretinal edge parameters (including BMO-MRW) and two-dimensional (2D) parameters (including RNFL thickness) using SD-OCT. They concluded that 3D parameters have the same or better diagnostic ability than 2D parameters. Among the three parameters of the 3D edges (Minimum Distance Band (MDB), BMO-MRW-MRW), no significant differences were found in diagnostic capacity (false detection rate >0.05 with 95% specificity).
Authors like Zheng et al. [27] have also worked with the inferotemporal and superotemporal sectors of the RNFL to improve both sensitivity and specificity in relation to the same sectors of the BMO-MRW with POAG perimetry and percentiles <5. Abnormal superotemporal and/or inferotemporal RNFL thicknesses achieved a higher sensitivity than abnormal superotemporal and/or inferotemporal BMO-MRWs in detecting mild glaucoma (mean VF DM: −3.32 ± 1.59 dB) (97.9% and 88.4%, respectively, p = 0.006), and glaucoma (mean VF DM: −9.36 ± 8.31 dB) (98.4% and 93.6%, respectively, p = 0.006), with the same specificity (96.1%). We belive that below the 5th percentile, the best definitions are of little significance when more specific sectors like G and TI are not studied.
In our study, we relied on artificial intelligence (AI) for its potential for detect, diagnose and classify glaucoma through the automated processing of large data sets and the early detection of new patterns of diseases [28]. We considered the parameter BMO-MRW to be important when defining the stages of the disease, as shown in Table 4. Valuing the diagnostic capacities of the parameters RNFL thickness, BMO-MRW yielded better diagnostic performance than the other parameters. At 95% specificity, the sensitivity of RNFLT, BMO-HRW, and BMO-MRW was 70%, 51%, and 81%, respectively. More studies have compared the diagnostic ability of BMO-MRW and peripapillary RNFL. Gardiner [29] found that peripapillary RNFL outperformed BMO-MRW during the follow-up assessment of glaucoma. In contrast, Chauhan et al. [30], Enders et al. [31] and Malik et al. [32] proved that BMO-MRW showed higher diagnostic ability compared to peripapillary RNFL. Bambo et al. [33] did not find any differences between diagnostic ability of peripapillary RNFL and BMO-MRW in glaucoma. In the present study, we integrated both strategies to achieve a better model.
In a recent study, Brusini [34] proposed a classification with similarities to our own classification, but also with some definitive differences. Brusini's classification uses the upper and lower RNFL thickness values plotted on a Cartesian plane to classify glaucoma OCT results. The RNFL defects are classified into six stages of increasing severity and three classes of defect location (upper, lower, or diffuse defects). The diagram was created based on 302 OCT tests with 94 healthy controls and 284 patients affected by perimetric POAG.
Our study provides a more practical and simpler proposal and incorporates healthy, OHT and glaucomatous patients. We noted some substantial differences with he study by Brusini. Firstly, we included more variables that may suffer the effect of glaucoma (28 variables of BMO-MRW and RNFL sectors) than those Brusini included (only two RNFL sectors). Brusini did not consider BMO-MRW assessment. We corroborated the importance of BMO-MRW for determining glaucoma severity as two of its variables (BMO-MRW.G and BMO-MRW.TI) appear in the simplified classification model (Table 4).
Brusini considered the mean thicknesses of the upper and lower quadrants of RNFL, but ignored other variables because they are not reliable for determining structural damage, although no justification for excluding these variables was provided. We propose considering all the the RNFL and BMO-MRW sectors (NI, N, NS, TS, T, and TI), as well as a global measure (G) that averages 728 points in each rim. We selected those sectors with greater discriminatory power in the final classification model.
Brusini proposed six groups of increasing severity to classify POAG arbitraryly, but it does not obey objective or statistical criteria. Our work established four classes because this number is optimal for reducing intra-group variability according to the k-means algorithm. A bigger number of classes does not substantially reduce the total intra-group variability, as shown in Figure 5.
The groups that Brusini established are separated on the Cartesian plane in the direction of the two variables considered by taking intervals of equal length in the x-and y-axes. However, these axes do not correspond to the directions of the maximum variability of the data, which would be the directions of the principal components.
Our groups are very well separated in the direction of the first principal component, as shown in Figure 6. When using the k-means algorithm, the resulting groups were not same size, but emerge more naturally by agglutination around the centroids of each group. As for the validation of the classification model, Brusini's only studied sensitivity (0.85-0.95) and specificity (0.92-0.98) to distinguish between patients with perimetric damage and healthy patients, but not for making the distinction between different stages. Our results, which include ocular hypertensive patients (without 315 perimetric damage), validate the system with similar balanced accuracies (Table 6) when healthy eyes are not included. When healthy eyes are included in the classification system, the sensitivity of stages I and II lowers. The reason for this is that the cluster of stage I, and partially the cluster of stage II, overlap the cluster of healthy eyes, as we can see in Figure 6b. Thus most of the eyes in stage I are wrongly classified as healthy, as are some of the eyes in stage II. However, the sensitivity and specificity of stages III and IV remain similar to those of Brusini's study.
Our classification algorithm not only demonstrates its functionality within a group of known glaucomatous patients but can also be useful for the diagnosis of POAG in general population surveys because, although its performance in early glaucoma stages (I and II) shows low sensitivity and specificity, significant values are reached in more advanced stages (III and IV), thus enabling diagnosis with a mean global precision of 0.88. When we considered the robustness of our study due to the verification and evaluation of 1001 patients, 235 of whom had glaucoma at various disease stages, we can accept that only a single tomographic scan was considered for the study (the last scan performed). This test was considered the examination that would define the current process state.
This study has some limitations. As it is a cross-sectional study, its conclusions cannot be strictly considered to POAG progression in time and follow-up. More studies have to be done in this sense. Moreover, this work was performed with Caucasian participants, so its results should not be extended to other races. In adition, individuals with refractive error above the spherical equivalent of three dioptres were excluded from this study in order to avoid artifactual results of OCT. Hence this system cannot be applied to eyes with moderate to severe refractive errors, and external validity of the current study may be restricted according to ethnic or refractive factors. Finally, we only considered VF results as the inclusion/exclusion criteria in this work. Thus, we did not compare our proposed OCT clinical-evolutionary staging system with VF parameters. The reasons for this were the subjective nature and poor reliability of VF, as explained above. However, we consider that this issue should be addressed in further studies.
In conclusion, we present a new objective method to classify glaucoma patients according to both BMO-MRW and RNFL measurements that could be useful in clinical practice.