Concordance between SIVA, IVAN, and VAMPIRE Software Tools for Semi-Automated Analysis of Retinal Vessel Caliber

We aimed to compare measurements from three of the most widely used software packages in the literature and to generate conversion algorithms for measurement of the central retinal artery equivalent (CRAE) and central retinal vein equivalent (CRVE) between SIVA and IVAN and between SIVA and VAMPIRE. We analyzed 223 retinal photographs from 133 human participants using SIVA, VAMPIRE, and IVAN independently to compute CRAE and CRVE. Agreement between measurements was assessed using Bland–Altman plots and intra-class correlation coefficients. A conversion algorithm between measurements was derived using linear regression and validated using bootstrapping and root-mean-square error. The agreement between VAMPIRE and IVAN was poor to moderate: the mean difference was 20.2 µm (95% limits of agreement, LOA, −12.2–52.6 µm) for CRAE and 21.0 µm (95% LOA, −17.5–59.5 µm) for CRVE. The agreement between VAMPIRE and SIVA was also poor to moderate: the mean difference was 36.6 µm (95% LOA, 12.8–60.4 µm) for CRAE and 40.3 µm (95% LOA, 5.6–75.0 µm) for CRVE. The agreement between IVAN and SIVA was good to excellent: the mean difference was 16.4 µm (95% LOA, −4.25–37.0 µm) for CRAE and 19.3 µm (95% LOA, 0.09–38.6 µm) for CRVE. We propose an algorithm converting IVAN and VAMPIRE measurements into SIVA-estimated measurements, which could be used to homogenize sets of vessel measurements obtained with different software packages.

A quantitative description of the retinal vessel network can be provided by several semi-automatic software tools. Three common software packages reported in the literature are IVAN (Integrative Vessel Analysis) [10][11][12][13][14], SIVA (Singapore I Vessel Assessment) [15,16], and VAMPIRE (Vascular Assessment and Measurement Platform for Images of the Retina) [17][18][19]. All of them include three well-known parameters summarizing vessel widths around the optic disc: CRAE (central retinal artery equivalent), CRVE (central retinal vein equivalent), and arteriovenous ratio (AVR). However, the software packages use different algorithms for vessel segmentation, for vessel labeling (artery or vein), and for estimating vascular diameter. Two additional features are estimated with SIVA and VAMPIRE but not with IVAN: tortuosity and fractal dimension.
Previous studies suggest poor-to-moderate concordance between these software packages. A comparison between SIVA and VAMPIRE on 655 images indicated poor-to-limited agreement for all parameters (0.16-0.41) and the presence of proportional and systematic bias in the majority of the parameters [20]. IVAN, SIVA, and RA (Retinal Analysis) have been compared on a series of 120 retinal photographs, but the intra-class correlation to assess concordance was not reported [21]. Nevertheless, the authors showed that IVAN yields significantly larger retinal vessel measurements than SIVA and they proposed an algorithm to convert IVAN measurements into estimated SIVA measurements. This approach seems attractive for pooling data from different studies.
The poor concordance that has been reported implies that the software packages are not interchangeable. There is a need to clarify the quantitative differences between their measurements, which may lead to different conclusions in statistical studies using the same image set. This calls for a systematic and structured comparison of these three software applications.
The primary aim of our study was to determine the agreement of CRAE, CRVE, and AVR estimates from the three software tools. Our secondary aim was to propose an algorithm converting measurements between these packages.

Study Population
We included 242 retinal photographs of eyes free of ocular disease from 133 patients of the Department of Ophthalmology of University Hospital of Grenoble Alps. All participants had a complete clinical examination, including refraction, visual acuity, measurement of intraocular pressure, and slit-lamp examination of the anterior and posterior segments. All the color fundus photographs were 45-degree images obtained with the same CR-2 Canon fundus camera (Canon™) after pupil dilation, at a resolution of 5184 × 3456 pixels. Overall, 19 images were excluded because of poor quality or because they were not centered on the optic disc. Good photographic quality was determined subjectively on the basis of the following criteria: focus (perceived sharpness of the vessel edge against the background retinal pigment epithelium), illumination (even and consistent lighting across the zone of vessel measurement), and color balance (equal saturation of color channels). The remaining 223 images were therefore considered for the analysis. This study complied with the Declaration of Helsinki guidelines for research involving human subjects. The Institutional review board (IRB# 5921) reviewed and approved the study protocol, as part of the ongoing prospective observational IMAGEYE cohort study. All study participants provided informed consent for all the ophthalmologic examinations and agreed that anonymized data could be used for clinical research.

Retinal Image Analysis
The measurement procedure has been described previously for VAMPIRE [22][23][24], IVAN [25], and SIVA [15]. Each image selected was analyzed with the three software tools. The interface of each software is briefly represented in Figure 1. A single trained software operator from the Grenoble team performed IVAN analysis and VAMPIRE analysis, while SIVA analysis was carried out independently by a trained operator from the Singapore Eye Research Institute. For IVAN, a standardized ARIC [25] grid was calibrated to a fixed size according to the photograph resolution and was then manually centered on the optic disc. For SIVA and VAMPIRE, the optic disc (and also the macula for VAMPIRE) was identified automatically by the software and adjusted manually in the case of incorrect location. Each package identified and traced the retinal arterioles and venules automatically. A trained grader then examined the traced vessels and manually corrected any incorrect vessel labels (artery or vein). IVAN and SIVA enable an operator to also modify the length of the traced vessels so as to obtain a better match.
In all cases, the vessels are analyzed in the Zone B annulus, from 0.5 to 1 disc diameter from the disc margin. The caliber of the six main arteries and veins was summarized with the CRAE and CRVE based on the revised Knudtson-Parr-Hubbard formula [26]. AVR was then calculated.
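To illustrate how six individual widths collapse into a single summary value, the following minimal sketch implements the iterative pairing generally described for the revised Knudtson-Parr-Hubbard approach: the widest remaining vessel is paired with the narrowest, each pair is combined as factor × √(w1² + w2²) with a branching coefficient of 0.88 for arterioles and 0.95 for venules, and an unpaired middle value is carried to the next round. The function and constant names are hypothetical, and the sketch is not claimed to reproduce the internal implementation of IVAN, SIVA, or VAMPIRE.

```python
import math

# Branching coefficients of the revised formula (arterioles, venules).
ARTERIOLE_FACTOR = 0.88
VENULE_FACTOR = 0.95


def summarize_calibers(widths, factor):
    """Collapse vessel widths into one equivalent caliber by iterative pairing."""
    values = sorted(widths)
    if not values:
        raise ValueError("at least one vessel width is required")
    while len(values) > 1:
        merged = []
        lo, hi = 0, len(values) - 1
        while lo < hi:
            # Pair the narrowest remaining vessel with the widest one.
            merged.append(factor * math.hypot(values[lo], values[hi]))
            lo += 1
            hi -= 1
        if lo == hi:
            # Odd count: the middle value is carried to the next round unchanged.
            merged.append(values[lo])
        values = sorted(merged)
    return values[0]


def avr(crae, crve):
    """Arteriovenous ratio: CRAE divided by CRVE."""
    return crae / crve
```

With six widths, the first round yields three values, the second round two (one pair plus the carried middle value), and the third round the final CRAE or CRVE.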

Pixel-to-Micron Conversion
We applied the commonly used pixel-to-micron conversion procedure based on the assumption that the adult human optic disc diameter is 1800 µm on average [27]. Using VAMPIRE, we computed the mean optic disc diameter in pixels of the entire set of images, while the IVAN and SIVA procedure is based on an image-converting factor (ICF) calculated in a subsample of images (10%).
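The conversion described above amounts to a single scale factor. The sketch below shows the arithmetic for the whole-set variant (mean optic disc diameter in pixels over all images); the function names and the pixel values in the usage note are hypothetical, and the exact ICF computation used by IVAN and SIVA may differ in detail.

```python
# Assumed average adult optic disc diameter, in microns, as stated in the text.
ASSUMED_DISC_DIAMETER_UM = 1800.0


def microns_per_pixel(disc_diameters_px):
    """Scale factor derived from the mean optic disc diameter in pixels."""
    if not disc_diameters_px:
        raise ValueError("at least one disc diameter is required")
    mean_px = sum(disc_diameters_px) / len(disc_diameters_px)
    return ASSUMED_DISC_DIAMETER_UM / mean_px


def to_microns(width_px, factor):
    """Convert a vessel width from pixels to microns."""
    return width_px * factor
```

For example, hypothetical disc diameters of 580, 600, and 620 pixels give a factor of 3.0 µm per pixel, so a vessel 40 pixels wide would be reported as 120 µm.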

Statistical Analysis
All analyses were performed with R (version 3.5.0) [28]. Results are presented as mean ± standard deviation (SD). Pearson's correlation test was performed to evaluate the correlation between sets of measurements. The consistency in vessel diameter measurement between packages was estimated with intra-class correlation coefficients (ICC) using a two-way model, consistency definition, and single rater unit [29]. For Pearson's correlation and ICC, the results were interpreted using the following scale: 0.00-0.39 = poor; 0.40-0.69 = moderate; and 0.70-1.00 = excellent. For ICC, single-measure coefficients and 95% confidence intervals (CIs) are reported.
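For readers who want to see what a two-way, consistency, single-rater ICC computes, the sketch below derives it from the usual ANOVA mean squares as (MSR − MSE) / (MSR + (k − 1) × MSE), where rows are images and columns are software packages. It is written in plain Python for illustration only; the analyses in this study were run in R, and the function names here are hypothetical.

```python
def icc_consistency_single(ratings):
    """Two-way, consistency, single-measure ICC for an n x k table of ratings."""
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denominator


def interpret(coefficient):
    """Interpretation scale used in the text for Pearson's r and ICC."""
    if coefficient < 0.40:
        return "poor"
    if coefficient < 0.70:
        return "moderate"
    return "excellent"
```

Because the consistency definition ignores a constant offset between columns, two packages that differ by a fixed number of microns on every image still obtain an ICC of 1, which is why systematic bias is assessed separately.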
To assess the agreement between software, we performed Bland-Altman analysis, where the 95% limits of agreement (LOA) were defined as mean difference ±1.96 × SD [30,31].
The three software applications were compared in pairs. A one-sample t test comparing the mean difference (between the retinal vessel calibers from two software applications) with zero was run to detect systematic bias. Pearson's correlation analysis was conducted between the difference and the average (i.e., the two axes of the Bland-Altman plot) to detect proportional bias.
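The pairwise comparison can be summarized in a few lines. The sketch below computes the mean difference, the 95% limits of agreement (mean difference ± 1.96 × SD), the one-sample t statistic of the differences against zero, and the Pearson correlation between difference and average. It is an illustrative plain-Python version with hypothetical function names; it returns the t statistic and degrees of freedom only, leaving the p-value to a statistics package, and it assumes the differences are not all identical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(a, b):
    """Agreement statistics for paired measurements a and b (same images)."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    means = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return {
        "mean_diff": mean_diff,
        "loa": (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd),
        # Systematic bias: t statistic of the differences against zero.
        "t_stat": mean_diff / (sd / math.sqrt(n)),
        "df": n - 1,
        # Proportional bias: correlation between difference and average.
        "r_diff_vs_mean": pearson_r(diffs, means),
    }
```

By construction, the limits of agreement are symmetric about the mean difference, which offers a quick consistency check on any reported interval.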

Conversion Algorithm
We propose a conversion algorithm (converting sets of measurements from different packages) derived using all the data of our sample and based on linear regression. For internal validation of our conversion model, we performed the bootstrapping procedure described by Labarère et al. [32]. Following this procedure, we generated 1000 bootstrap samples from our original sample. To assess the accuracy of our conversion algorithm, we compared the variation of the root-mean-square error (RMSE) across the bootstrap samples with the RMSE in the original sample. Here, "error" means "difference between values from different packages." The results are expressed as 95% CIs, i.e., the interval containing the RMSE of 95% of the bootstrap samples.
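A minimal sketch of this workflow is given below: an ordinary least-squares line is fitted on the full sample, and the RMSE of that fitted line is then recomputed on 1000 resampled sets of image pairs to obtain a percentile interval. Whether the model is refitted within each bootstrap sample is not specified in the text, so applying the original fit to every resample is an assumption of this sketch, as are the function names, the fixed random seed, and the nearest-rank percentile rule.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(x, y, slope, intercept):
    """Root-mean-square difference between converted x values and y."""
    squared = [(slope * a + intercept - b) ** 2 for a, b in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_rmse_ci(x, y, slope, intercept, n_boot=1000, seed=0):
    """Percentile 95% interval of the RMSE over resampled (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    values = []
    for _ in range(n_boot):
        # Resample image pairs with replacement (assumption: model not refitted).
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(rmse([x[i] for i in idx], [y[i] for i in idx], slope, intercept))
    values.sort()
    return values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]
```

In use, `x` would hold the IVAN or VAMPIRE measurements and `y` the SIVA measurements of the same images, so that the fitted line converts a measurement into its SIVA-estimated equivalent.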

Results
The 223 retinal photographs retained for analysis were obtained from 133 participants. The mean age of the participants was 47.9 ± 22.9 years and the sex ratio was 0.6 (50 men, 83 women). The mean axial length of the photographed eyes was 23.7 ± 0.9 mm. The characteristics of the participants are listed in Table 1. Table 2 shows the mean absolute value and standard deviation of the CRAE, CRVE, and AVR measurements derived from VAMPIRE, IVAN, and SIVA for the 223 images. The agreement between values computed using VAMPIRE, IVAN, and SIVA is described by scatterplots and Bland-Altman plots (Figures 2-4). Statistical relationships are summarized in Table 3.
For each pair of packages and for each parameter, the presence of systematic bias was demonstrated by the significance (p < 0.001) of the one-sample t tests comparing the mean difference with zero.
To determine a conversion method between measurements from different packages, we computed linear regression relationships for CRAE and CRVE measurements. Table 4 describes the comparisons between SIVA parameters and VAMPIRE-derived or IVAN-derived SIVA parameters. In all cases, the mean of the differences between the two sets of values was not significant. Between SIVA measurements and VAMPIRE-derived SIVA approximates, the RMSE was 7.27 µm (95% CI, 6.48-8.15) for CRAE and 9.69 µm (95% CI, 8.71-10.79) for CRVE. Between SIVA measurements and IVAN-derived SIVA approximates, the RMSE was 4.88 µm (95% CI, 4.35-5.60) for CRAE and 5.19 µm (95% CI, 4.66-6.03) for CRVE.

Discussion
Our study analyzed the agreement and correlation between CRAE, CRVE, and AVR as measured by three widely used software tools (IVAN, SIVA, and VAMPIRE) with 223 fundus camera images from 133 healthy adults. Our analysis showed an excellent agreement between SIVA and IVAN, but a poor-to-moderate agreement between VAMPIRE and the two other software tools. In addition, we proposed a simple method for converting the measurements from one tool into estimates from another.
The evaluation of software tools such as SIVA, IVAN, and VAMPIRE can follow two different approaches. First, one can concentrate on morphological measurements of the retinal vascular tree, typically vessel diameters (as carried out here), tortuosity, junctions/bifurcations, and fractal dimension. Second, one can study the association between vascular parameters and systemic parameters, disease state, or risk. In previous studies, the correlation with systemic parameters (hypertension, systolic blood pressure, diastolic blood pressure, and HbA1c) was found not to differ significantly between SIVA and IVAN [21], or between SIVA and VAMPIRE for the main vascular parameters [20,33].
For ophthalmologists, retinal vessel diameter can be considered a standard measurement from fundus images, in the same way as intraocular pressure, anterior chamber depth, corneal thickness, and axial length are standard measurements in routine practice. In this sense, using morphological measurements of retinal vessel diameters (the first approach) is of major interest. As a quantitative measure, this parameter provides objective information on the retinal vascular tree and could prove useful for the monitoring of ocular diseases, such as retinal vascular occlusion, diabetic retinopathy, or glaucoma. From this point of view, it may become mandatory to compare these retinal measurements in patients for vascular evaluation at baseline, during follow-up, and before and after treatment.
The agreement between IVAN and SIVA was higher than the agreement between IVAN and VAMPIRE or SIVA and VAMPIRE. For IVAN and SIVA, the mean difference in CRAE of 16.4 µm (95% LOA, −4.25-37.0 µm) in our study was larger in magnitude than the −6.7 µm (95% LOA, −23.8-10.4 µm) reported in a previous study [21], whereas the mean difference in CRVE of 19.3 µm (95% LOA, 0.09-38.6 µm) was similar in magnitude to the −18.2 µm (95% LOA, −36.7-0.4 µm) reported in that study. Proportional bias was observed in both studies and, overall, the two studies are consistent.
The poor agreement between SIVA and VAMPIRE has already been reported [20]. Interestingly, when comparing VAMPIRE and other software packages, the correlation was found to be higher for CRVE than for CRAE. This could be explained by the easier recognition of the venule edges by each software, due to their larger diameter and their more contrasted appearance in a retinal photograph. In our recent study comparing retinal photographs and adaptive optics (AO) imaging, we demonstrated a better agreement and correlation of vein measurements using IVAN when compared with the gold standard AO [34].
To counter the consequences of the discrepancy between measurements taken by different packages, we have proposed a conversion algorithm based on linear regression. The good correlation between SIVA and IVAN values of CRAE, CRVE, and AVR makes this choice sensible, but less so if VAMPIRE values are involved. We did not use split-sampling for internal validation as reported in other publications [21] but we performed the bootstrapping procedure described by Labarère et al. [32]. The robustness of the algorithms is well assessed through the bootstrapping procedure. Between SIVA and IVAN, our algorithms are not strictly comparable to the algorithms previously reported (IVAN-derived SIVA CRAE = 0.7176 × IVAN-measured CRAE + 34.3984; IVAN-derived SIVA CRVE = 0.7102 × IVAN-measured CRVE + 44.8717) [21]. This can be explained by many factors, such as the retinal photograph resolution and compression, which could differ in the samples studied. The influence of these two parameters on IVAN measurements has been studied previously and found to be considerable [35].
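As a worked illustration of what such a conversion looks like in practice, the sketch below applies the previously reported IVAN-to-SIVA equations quoted above [21]. These coefficients are those of the earlier study, not the equations derived from our sample, and the dictionary and function names are hypothetical.

```python
# Slope and intercept (µm) of the previously reported equations [21].
PUBLISHED_IVAN_TO_SIVA = {
    "CRAE": (0.7176, 34.3984),
    "CRVE": (0.7102, 44.8717),
}


def ivan_to_siva(value_um, parameter):
    """Convert an IVAN measurement (µm) into an estimated SIVA value (µm)."""
    slope, intercept = PUBLISHED_IVAN_TO_SIVA[parameter]
    return slope * value_um + intercept
```

For instance, an IVAN-measured CRAE of 150 µm maps to 0.7176 × 150 + 34.3984 = 142.04 µm under these published coefficients.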
The aforementioned findings highlight the need for standardization of retinal vascular imaging with fundus cameras (resolution, compression, size, instruments, acquisition protocol, and quality, among others) if we are to expect similar conclusions from a statistical analysis of measurements obtained with different software packages. The conclusions of this study should be considered to apply to images analyzed at high resolution (5184 × 3456 pixels) in JPEG format. Standardization of the acquisition of fundus images and of image processing by cameras is an urgent goal [36]. The development of deep-learning systems [37] may produce fully automated measurement of retinal vessel calibers and may therefore become a useful tool for clinicians in the future.
We should acknowledge some limitations of this study. First, the pixel-to-micron conversion procedure was the same for SIVA and IVAN, but not exactly the same for VAMPIRE, which could be a cause of systematic bias. Second, images were obtained from the instrument in a JPEG compressed format, which is known to distort image-based measurements; our results should be confirmed with uncompressed images [38][39][40]. Third, neither agreement nor correlation among packages attests to the accuracy of measurements with respect to objective ground truth. Such accuracy studies have been reported elsewhere for the three software packages considered here. Fourth, SIVA and VAMPIRE generate a large number of parameters and we have studied only CRAE, CRVE, and AVR. Since several other parameters (e.g., tortuosity, fractal dimension) as well as width measurements in regions other than Zone B are used in biomarker studies, a more extensive comparison would be necessary for a proper assessment of the concordance between the three packages. Fifth, our sample is modest in size and comprises adults with healthy retinas. Larger studies with statistical adjustment for potentially important factors (e.g., camera operator, image quality, patient characteristics) are required. Finally, the limited correlation of VAMPIRE values with those of SIVA and IVAN limits the validity of the conversion model, which could, however, have value in trend-oriented investigations (i.e., decreasing or increasing outcome).

Conclusions
In conclusion, our study on the agreement and correlation of measurements from three of the most widely used software tools for retinal biomarkers found an excellent agreement between SIVA and IVAN and a poor-to-moderate agreement between VAMPIRE and the two other software tools.
Comparing measurements of the retinal vasculature obtained from different software tools remains a challenge that arguably requires a considerable standardization effort covering, at least, algorithms, image acquisition protocols, and image quality. Standardization would guarantee not only consistent measurements but also, and importantly, comparable findings in terms of biomarkers for specific conditions obtained from statistical analysis. A solution suggested by our findings is to convert IVAN and VAMPIRE measurements into SIVA approximate equivalents before use for pooled data analysis.