1. Introduction
Glaucoma is the second leading cause of blindness worldwide, and its prevalence is projected to continue rising [1]. The disease is a collection of degenerative ocular neuropathies responsible for the loss of retinal ganglion cells and retinal nerve fiber layers, directly impacting the health of the optic nerve head [2,3]. It is clinically characterized by neural tissue loss of the nerve head, excavation and thinning of the cup, as well as non-specific visual field defects [4].
Standard automated perimeters (SAPs) are a principal diagnostic and progression monitoring tool for assessing visual field (VF) loss, alongside optical coherence tomography (OCT), a structural diagnostic device [5]. VF testing quantitatively measures the sensitivity of a patient’s central and peripheral vision to detect localized functional loss. SAP threshold strategies quantify visual loss by adjusting stimulus intensity based on patient responses [6]. Multiple tests are commonly performed over time to distinguish progression or stability from the variability of the test [7]. Test duration depends on the type of test and severity of disease, but is generally more than a few minutes per eye. Because most VF tests are long, fatigue and learning effects can reduce their reliability, limiting their use in population-based screening environments [6]. Faster strategies such as suprathreshold (screening) tests have therefore been used in parallel with threshold testing [7,8,9]. In this paper, ‘screening’ and ‘screening test’ refer to the suprathreshold perimetric strategy of assessing visual function.
A variety of screening patterns are currently in use; they differ in method and serve different functions. Confrontation screening tests are a quick, deviceless method for finding gross functional defects [10]. Frequency Doubling Technology and Kinetic perimetry are useful for detecting early visual function loss and neurological field defects, respectively [11,12]. In two-zone VF tests, test locations are classified into ‘seen’ and ‘not seen’, based on the response to the stimulus. Unlike threshold testing, these tests provide a binary classification of visual function rather than a stimulus intensity. Suprathreshold programs simplify VF testing patterns, which reduces test burden and increases patient cooperation. Many suprathreshold strategies present stimuli at a fixed intensity, though some may increase intensity after an initial miss to confirm a defect. The stimuli presented in suprathreshold testing are above the expected threshold, and algorithms vary as to how frequently these stimuli are presented or repeated. An advantage of a quicker VF test is that it can be performed more frequently and integrated into a healthcare setting more easily, enabling its use in ocular-based screening environments, which could address the unmet need of identifying undiagnosed disease. These advantages could facilitate earlier disease detection and treatment in populations that are currently underserved or less frequently tested.
The TEMPO/IMOvifa (CREWT Medical Systems, Tokyo, Japan) visual field device is a bilateral standard automated perimeter capable of performing threshold and suprathreshold VF testing on both eyes independently but simultaneously. The device’s suprathreshold screening program uses a two-zone, 28-point algorithm based on the points most correlated with glaucomatous functional loss (Figure 1) [13,14]. The stimulus intensity tested at each location is derived from age-related normative values and is set at the first percentile of the reference database for the patient’s age. The suprathreshold strategy has a test time range of 33–121 s per eye when tested in a mixed health population which includes advanced glaucoma [14]. The TEMPO screening test is intended to serve as a triage tool that identifies individuals with potential visual function loss for referral, rather than as a standalone diagnostic tool.
The objective of this study is to evaluate the diagnostic accuracy of the TEMPO screening test by measuring the sensitivity in a sample of eyes with glaucoma and the specificity in a sample of healthy eyes.
2. Materials and Methods
All participating subjects signed an informed consent form and fulfilled all inclusion and exclusion criteria. Subjects were seen at a single site in the US by an optometrist and underwent a comprehensive ophthalmic exam. The Advarra Institutional Review Board (6100 Merriweather Dr, Suite 600, Columbia, MD, USA) approved the study protocol, and the methodology adhered to the tenets of the Declaration of Helsinki for research involving human subjects and to the Health Insurance Portability and Accountability Act.
2.1. Study Participants
The study was prospective and took place at a single site in Wilmington, North Carolina. The cohort comprised two groups: healthy and glaucoma. Subjects were assigned to their respective group based either on a comprehensive examination performed by the optometrist, with VF damage or structural loss providing additional diagnostic confirmation, or on findings from their past ocular examinations.
Included subjects were required to be 40 years of age or older, to be able to understand and sign informed consent, and to have a best corrected visual acuity (BCVA) of 20/40 or better in both eyes. Subjects were excluded if they were unable to tolerate ophthalmic testing, had a history of complicated intraocular surgery, had any neurodegenerative disease or any visually impacting disease or comorbidity apart from glaucoma, or demonstrated unreliable VF/OCT testing, including poor fixation. Both eyes were evaluated, and a study eye was selected through randomization. The majority of subjects had both eyes tested simultaneously unless monocular VF testing was required due to reliability issues or comorbidities.
2.2. Study Design
The following tests were performed as part of the comprehensive exam: BCVA, slit lamp biomicroscopy, IOP with a Goldmann tonometer, central corneal thickness (CCT), color fundus photography, 3D Wide (12 mm × 9 mm) OCT scan utilizing the Maestro2 (Topcon Corporation, Tokyo, Japan), and a threshold TEMPO AIZE-Rapid test (24-2, Stimulus Size III, Tracking OFF). OCT was used to identify glaucomatous structural defects and was considered reliable if the scan contained minimal artifacts and the image quality (TopQ) score was 25 or above. Structural defects were based on retinal nerve fiber layer thinning or ganglion cell loss as well as optic nerve head changes such as increased cup-to-disc ratio and neuroretinal rim thinning. Threshold VF testing with the TEMPO was used to detect functional loss and was deemed reliable if fixation losses (FLs) were 20% or below, false positives (FPs) 10% or below, and false negatives (FNs) 12% or below. Functional loss was determined based on glaucoma-consistent defect patterns, with mean deviation and pattern standard deviation used to stratify disease severity. The threshold test results served as the ‘ground truth’ for classifying visual field outcomes as positive or negative.
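The reliability criteria stated above can be written as simple checks. The following is a minimal sketch, not the study's software: the class and function names are hypothetical, and the cut-offs are taken directly from the text (FL 20% or below, FP 10% or below, FN 12% or below, TopQ 25 or above).

```python
from dataclasses import dataclass


@dataclass
class ThresholdTestIndices:
    """Reliability indices of one threshold VF test, in percent (hypothetical container)."""
    fixation_loss_pct: float
    false_positive_pct: float
    false_negative_pct: float


def threshold_vf_is_reliable(idx: ThresholdTestIndices) -> bool:
    # Cut-offs as stated in the text: FL <= 20%, FP <= 10%, FN <= 12%.
    return (
        idx.fixation_loss_pct <= 20.0
        and idx.false_positive_pct <= 10.0
        and idx.false_negative_pct <= 12.0
    )


def oct_is_reliable(topq_score: float, minimal_artifacts: bool) -> bool:
    # The text requires minimal artifacts and a TopQ score of 25 or above.
    # "minimal_artifacts" is a grader's judgment, modeled here as a boolean.
    return minimal_artifacts and topq_score >= 25
```

A test sitting exactly on a cut-off (for example, 20% fixation losses) counts as reliable under this reading, because the text says "or below".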
Upon confirming all eligibility criteria, one TEMPO screening test (Stimulus Size III, Tracking OFF, Age corrected P1% stimulus threshold) was conducted by an ophthalmic technician in a dimly lit room and refractive corrective lenses were used if needed. Some patients only had the study eye tested with the threshold program due to the fellow eye not meeting inclusion or exclusion criteria. A second TEMPO screening test was then conducted to measure intra-session testing stability, with no defined break period between the two tests. A positive screening test was defined as missing at least one stimulus presentation at a test location, while a negative screening test was defined as detecting all presented stimuli.
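As an illustration of the positive/negative definition above, the sketch below reads it literally: a test is positive if any presented stimulus at any location was missed, and negative only if every presented stimulus was detected. The data layout (a mapping from a location label to a list of per-presentation responses) and the function name are assumptions made for this example, not the device's export format.

```python
from typing import Dict, List


def classify_screening_test(responses_by_location: Dict[str, List[bool]]) -> str:
    """Classify one screening test.

    responses_by_location maps a test-location label to the responses for each
    stimulus presentation at that location (True = seen, False = missed).
    """
    if not responses_by_location:
        raise ValueError("no test locations supplied")
    # Positive: at least one presented stimulus was missed anywhere.
    any_missed = any(
        not seen
        for responses in responses_by_location.values()
        for seen in responses
    )
    return "positive" if any_missed else "negative"
```

Under this literal reading, a location that is missed once and then seen on a repeat presentation still yields a positive test; if the device instead requires a confirmed miss, the rule would need to be adapted.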
2.3. Statistical Analysis
Sensitivity was assessed twice, once with the first test and once with the second test. Sensitivity was calculated by dividing the number of positive screening tests by the total number of glaucoma patients included. Specificity was calculated by dividing the number of negative screening tests by the total number of healthy patients. Sensitivity was further analyzed for glaucoma subgroups based on mean deviation (MD) severity ranges: better than −3 dB, between −6 dB and −3 dB, and worse than −6 dB. MD was derived from the TEMPO threshold VF.
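The calculations described above can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and the handling of eyes exactly at −3 dB or −6 dB is an assumption, since the text does not state which subgroup includes the boundary values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sensitivity(glaucoma_results: List[bool]) -> float:
    """Positive screening tests divided by all glaucoma eyes (True = positive)."""
    if not glaucoma_results:
        raise ValueError("no glaucoma results supplied")
    return sum(glaucoma_results) / len(glaucoma_results)


def specificity(healthy_results: List[bool]) -> float:
    """Negative screening tests divided by all healthy eyes (True = positive)."""
    if not healthy_results:
        raise ValueError("no healthy results supplied")
    return sum(not r for r in healthy_results) / len(healthy_results)


def md_subgroup(md_db: float) -> str:
    # Assumed boundary handling: -3 dB falls in the mildest group and
    # -6 dB falls in the middle group.
    if md_db < -6.0:
        return "worse than -6 dB"
    if md_db < -3.0:
        return "-6 to -3 dB"
    return "better than -3 dB"


def sensitivity_by_md_subgroup(eyes: List[Tuple[float, bool]]) -> Dict[str, float]:
    """eyes: (MD in dB, screening test positive?) for each glaucoma eye."""
    grouped: Dict[str, List[bool]] = defaultdict(list)
    for md_db, positive in eyes:
        grouped[md_subgroup(md_db)].append(positive)
    return {name: sensitivity(results) for name, results in grouped.items()}
```

Calling `sensitivity_by_md_subgroup` separately on the first and second screening tests gives the two sensitivity estimates per subgroup that the analysis describes.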
To evaluate the ability of the TEMPO screening test to detect the correct screening outcome, a logistic regression model was developed that accounts for ground truth, age of the patient, and the MD value. The outcome of the model is the screening test finding, a binary value indicating a positive or negative screening test. For each repetition, the data were randomly split into a 60% training set and a 40% testing set and applied to the logistic regression model; this was repeated 50 times to increase the precision of the estimates. Applying the exponential function to the coefficient estimates provided the odds ratios for detecting true positive field defects. Descriptive statistics and logistic regression models were conducted using R (4.4.2). To improve the clarity and grammar of the manuscript, AI-based language enhancement tools, including ChatGPT (GPT-4-turbo, OpenAI, October 2024), were used in the manuscript revision process. However, all intellectual contributions and interpretations remain the responsibility of the authors.
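The repeated 60/40 split and the conversion of a logistic coefficient into an odds ratio can be illustrated with a simplified sketch. It is written in Python rather than R and is not the study's model: it uses only one binary predictor (ground-truth glaucoma status) instead of ground truth, age, and MD. For that special case the fitted logistic slope equals the log of the 2×2 cross-product ratio, so no iterative fitting is needed. The 0.5 added to each cell is a continuity correction introduced here to avoid division by zero, and all function names and the seed are hypothetical.

```python
import math
import random
from typing import List, Tuple

Record = Tuple[bool, bool]  # (ground truth glaucoma?, screening test positive?)


def split_train_test(records: List[Record], train_fraction: float,
                     rng: random.Random) -> Tuple[List[Record], List[Record]]:
    """Randomly split records into training and testing subsets."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def binary_predictor_coefficient(records: List[Record]) -> float:
    """Log odds ratio of a positive screen for glaucoma vs healthy eyes.

    Equals the slope of a logistic model with one binary predictor, apart
    from the 0.5 continuity correction added to every cell.
    """
    a = sum(1 for g, p in records if g and p) + 0.5          # glaucoma, positive
    b = sum(1 for g, p in records if g and not p) + 0.5      # glaucoma, negative
    c = sum(1 for g, p in records if not g and p) + 0.5      # healthy, positive
    d = sum(1 for g, p in records if not g and not p) + 0.5  # healthy, negative
    return math.log((a * d) / (b * c))


def odds_ratio_from_coefficient(beta: float) -> float:
    """Exponentiate a logistic regression coefficient to get an odds ratio."""
    return math.exp(beta)


def mean_training_odds_ratio(records: List[Record], repetitions: int = 50,
                             train_fraction: float = 0.6, seed: int = 0) -> float:
    """Average the coefficient over repeated random splits, then exponentiate."""
    rng = random.Random(seed)
    betas = []
    for _ in range(repetitions):
        train, _test = split_train_test(records, train_fraction, rng)
        betas.append(binary_predictor_coefficient(train))
    return odds_ratio_from_coefficient(sum(betas) / len(betas))
```

Averaging on the log scale before exponentiating is one reasonable way to pool the repetitions; the text does not specify how the 50 estimates were combined, nor how the held-out 40% was used, so the testing subset is returned but not evaluated here.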
4. Discussion
This study demonstrates that the TEMPO/IMOvifa screening program has high specificity, indicating a strong likelihood of accurate screening in an ocular-based screening environment with a predominantly healthy demographic. The specificity of the repeated suprathreshold tests was comparable, suggesting that intra-session results are stable. These findings align with the specificity reported for previously validated suprathreshold strategies [15,16].
Sensitivity and its clinical relevance vary among MD subgroups. In the subgroup with MD better than −3 dB, sensitivity was lower than in eyes with MD < −3 dB, suggesting that the ability of the TEMPO suprathreshold test to detect a positive field defect depends on the severity of the disease. In individuals whose MD was worse than −3 dB, the sensitivity measures showed excellent ability to screen for positive field defects. The confidence intervals indicate a small margin of error. Screening tests are not usually considered diagnostic and are predominantly useful for identifying subsets of populations that require additional testing. Threshold testing may not always be a viable option in certain healthcare settings, and using a screening test as an initial baseline can expand patient testing volume, especially for those who may need it the most. A highly specific test is needed for ocular screening because the majority of patients are likely to be without visual function loss. The bilateral nature of the TEMPO may contribute to specificity: because no patch is needed, patient comfort improves, which may minimize test errors and false positives. The rapid test duration may also reduce fatigue-related errors and improve reliability.
Our regression model indicates which factors affect the screening test outcome and the degree of their effects. The MD odds ratios indicate that lower values of MD were more likely to be in the glaucoma category, which is consistent with clinical expectations and supports MD as a predictor of a positive screening test. The odds ratio for the glaucoma category indicated that the model was four times more likely to correctly identify positive screening tests than negative screening tests after the first test, and six times more likely in the second test, demonstrating stability as testing is repeated. From this, the model implies that the screening test is more effective in detecting true positives than false positive cases (Table 3 and Table 4).
Previous work by Nishijima et al. [15] focused on the ‘imo’ screening program (ISP), the predecessor of the TEMPO, comparing its effectiveness to frequency doubling technology (FDT). In line with our results, the ISP demonstrated high AUC and sensitivity in mild glaucoma populations and stronger AUC and sensitivity in moderate to severe glaucoma cases. Specificities of the TEMPO screening program were also higher than the values reported for FDT and ISP. They concluded that the screening program was comparable to the FDT and an effective tool in glaucoma screening. Our findings were also analogous to the results of Arai et al., a work that examined the standalone diagnostic accuracy of the ‘imo’ screening device. Their AUC for mild, moderate, and severe glaucoma was 0.77, 0.97 and 1.0, respectively [14]. Sensitivity values for mild to severe glaucoma cases were also comparable to our results [14].
Other novel screening strategies have also been developed and tested in recent years. The glaucoma screening test developed for the Octopus perimeter (Haag-Streit, Koeniz, Switzerland) utilizes a 28-point suprathreshold program which tests points up to three times at a stimulus intensity less than 5%, designed for testing a large and diverse population [16]. The authors reported a sensitivity range of 93.9–100% in the moderate–severe glaucoma group, mirroring our findings for this range of glaucoma [16]. The timing for their healthy group was 40 s per eye, and for early, moderate, and advanced glaucoma it was 55, 77, and 95 s per eye, respectively. The TEMPO screening test was faster than the Octopus, especially for the glaucoma group, with an average of 42 s (early = 40 s; moderate = 45 s; advanced = 52 s) per eye. For the healthy group, the TEMPO had an average time of 39 s. This is likely because stimulus points are presented a maximum of two times in the TEMPO screening program. Similar to the screening pattern of the Octopus, the Humphrey Field Analyzer (HFA) employs a three-zone as well as a two-zone testing pattern. The HFA and TEMPO share a two-zone method but differ greatly in that the TEMPO tests both eyes simultaneously while the HFA can only test one eye at a time. This substantially shortens test time and reduces the burden on the patient.
Multi-sampling screening (MSS) techniques have also been used as a suprathreshold alternative to traditional SAP due to their comparable sensitivity with full-threshold strategies [17]. As seen in Artes et al., MSS focuses on improving detection of defects in milder glaucoma (MD better than −6 dB) while maintaining comparability to threshold strategies as MD worsens. Our findings are consistent with the screening accuracy of MSS when accounting for mild to severe cases. It is important to note, however, that the accuracy of glaucoma detection increases as the number of presentations increases [18,19]. This leads to improved diagnostic capability in milder cases when utilizing MSS. MSS can, however, be more burdensome to patients than other suprathreshold strategies because more stimuli are repeated. Fatigue from the test could then lead to test-taking inaccuracies, especially when evaluating one eye at a time, as opposed to simultaneous bilateral evaluation.
Our tested screening program demonstrates a strong specificity (90%) in a healthy cohort and strong sensitivity in a moderate–severe glaucoma cohort, indicating its potential benefits as an ocular screener. However, a 90% specificity may still produce a considerable number of false positives in low-prevalence contexts. This highlights the need for a multi-tiered approach where positive screening cases are referred for further structural and functional testing. Enabling this care model may serve to standardize this screening tool across ocular care settings which in turn will lead to increased detection of disease especially when considered in high patient volume clinics. Public health initiatives may be able to leverage this technology to better approach high-risk populations such as older adults and individuals with a known family history of glaucoma.
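The effect of prevalence on false positives noted above follows from the standard formula for positive predictive value. The sketch below is illustrative only: the 90% specificity comes from the text, while the sensitivity and prevalence values used in the usage note are hypothetical and are not study results.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive screening test is a true positive."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    if true_positive_rate + false_positive_rate == 0.0:
        raise ValueError("no positive tests expected with these inputs")
    return true_positive_rate / (true_positive_rate + false_positive_rate)


def false_positives_per_1000(specificity: float, prevalence: float) -> float:
    """Expected number of false positive tests per 1000 people screened."""
    return (1.0 - specificity) * (1.0 - prevalence) * 1000.0
```

For example, with the reported 90% specificity, an assumed sensitivity of 90%, and an assumed prevalence of 3%, `positive_predictive_value(0.9, 0.9, 0.03)` is roughly 0.22 and `false_positives_per_1000(0.9, 0.03)` is about 97, which is why positive screens should be referred for confirmatory structural and functional testing.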
A limitation of our study is that the majority of subjects had an MD better than −3 dB, which could lower overall sensitivity. It should be considered, however, that general screening environments in the US predominantly include early cases of glaucoma, while underserved regions may have a higher proportion of moderate to advanced cases; our study suggests that the screening test can address both. We aim to address this limitation by including proportionate, equal-sized MD cohorts in future work. Testing was also conducted at a single site with a predominantly Caucasian cohort, which limits the generalizability of our results; future work will consider a multi-site approach. Additionally, the screening pattern discussed is focused only on the frequency of glaucomatous VF defects. Including other visually impacting diseases could prove beneficial. Age matching between the healthy and glaucoma cohorts was not performed, which could introduce age-related bias; however, structural and functional markers were used to characterize glaucoma in our study, which minimized the impact of age. In addition, stimulus intensity was adjusted for age, and hence age was already factored in when producing the test output. This is further supported by the logistic regression model, in which the contribution of age is not significant. Future studies should nevertheless aim to isolate the effect of age. Future work will focus on screening a wider demographic range, further investigating different screening patterns and stimulus sizes, and integrating structural data modalities into VF screening. Repeatability measures of the screening program will also be gathered to further validate our model.
To conclude, our findings demonstrate a VF screening program capable of effectively screening for loss of visual function in mild to severe cases of glaucoma (MD < −3 dB). The screening pattern also showed strong specificity, accurately classifying true healthy cases. Implementing this screening technology in optometric/ophthalmic settings that routinely screen for glaucoma could lead to earlier vision loss detection in underscreened populations and demographics. While it should not serve as a stand-alone diagnostic program due to the lower sensitivities in early glaucoma, this study suggests that it may serve as a useful screening tool when used with other diagnostic tests. Its ease of testing and speed make it a favorable option for both the practitioner and the patient.