1. Introduction
Following a line with the eyes, although seemingly a simple task, requires several skills and the application of different cognitive functions. This task was formerly called visual tracing [1], in contrast to visual tracking, which refers to the detection of a target in motion [2].
A simple clinical task that uses visual tracing is the Groffman visual tracing (GVT) test. Initially developed by Groffman in 1966 [3] to assess the tracing abilities of children, it has recently been studied as a clinical tool or experimental task in both children and adults with or without specific deficits [4,5,6,7,8]. GVT is an indirect psychometric measure of oculomotor performance, used for the clinical assessment of oculomotor behavior.
The original GVT test [3] consists of two cards with five contorted lines of increasing overlap, crowding, and difficulty. The subject starts from each of the letters at the top of the page, follows the line from the letter to the corresponding number at the bottom of the page, and names the number. Although the GVT test was originally intended for subjects in the developmental age range, it can also be applied to adults [5]. Individuals with visual and cognitive deficits of different etiologies, such as traumatic brain injury (TBI) or acquired brain injury (ABI) [6,9,10], can exhibit oculomotor alterations. The GVT test has been used in an adapted form in different studies involving healthy [4,8,11], learning-disabled [12], epileptic [13,14,15], and occipital-injured [16] children to assess visual perceptual abilities.
Recently, Zee [17] drew the attention of the neurology community involved in examining eye movement disorders in various neurological deficits to the need for, and availability of, easy-to-use tools to measure and quantify such conditions. Oculomotor deficits are found in 7% to 86% of patients who have suffered a stroke [18,19,20,21], depending on specific deficits, the time elapsed since the stroke, and the stage of recovery. In particular, oculomotor problems have been observed in association with specific cognitive deficits such as unilateral spatial neglect [22,23], neglect dyslexia [24,25], simultanagnosia [26], oculomotor apraxia [27], Balint syndrome [28], progressive supranuclear palsy (PSP) [29], and cerebellar ataxia [30].
Tests of oculomotor functioning such as GVT, although they may appear limited in comparison with the recording of eye movements, are becoming promising tools for the fast evaluation of eye movement disorders. They can be used with either neurologically unimpaired individuals or neurological patients, in clinical contexts where eye-tracking technology is not suitable because of the difficulty of implementation [31,32,33]. From a clinical point of view, only a few simple paper-based tasks for oculomotor functioning are clinically available, and those that have been proposed have some limitations in the normative values that are available [3,32,34,35,36]. The available oculomotor tests differ in their characteristics; therefore, they may not address the same aspects of oculomotor behavior [31].
Only one study has assessed the psychometric properties of the original GVT test that are necessary for its correct clinical use. That study showed that the original five-line version is useful for adults but too difficult for young children, for whom an easier three-line modified version is more appropriate [5]. In any case, for clinical application, GVT lacks reference norms for adults.
Consequently, the aim of this study has been to assess the impact of age on eye tracing behavior and to define specific normative data for the GVT test with the application of a new scoring system.
2. Materials and Methods
2.1. Subjects
A power analysis was first performed to assess the minimum sample size required. Because the definition of normative values was regression-based, we followed this approach for the power analysis (see statistical methods paragraph for details). Based on a regression model with three independent factors (demographic characteristics: age, education, and sex), alpha of 0.05, power of 0.80, and effect size f² of 0.04, we determined a minimum required sample size of 277 participants.
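The sample-size calculation above can be approximated with a short script. This is only a sketch under stated assumptions: it uses the overall F test of a fixed-effects multiple regression and takes the noncentrality parameter as λ = f² · N, a common convention that the text does not confirm. The function names are hypothetical, and SciPy is assumed to be available.

```python
from scipy import stats


def regression_power(n, n_predictors, f2, alpha=0.05):
    """Power of the overall F test for a regression with n participants."""
    df_num = n_predictors
    df_den = n - n_predictors - 1
    # Assumption: noncentrality lambda = f^2 * N (not stated in the text).
    noncentrality = f2 * n
    f_crit = stats.f.ppf(1.0 - alpha, df_num, df_den)
    # Probability that a noncentral F variate exceeds the critical value.
    return stats.ncf.sf(f_crit, df_num, df_den, noncentrality)


def minimum_sample_size(n_predictors, f2, alpha=0.05, target_power=0.80,
                        n_max=100_000):
    """Smallest n whose power reaches the target, found by linear search."""
    for n in range(n_predictors + 2, n_max + 1):
        if regression_power(n, n_predictors, f2, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")


if __name__ == "__main__":
    # Inputs reported in the text: 3 predictors, f^2 = 0.04, alpha = 0.05.
    print(minimum_sample_size(n_predictors=3, f2=0.04))
```

Under these assumptions the result should fall close to the reported minimum of 277 participants; a different noncentrality convention would shift it by a few units.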
A group of 537 participants was originally enrolled, but because of the presence of extreme outliers (3 × IQR over the third quartile) in the execution times, 11 participants were removed, giving a final sample of 526 participants. The procedure used for filtering is described in the section dealing with statistical methods. The participants had a mean age of 45.9 years (SD 16.0, range 20–79). Mean education was 13.41 years (SD 3.7, range 5–25). Of the 526 participants, 292 were females (56%). The participants were subdivided into six age groups by decade, from 20–29 to 70–79 years old. The size of our sample for each decade, compared with the age distribution of the 40–79 years old adult Italian population in 2020, was not significantly different (χ²(5) = 2.47, p = 0.78). Participants were recruited as a convenience sample from those available by direct contact from all examiners.
Table 1 summarizes the demographic data of the participants.
The inclusion criteria were the presence of normal binocular vision assessed by the cover test, the absence of ocular diseases reported by the participants, and a near visual acuity of +0.1 logMAR or better in each eye, measured with a Sloan-letter logMAR chart (Goodlite 729000, Elgin, IL, USA). The exclusion criteria were the current or previous presence of neurological or psychiatric disorders reported by the participants.
Before the evaluation, the participants signed informed consent in order to participate in the study. The study was carried out following the guidelines given in the Declaration of Helsinki, and it was approved by the Optics and Optometry Institutional Review Board of the University of Milano-Bicocca (5/2019; 13 May 2019).
2.2. Groffman Visual Tracing Test
Following the original instructions [3], the GVT test is composed of two cards of 216 × 279 mm (i.e., US letter size, Figure 1). Each card consists of five separate intersected continuous lines in a twisted pattern. The task consists of rapidly and accurately “following with the eyes” each line without losing it. The task requires starting from each of the letters at the top of the page (A, B, C, D, and E), following the line from the letter to the corresponding number at the bottom of the page (1 to 5), and naming the number. The corresponding number and the execution times are both recorded. As a pre-test, the demonstration card is shown to the participant, and the instructions about the start, intersections, and ends are explained carefully. The demonstration card is intended to enable the instructions to be understood and to check that the subject possesses the minimum skills required to execute the test. When a participant could not follow a single line on the demonstration card correctly after three attempts, testing was halted because the required level of the minimum skill had not been attained.
As reported in the original paper, the instructions were: “This is a test to see how quickly and accurately you can follow a line using only your eyes. Look at the line that starts at the letter A, Follow it with your eyes. When it reaches another line (point to the first intersection), follow it through the gap (point to the broken line). This line goes under the whole line and continues through.” (Groffman, 1966, p. 140). After the demonstration card, cards A and B were always administered in the same order. The instructions for each card and line were: “Now we are going to trace five more lines. Your score will depend on accuracy and speed, so work quickly, but try not to make a mistake.” (Groffman, 1966, p. 140). The answer keys for cards A and B were reported on the scoresheet.
2.3. Procedure
The evaluation was performed in a quiet and well-illuminated room (about 350–400 lux). Initially, consent to participate in the research was signed by participants, and the inclusion/exclusion criteria were checked. Each participant was seated at a desk wearing the correct glasses (if necessary), and the different cards were positioned on a lectern at a distance of 40 cm. A stopwatch was used to record the execution time. Card A was positioned on the lectern first, and the lines were covered by a white sheet to prevent the participant from following the lines before starting the test. Consequently, only the five letters at the top of the page were visible. The examiner named the first letter, removed the white sheet, and started recording the time. When the participant named the corresponding number, the examiner stopped the stopwatch. The accuracy (i.e., number of lines followed correctly) and the execution times were recorded on the scoresheet. For each line, if the reported number was incorrect, accuracy was scored as 0 and no execution time was recorded; if it was correct, accuracy was scored as 1 and the execution time was recorded. If the participant lost the mark, the accuracy was zero. Scoring of the GVT test was performed using the overall accuracy and mean execution time of each card and line (2 cards × 5 lines) [5].
2.4. Statistical Methods
When the raw execution times were plotted, some high outliers emerged for single lines. It is possible that the participant went back or restarted the task, even though this was not permitted, and that the examiner did not recognize this behavior. For this reason, a posteriori case-wise deletion of univariate extreme outliers was performed. Based on all execution times, the non-parametric threshold for the extreme outlier was calculated as three times the interquartile range (3 × IQR) over the 3rd quartile [37]. The value obtained was 78 s. If the execution time of at least one line was equal to or greater than 78 s, all data for the individual participant were discarded. This corresponds to a case-wise deletion of 11 participants, from 537 to 526.
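The outlier rule described above can be illustrated as follows. This is a minimal sketch, not the authors' script: the quartile interpolation method ("inclusive") is an assumption, since the text does not say how quartiles were computed, and the function names and toy data are hypothetical.

```python
from statistics import quantiles


def extreme_outlier_threshold(all_times):
    """Upper fence for extreme outliers: Q3 + 3 * IQR over pooled times."""
    # Assumption: 'inclusive' quartiles; other methods give slightly
    # different fences.
    q1, _, q3 = quantiles(all_times, n=4, method="inclusive")
    return q3 + 3.0 * (q3 - q1)


def casewise_delete(times_by_participant, threshold):
    """Drop every participant with at least one time >= threshold."""
    return {
        pid: times
        for pid, times in times_by_participant.items()
        if all(t < threshold for t in times)
    }


if __name__ == "__main__":
    # Toy execution times in seconds (hypothetical data).
    data = {"p1": [10, 12, 11], "p2": [13, 14, 90], "p3": [12, 15, 11]}
    pooled = [t for times in data.values() for t in times]
    fence = extreme_outlier_threshold(pooled)
    print(fence, sorted(casewise_delete(data, fence)))
```

In the toy data, participant p2 is removed entirely because one of its lines exceeds the fence, mirroring the case-wise deletion applied in the study.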
Initially, a series of descriptive and inferential analyses were performed to evaluate the performance of the GVT test over age groups with respect to accuracy and execution times. Comparisons of accuracy between age groups were performed with 1-way ANOVA. Accuracy was measured using a score from 0 to 10. Since not all participants performed all lines correctly, the comparisons of execution times were performed with a linear mixed model (LMM) ANOVA using Id (anonymous identities) as a random factor (random intercept) and Card, Line, and Group as fixed factors with all interactions.
The definition of normative values was performed using a standard procedure used in neuropsychological testing [38,39,40]. To judge whether a participant performs at a normal level in a specific test, it is necessary to compare their performance to that of the population sample with the same demographic characteristics. This procedure requires collecting data for each factor that influences the score. Consequently, a very large sample, with a minimum of 90–100 participants for each category of gender, age, and education level, is needed, resulting in thousands of participants. An efficient alternative model is to subtract the influence of age, gender, and education (if necessary [41]) from the raw score and to calculate the normative data on this adjusted score using a non-parametric approach [39]. This scoring system has been widely used in the field of neuropsychological testing and requires only some hundreds of participants [42,43,44,45,46].
Based on the results of the previous analyses, and depending on whether the comparisons between lines and cards were significant, execution times were scored either separately or together. The final goal was to make the differences between lines uniform and to have the same mean execution time for all lines. The influence of the line on the execution time was balanced using the steps outlined below. Firstly, the mean execution time of each line for all participants was calculated. Secondly, the mean value of these means was calculated, and the difference between the mean of each line and the mean of the means was determined. This series of values (one for each line), with reversed signs, represented the first correction factor and was added to the raw execution times of each participant. A table that could be used to facilitate calculation was provided. Thirdly, since the participants may have followed a different number of lines (from 1 to 10) correctly, a mean execution time for each participant was calculated. This scoring procedure provided two easy scores for GVT, namely accuracy and execution time.
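The three steps above can be sketched in a few lines. The data layout (one dictionary per participant, holding times only for correctly traced lines), the line labels, and the function names are hypothetical; the arithmetic follows the description in the text.

```python
from statistics import mean


def line_corrections(records):
    """Correction per line: mean of the line means minus that line's mean."""
    times_by_line = {}
    for times in records.values():
        for line, t in times.items():
            times_by_line.setdefault(line, []).append(t)
    line_means = {line: mean(ts) for line, ts in times_by_line.items()}
    mean_of_means = mean(list(line_means.values()))
    # Reversed-sign difference, so that adding it equalizes the line means.
    return {line: mean_of_means - m for line, m in line_means.items()}


def mean_corrected_time(times, corrections):
    """Mean corrected time over the lines a participant traced correctly."""
    return mean([t + corrections[line] for line, t in times.items()])


if __name__ == "__main__":
    # Hypothetical times in seconds; p3 traced only line A1 correctly.
    records = {
        "p1": {"A1": 10.0, "A2": 20.0},
        "p2": {"A1": 14.0, "A2": 28.0},
        "p3": {"A1": 12.0},
    }
    corrections = line_corrections(records)
    for pid, times in records.items():
        accuracy = len(times)  # number of lines followed correctly
        print(pid, accuracy, mean_corrected_time(times, corrections))
```

Because the correction removes the line effect, a participant who traced only one line correctly still receives a mean time comparable to participants who traced several.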
Following this procedure, the influence of demographic variables (age, education, and gender) on the dependent variable (mean corrected execution times or accuracy) was assessed in different steps.
Using the general linear model, a series of bivariate regressions was performed, with different transformations of the independent variable (age, education, sex) to find the most appropriate transformation [38,39]. The transformations used were: linear, reverse, quadratic, logarithmic, logarithmic reverse, square root, geometrical, inverse, and exponential.
Akaike’s Information Criterion (AIC) [47] was used for the selection of the most appropriate transformation model for each independent variable [48].
The three best bivariate models (one for each predictor) were entered into a multivariate model with two or three independent factors.
We used AIC model selection to find the most appropriate model among a set of 7 possible models describing the relationship between the dependent variable (accuracy or execution time) and age, education, and sex in their single or multiple combinations.
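The selection steps above can be illustrated with a small example. This sketch covers only the bivariate stage, uses a subset of the transformations, and computes the least-squares form of AIC, n · ln(RSS/n) + 2k, which is an assumption about the exact AIC variant. It also enumerates the seven predictor combinations mentioned in the text. All names and data are hypothetical.

```python
import math
from itertools import combinations

# Subset of the transformations listed in the text (illustrative only).
TRANSFORMS = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
    "logarithmic": math.log,
    "square root": math.sqrt,
    "inverse": lambda x: 1.0 / x,
}


def aic_simple_regression(x, y):
    """AIC of y = a + b*x fitted by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    k = 3  # intercept, slope, residual variance
    return n * math.log(rss / n) + 2 * k


def best_transformation(x, y):
    """Name of the transformation of x with the lowest AIC."""
    scores = {
        name: aic_simple_regression([f(xi) for xi in x], y)
        for name, f in TRANSFORMS.items()
    }
    return min(scores, key=scores.get)


def candidate_predictor_sets(predictors):
    """All non-empty combinations of predictors (7 for three predictors)."""
    return [
        combo
        for size in range(1, len(predictors) + 1)
        for combo in combinations(predictors, size)
    ]


if __name__ == "__main__":
    ages = [20, 30, 40, 50, 60, 70]
    noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]
    times = [a * a / 100 + e for a, e in zip(ages, noise)]  # toy data
    print(best_transformation(ages, times))
    print(candidate_predictor_sets(["age", "education", "sex"]))
```

The same AIC comparison would then be repeated across the seven candidate multivariable models, each built from the best transformation of its predictors.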
Subsequently, based on the previous result, a second regression model was built, based on deviation from the mean. Then, by reversing the regression coefficients, a regression for adjusting the score was calculated taking into account the contribution of each confounding variable. The two regressions discussed above are not equivalent because the first one used the raw score as a dependent variable. In contrast, the second one used the deviation from the mean. For its clinical usefulness, only the second model was reported.
Based on the results of this regression, a simple correction grid was built to facilitate the scoring process. Specifically, since from a clinical point of view it is easier to find age and education in a table when the value falls in a specific range (e.g., 20–29), the age included in the regression was the mean of the interval considered (e.g., 24.5). This represents a simplification, but the correction grid is a simpler tool to facilitate clinical use. A precise detailed scoring could be performed using the regression equations.
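The adjustment and the correction grid can be sketched as follows. This is an illustration under assumptions: the coefficient value is invented, the predictor is treated as untransformed, and the function names are hypothetical. Only the structure (subtracting each predictor's contribution relative to the sample mean, and using band midpoints such as 24.5 for the 20–29 band) follows the text.

```python
def adjusted_score(raw, predictors, coefficients, sample_means):
    """Raw score with each predictor's estimated contribution removed.

    Each coefficient is applied with reversed sign to the deviation of
    the (already transformed) predictor from its sample mean.
    """
    correction = sum(
        -coefficients[name] * (predictors[name] - sample_means[name])
        for name in coefficients
    )
    return raw + correction


def age_band_midpoint(age, width=10):
    """Midpoint of the decade band containing age, e.g. 20-29 -> 24.5."""
    lower = (age // width) * width
    return lower + (width - 1) / 2


if __name__ == "__main__":
    coefficients = {"age": 0.2}   # hypothetical: seconds per year of age
    sample_means = {"age": 45.9}  # mean age reported for the sample
    # Exact adjustment using the participant's age.
    print(adjusted_score(30.0, {"age": 65.0}, coefficients, sample_means))
    # Grid-style adjustment using the midpoint of the age band.
    midpoint = age_band_midpoint(65)
    print(adjusted_score(30.0, {"age": midpoint}, coefficients, sample_means))
```

The two printed values differ slightly, which is the simplification the correction grid accepts in exchange for easier clinical use.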
In order to define a cut-off score, the one-sided non-parametric 95% tolerance intervals, with a confidence limit of 95%, were then calculated. For accuracy, the leftward limit was calculated, and for the execution time, the rightward limit was considered. Corrected scores, percentile, and rank-based equivalent scores [49] were calculated and reported for clinical use. Statistical analyses were performed, and figures produced, with the R statistical environment 4.0.3 [50] and specific packages: ez 4.4-0 [51], Hmisc 4.6-0 [52], lme4 1.1-26 [53], lmerTest 3.1-3 [54], tolerance 2.0.0 [55], and AICcmodavg 2.3.1 [56].
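A one-sided non-parametric tolerance limit of this kind can be derived from order statistics. The sketch below uses the standard binomial argument: the r-th smallest observation is an upper limit covering at least 95% of the population with 95% confidence when P(Binomial(n, 0.95) ≤ r − 1) ≥ 0.95. It is not the implementation of the R package cited in the text and may differ from it in detail; the function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    if k < 0:
        return 0.0
    total = 0.0
    for i in range(min(k, n) + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def upper_tolerance_rank(n, coverage=0.95, confidence=0.95):
    """Smallest rank r (1-based, ascending) usable as an upper limit.

    Returns None when the sample is too small for the request.
    """
    for r in range(1, n + 1):
        if binom_cdf(r - 1, n, coverage) >= confidence:
            return r
    return None


def one_sided_limits(scores, coverage=0.95, confidence=0.95):
    """(lower limit, upper limit); the lower one mirrors the upper rank."""
    ordered = sorted(scores)
    r = upper_tolerance_rank(len(ordered), coverage, confidence)
    if r is None:
        raise ValueError("sample too small for the requested limits")
    return ordered[len(ordered) - r], ordered[r - 1]


if __name__ == "__main__":
    # Rank of the cut-off observation for a sample of 526 participants.
    print(upper_tolerance_rank(526))
```

With this rule, the rightward limit for execution time is the value at the returned rank, and the leftward limit for accuracy is the value at the mirrored rank counted from the bottom.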
4. Discussion
The aim of this study was to assess the influence of age on visual tracing performance by using the GVT task and to provide adult norms for this test. Scoring based on the overall accuracy and execution times has been applied as a standard in many neuropsychological performance tests [31,43,57,58].
The results show that accuracy decreases over age groups. This represents a clear aging trend. Each line on each card showed a specific accuracy level that was slightly but significantly different from the others. However, this is an intrinsic characteristic of the test, and there are no floor or ceiling effects that invalidate the task.
Execution times, in addition to increasing with age, as previously shown in a small number of participants [5], have been shown to be different for each line and card. The previous result has been confirmed in the current study with a larger and more representative sample, which was necessary for defining norms.
Many cognitive factors are known to influence performance on oculomotor tests, primarily visuospatial attention [1,2,4,59]. Nonetheless, paper-based oculomotor tests could be helpful in many clinical situations [18,31,33].
Normative data were produced following the procedure usually used in neuropsychological tests. Accuracy was influenced by age, education, and sex, while mean execution time was influenced only by age. With a specific adaptation for obtaining mean execution time, the results are reported as percentiles and equivalent scores for different clinical requirements. Even though this process of scoring seems time-consuming, it represents a standard in neuropsychological testing and allows the scores obtained to be compared with those of other tests that use the same standard scores, namely percentile or equivalent score.
Although the test includes two cards and five separated lines, it is advantageous to consider it as a whole, in particular with respect to accuracy. This takes into account that the accuracy over 10 lines represents a better scoring method than considering separate scoring for each line and card (5 + 5). Conversely, for execution times, a slightly complex method of scoring has been applied because of the nature of the task itself (execution time is available only for the lines followed correctly) and to obtain a single (mean) score of execution times. Alternatively, each line needs to be scored separately, giving a series of speed scores, one for each line followed correctly. This procedure in a clinical setting is time-consuming, as well as making it difficult to interpret multiple results. By using the method of scoring applied in this study, a simple assessment of speed and accuracy can be performed.
This study has set the basis for clinical application of the GVT test in the adult population. Future directions could involve its use on specific populations of neuropsychological patients such as ABI and TBI, and the comparison of GVT with either eye-tracking or other paper-based oculomotor tests, such as the King-Devick test, the DEM test, and the visual search test [31].
The participants were from Italy, and consequently, the norms could be correctly defined as Italian norms. However, since in this test, as in many visuospatial tasks, there is no influence of culture or language, in the absence of other studies, they can be used as an independent international reference. It is important to note, however, that the norms presented have some limitations (and uncertainty). In another sample of the same size, the model used to calculate adjusted scores and its coefficients may differ depending on the specific sample. In future normative studies, a larger and more representative sample could be used to verify and improve on this point.