Objective Assessment of Attention-Deficit Hyperactivity Disorder (ADHD) Using an Infinite Runner-Based Computer Game: A Pilot Study

In the last few years, several computerized tasks have been developed to increase the objectivity of the diagnosis of attention-deficit hyperactivity disorder (ADHD). This article proposes the “running raccoon” video game to assess the severity of inattention in patients diagnosed with ADHD. Unlike existing tests, the proposed tool is a genuine video game in which the patient must make a raccoon avatar jump to avoid falling into different gaps. The distance to the gap is recorded for each jump. To evaluate the proposed game, an experiment was conducted in which 32 children diagnosed with ADHD participated. For each participant, the median and interquartile range of these distances were calculated, along with the number of omissions. Experimental results showed a significant correlation between the participants’ inattention (measured by the inattention subscale of the Attention-Deficit/Hyperactivity Disorder Symptoms and Normal Behavior rating scale (SWAN)) and each of these three measures. In addition to its accuracy, other benefits are its short duration and the possibility of being run on both standard computers and mobile devices. These characteristics facilitate its acceptance in clinical environments or even its telematic use. The obtained results, together with the characteristics of the video game, make it an excellent tool to support clinicians in the diagnosis of ADHD.


Introduction
Attention-deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder with an estimated prevalence in children and adolescents of 7.2% according to a systematic review published recently [1]. ADHD is characterized by difficulties in maintaining sustained attention, hyperactivity and acting on impulse. Among the consequences of this disorder are a higher percentage of accidents, higher rates of school dropouts, or a greater probability of having addiction problems [2,3]. Moreover, ADHD increases mortality between two and eight times in children, adolescents, and adults when it is not properly treated [4]. Accordingly, an accurate and early diagnosis of ADHD is fundamental to improve its poor prognosis.
Usually, ADHD diagnosis is based on the judgment of the health professional using a clinical history often supported by scales filled out by caregivers and/or teachers. Therefore, the ADHD diagnosis depends primarily on health professionals' expertise and the caregiver/teacher's observational skills [5]. Several experts have criticized this way of diagnosing ADHD as they indicate that it tends to be subjective on both clinicians' and caregivers' side [6,7]. For instance, a recent study has shown that a group of 473 psychotherapists, specialized in children and adolescents, committed more than 15% of false positives and about 20% of false negatives in identifying this disorder through medical records [8].
In another study, Schultz and Evans showed that young female teachers tended to provide higher scores than older male teachers [9]. Finally, it has also been shown that parents' and caregivers' evaluations can be influenced by their mood [10].
In addition to the observational capabilities of the medical professionals and caregivers, the assessment's accuracy may also be influenced by various weaknesses associated with the use of questionnaires. As an example, the veracity of the responses may not be guaranteed. In the case of ADHD, some of the reasons for which patients tend to exaggerate or attenuate their symptoms are: to justify academic failure, to gain access to stimulant drugs, to obtain certain social/academic benefits, or to deny that they have the disorder [11,12]. Furthermore, limited accuracy is sometimes obtained with questionnaires and scales [13].
To overcome these limitations, in recent years some authors have proposed analyzing patients' behavior while they perform a computerized task [14]. As will be shown below in the bibliographic review, these works are based on the go-no-go paradigm. Unlike these existing tests, this article proposes a proper video game of the infinite runner genre that aims to accurately assess the severity of inattention in patients with ADHD. In this game genre, the player has to make a running avatar avoid different obstacles that are in its way. Specifically, in our game, the avatar has to jump to avoid falling into different gaps that are in its way. Our hypothesis is that children diagnosed with ADHD will commit more omissions and perform a greater number of jumps near the gap as a result of distractions. The popularity of this type of game means that patients are familiar with it, which increases its ecological validity. Examples of games of this genre can be easily found and freely downloaded (e.g., Temple Run, Subway Surfers, or Jetpack Joyride [15]). Another difference compared with some computerized tasks requiring virtual reality equipment is that the proposed video game can be executed on any standard desktop or mobile device. This allows the evaluations to be performed at zero cost during the consultation or even telematically.
The rest of the article is structured as follows. Section 2 reviews the existing computerized tasks for assessing ADHD. Then, in Section 3, the materials and methods are described. These include the description of the sample, the developed video game, and the statistical methods. In Section 4, the results obtained in an experiment aiming to evaluate the proposed video game are presented. The article concludes in Section 5 with a discussion of the benefits of the proposed video game and an outline of future lines of research.

Bibliographic Review
As mentioned above, a significant number of computerized tasks have been developed in recent years aimed at identifying patients with ADHD [14]. A common characteristic is that all of them are continuous performance tests (CPTs) based on the go-no-go paradigm [16].
Some of these works extend the traditional CPT to make it more similar to a video game. For example, Berger, Slobodin, and Cassuto proposed the MOXO-CPT in which the letters were replaced by cartoons, and visual and auditory distractors were added [17]. Concisely, the target was a child's face, and the distractors were five animals including a duck with a similar color and shape to the target. The researchers observed that the measures collected in a traditional CPT (number of correct responses, reaction time, omissions, and commissions) consistently distinguished between children with ADHD and their unaffected peers. Specifically, they reported an area under the curve (AUC) of 0.96 as a precision measure. The AUC is often used to evaluate the performance of a classifier. For a classifier that performs at least as well as chance, it takes values between 0.5 and 1, where 0.5 suggests discrimination no better than a random guess. Although there is no criterion for determining when an AUC is good, some authors consider that a value higher than 0.9 suggests outstanding discrimination [18]. However, a weakness of this work is that the researchers did not split the data into training and testing sets in their analysis. This may cause the reported results to be better than those obtained in a new sample. In another work, Shaw, Grayson, and Lewis conducted a similar study using images of Pokemon [19]. However, these authors found no significant difference between children with ADHD and controls (AUC close to 0.5). This discordance can be explained by the fact that they tried to identify children with ADHD by taking into account only the number of commissions, which was the least discriminating measure in the work of Berger, Slobodin, and Cassuto.
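To make the AUC concrete, the following minimal sketch computes it as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name `auc` and the omission counts are illustrative assumptions, not data or code from any of the cited studies.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive case scores higher
    than a random negative case (ties count as one half).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical omission counts (illustrative values only).
adhd = [12, 9, 15, 7]
controls = [3, 5, 7, 2]
print(auc(adhd, controls))
```

A value near 0.5 means that the measure does not separate the two groups, while a value near 1 means that almost every case scores higher than almost every control.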
A different extension was proposed by Keller et al. [5]. In their game "Groundskeeper", inspired by the popular game "Whac-A-Mole", the keyboard was replaced by Sifteo cubes. These cubes are able to digitally display different images and interact with each other by proximity. The patient had to move a cube with the image of a mallet towards any of the other three cubes when the image of a gopher appeared and not bring it closer when the image of birds, a rabbit, or the groundskeeper showed up. The novelty of this work was not only that the human-computer interaction was tangible, but also the high number of predictors it collected. These predictors were analyzed by several recent machine learning techniques such as decision trees, boosting, or random forest. Although their results were quite accurate in discriminating children with ADHD from controls, the researchers observed that they were not able to improve the predictive ability obtained either by the standard CPT or by the Conners' Brief Rating Scale.
Later, these authors replicated the study by replacing the previous predictive techniques with a logistic regression [20]. In this study, "Groundskeeper" achieved better results than both CPT and the Conners' Brief Rating Scale. However, like the works discussed above, this study had the weakness of not having analyzed the data by means of cross-validation or a repeated validation, which limits the generalizability of the results.
Another work in which the interaction between the participant and the computer was carried out through movement was developed by Delgado-Gomez et al. [21,22]. In their study, patients reacted to the stimuli that appear in the CPT by raising their dominant hand instead of pressing the space bar of the keyboard. Three-dimensional positions of the dominant hand were captured 30 times per second using a Kinect camera. In this way, they were able to identify events that could not be captured on a standard CPT, such as when the participant started the reaction but stopped it before pressing the space bar. The authors reported that they obtained more accurate assessments of the participants' impulsiveness than with the Conners' CPT.
In order to incorporate ecological validity into the assessment, Rizzo et al. proposed the use of virtual reality [23]. In 2006, Rizzo et al. developed the virtual classroom, a three-dimensional virtual environment that mimics a classroom. In their study, the participants performed a CPT in which the stimuli appeared on the blackboard of the virtual classroom [24]. In this study, which included eight children diagnosed with ADHD and 10 controls, the authors noted that children with ADHD had slower hit reaction times, higher reaction time variability, and made more omission and commission errors. Using a sample of 10 children with ADHD and 10 controls, Parsons et al. replicated the study conducted by Rizzo and his colleagues, obtaining similar results [25]. The novelty of their work is that the authors compared the measures obtained in the virtual classroom with those obtained in the Conners' CPT-II, observing a significant correlation in the omissions and commissions [26]. These results were later verified by Diaz-Orueta et al. in a sample of 52 children diagnosed with ADHD [27]. In addition, Bioulac et al. observed that, in a sample of 36 children, the performance degradation over the course of the virtual classroom test was similar to that obtained in the CPT-II [28]. Recently, Areces et al. found that, in the virtual classroom, the number of commissions and the motor activity, measured through the head-mounted display, were lower in the inattentive subtype than in the hyperactive subtype [29].
The following section describes the proposed raccoon runner game. Unlike the previous computerized tasks, the proposed video game does not follow a go-no-go paradigm. It is a standard video game that most children are familiar with, which increases the ecological validity of the test. It also has the advantage of not needing specific hardware. This considerably reduces its cost and allows it to run on personal computers, tablets, or mobile devices, allowing remote evaluations.

Participants
A group composed of 32 children (29 males) referred to the Child and Adolescent Psychiatry Unit of the Department of Psychiatry at Fundación Jiménez Díaz Hospital (Madrid, Spain) and diagnosed with ADHD according to DSM-5 criteria participated in the study [30]. All participants were receiving medication. Among the participants, 10 were diagnosed as inattentive type while the remaining 22 were diagnosed as combined type. The mean and standard deviation of the age were 12.46 and 3.01 years, respectively. The minimum age was 8 years, and the maximum was 16 years.

Running Raccoon Game
The running raccoon game is a video game based on the infinite runner genre in which a raccoon must jump several gaps before reaching the goal. The game was implemented using the widely used Unity 3D game engine [31]. Figure 1 shows a screenshot of the game.
Brain Sci. 2020, 10, x FOR PEER REVIEW
In detail, the raccoon has to jump 180 gaps, which are grouped into 18 blocks. Each block is characterized by the raccoon's speed, the trunk length, and the gap length. The length of the trunk and the speed of the avatar define the interstimulus (IS) time, which is approximately 1.5, 2.5, or 3.5 s, while the gap's width defines the difficulty of the jump. The settings of the different blocks are shown in Table 1. For each jump, it is recorded whether the participant jumped or not and, if so, the distance from the jump point to the beginning of the gap. As discussed in the Introduction, we hypothesize that the distance from the jump point to the gap will be related to the attention process in the sense that inattentive children will jump closer to the border as a result of distractions.
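As a rough illustration of how a block configuration might be represented, the sketch below models a block by its speed, trunk length, and gap width. It assumes that the interstimulus time is simply the trunk length divided by the avatar's speed and that the 180 gaps are split evenly over the 18 blocks; neither assumption is stated in the text. The class name `Block`, its fields, and the numeric values are hypothetical and do not reproduce Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One block of the game (hypothetical representation)."""

    speed: float         # avatar speed, in arbitrary length units per second
    trunk_length: float  # distance between consecutive gaps, same length units
    gap_width: float     # wider gaps make the jump harder
    n_gaps: int = 10     # assumption: 180 gaps split evenly over 18 blocks

    @property
    def inter_stimulus_time(self) -> float:
        # Assumption: time between gaps = distance travelled / speed.
        return self.trunk_length / self.speed


# Illustrative values only; the real settings are those listed in Table 1.
example = Block(speed=4.0, trunk_length=10.0, gap_width=1.0)
print(example.inter_stimulus_time)  # 10.0 / 4.0
```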

Inattention SWAN Rating Subscale
The Attention-Deficit/Hyperactivity Disorder Symptoms and Normal Behavior rating scale (SWAN) is a parent/caregiver report inventory developed for screening ADHD [32]. It extends the 18-item ADHD rating scale-IV by increasing the number of possible responses for each item from four to seven [33]. On the SWAN scale, each item is scored from −3 to +3 (below average to above average), where 0 is "normal". Strong internal consistency and moderate test-retest reliability have been reported [34]. The SWAN scale is composed of two subscales. The first nine items are related to inattention, while the last nine items are related to hyperactivity and impulsivity. In this article, the inattention subscale is used.
During the experiment, a trained psychiatrist accompanied each of the patients while they conducted the task. While each child was performing the test, the corresponding caregiver or legal guardian filled out the inattention subscale of the SWAN scale. The average score obtained was −7.1 and the standard deviation was 10.7.
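The structure of the scale can be sketched in a few lines. The example below assumes that the inattention subscale score is the plain sum of the first nine item scores, which is not stated explicitly in the text; the function name `swan_inattention_score` and the responses are hypothetical.

```python
def swan_inattention_score(items):
    """Sum of the first nine of the 18 SWAN item scores.

    Assumption: the subscale score is the unweighted sum of its items.
    """
    if len(items) != 18:
        raise ValueError("expected 18 item scores")
    if any(not -3 <= score <= 3 for score in items):
        raise ValueError("each item must lie between -3 and +3")
    return sum(items[:9])


# Hypothetical responses: nine inattention items, then nine
# hyperactivity/impulsivity items.
responses = [-1, -2, 0, -1, -3, 0, 1, -1, -2] + [0] * 9
print(swan_inattention_score(responses))
```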

Statistical Analysis
The predictors that are widely used in the literature were computed. These are the median and interquartile range (IQR) of the recorded distances along with the number of omissions for each participant. The recorded distance is the distance from the jump point to the beginning of the gap, while the number of omissions represents the number of times the participant did not jump. Pearson's correlation was calculated between each of these measures and the score obtained in the inattention subscale of the SWAN scale. In addition, a multiple regression analysis was conducted to analyze whether these predictors are independent.
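The three predictors and the correlation described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: an omission is encoded as `None`, the quartiles use the default method of Python's `statistics.quantiles` (the text does not say which quartile definition was used), and the function names and data are hypothetical.

```python
import math
import statistics


def jump_predictors(jumps):
    """Median, IQR, and omission count for one participant.

    `jumps` holds one entry per gap: the distance from the jump point
    to the beginning of the gap, or None if the participant did not jump.
    """
    distances = [d for d in jumps if d is not None]
    q1, _, q3 = statistics.quantiles(distances, n=4)
    return {
        "median": statistics.median(distances),
        "iqr": q3 - q1,
        "omissions": len(jumps) - len(distances),
    }


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical jump record for a single participant (arbitrary units).
record = [1.0, 2.0, None, 3.0, 4.0, 5.0, None, 6.0, 7.0]
print(jump_predictors(record))
```

In a full analysis, each predictor would be collected across participants into one list and passed to `pearson` together with the list of SWAN inattention scores.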

Ethics Procedures
Caregivers were required to sign an informed consent form after the test had been explained to them in detail. The consent form and the study protocol were reviewed and approved by the Institutional Review Board of Fundación Jiménez Díaz Hospital of Madrid.

Results
The first column in Table 2 shows the correlations (and p-values) of the median and the interquartile range of the jump distances and the number of omissions made by each participant with respect to the score obtained by them on the inattention subscale of the SWAN scale. Columns 2 to 4 in Table 2 display the correlations considering only the jumps where the time between stimuli is 1.5, 2.5, and 3.5 s, respectively. Whenever the time between stimuli is less than 2 s, the correlations are no longer significant, and when it is greater than 2 s, they increase. A relevant aspect to investigate is whether these widely used predictors are independent. Table 3 shows the t statistics and the p-values associated with the coefficients of the variables included in the different possible linear regression models. The fact that the variables are significant in the simple linear regression models but no longer in the multiple models indicates that these variables are collinear. This result makes sense since, for example, the shorter the jump distance, the more likely it is to commit an omission (the raccoon automatically jumps when it collides with the edge of the gap). For this reason, the following analyses are performed using only the median of the jump distances.
Another important aspect to investigate is whether the value of these correlations depends on the length of the test. That is, whether the correlations obtained with these three measures are higher at the beginning or the end of the test. Figure 2 shows the correlation obtained between the median of the jump distances and the participant's inattention score for each of the 18 blocks defined above. It can be observed that these correlations do not depend on the stage of the test. However, the correlations were higher when the time between stimuli was long (2.5 or 3.5 s) than when it was short (1.5 s).
In order to verify the last statement, an ANOVA test was conducted in which the dependent variable is the correlation and the factor is the time between stimuli. The analysis showed that at least one of the means was significantly different from the others (p-value = 0.005). To get a better understanding, Figure 3 displays the mean plot. In addition, to determine which means are significantly different from the others, a multiple range test was conducted. A Bonferroni multiple comparison procedure identified that the means of the correlations obtained when the interstimulus time was 2.5 or 3.5 s are significantly different from the mean obtained when the interstimulus time was 1.5 s. It did not identify a significant difference between the means of the correlations obtained for the 2.5 and 3.5 s interstimulus times.
To assess the performance of the proposed game, a repeated validation experiment was performed [35,36]. To do this, the available data were divided into two disjoint sets. These two sets are usually called training and validation sets. The training set is used to estimate the parameters of the model, while the validation set is used to assess it. The key point is that the validation set is not used when the model is trained and therefore it plays the role of a new sample. For this purpose, 75% of the observations (n = 24) were used to estimate the parameters of the linear regression model, while the remaining 25% were used to evaluate it. The linear regression model was built with three predictors. These were the medians of the jump distances computed separately for each of the three groups of blocks with similar interstimulus times. For each observation in the validation set, the inattention of the associated participant was estimated by the built linear regression model using that participant's three predictors. The correlation of these estimates with the scores obtained by those participants in the SWAN inattention subscale was calculated. To obtain more stable results, 10,000 repetitions were performed and the average of the obtained correlations was calculated. The mean correlation obtained was 0.53.
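The repeated validation procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in the study: the regression is fitted by ordinary least squares through the normal equations, the data are synthetic, and all function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def repeated_validation(X, y, train_fraction=0.75, repetitions=1000, seed=0):
    """Mean validation-set correlation over random train/validation splits."""
    rng = random.Random(seed)
    n = len(y)
    n_train = round(train_fraction * n)
    correlations = []
    for _ in range(repetitions):
        idx = list(range(n))
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        beta = fit_ols([X[i] for i in train], [y[i] for i in train])
        estimates = predict(beta, [X[i] for i in valid])
        correlations.append(pearson(estimates, [y[i] for i in valid]))
    return sum(correlations) / len(correlations)


# Synthetic stand-in for 32 participants with three median predictors each.
rng = random.Random(42)
X = [[rng.uniform(0.0, 5.0) for _ in range(3)] for _ in range(32)]
y = [-(a + b + c) + rng.gauss(0.0, 2.0) for a, b, c in X]
print(round(repeated_validation(X, y, repetitions=200), 2))
```

With 32 observations and a training fraction of 0.75, each repetition fits the model on 24 participants and evaluates it on the remaining 8, matching the split sizes described above. Averaging over many random splits reduces the dependence of the reported correlation on any single partition.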

Discussion and Conclusions
In the present study, an adaptation of a traditional video game of the infinite runner genre has been proposed to assess the degree of inattention of children with ADHD. This work differs from previous studies in two aspects. Firstly, unlike previous works, with the exception of that of Shaw, Grayson, and Lewis [19], it uses a genuine video game instead of a computerized task based on the go-no-go paradigm. Secondly, our article assesses the severity of inattention and does not focus on discriminating children with ADHD from controls.
Our results suggest that the number of times the avatar does not jump, as well as the median and interquartile range of the jump distances, show a significant correlation with the severity of patients' inattention. In addition, this correlation tends to be greater when the time between stimuli increases. A possible explanation is that, when the time between stimuli is short, the patient is immersed in the game, whereas when this time is longer, ADHD patients have difficulty maintaining attention. This finding suggests giving more importance to the jumps in which the interstimulus time is longer (i.e., those in which the time between stimuli is greater than two seconds).
The proposed methodology has several advantages. First of all, unlike the existing methods that last more than 15 min, the developed test takes approximately seven minutes. Furthermore, our results indicate that a shorter test could be sufficient to accurately evaluate ADHD. This feature makes it especially attractive in clinical environments where time is scarce. Second, the test does not require complicated or expensive hardware such as virtual reality equipment. A standard computer has been used in this work, but devices such as tablets or mobile devices could also be used. The limitations of our study are the sample size, the use of a single assessment scale and the absence of a healthy control group. Future studies with larger samples and administered with different assessment scales will help to confirm our pilot results. In addition, the availability of a control group will also allow us to analyze whether the proposed game is capable of discriminating children with ADHD from the participants in the control group.
In conclusion, the results obtained open up new lines of research. Firstly, to find out the optimal length and configuration (time between stimuli) of the test. Secondly, since the game can be run on any device, to analyze the possibility of performing the test remotely. More importantly, our study suggests that other video game genres (graphic adventures, strategy, puzzles, etc.) could be used as diagnostic tools for ADHD or other mental disorders.