Analysis of Heart Rate Variability and Game Performance in Normal and Cognitively Impaired Elderly Subjects Using Serious Games

Cognitive decline is one of the primary concerns in the elderly population. Serious games have been used for different purposes related to elderly care, such as physical therapy, cognitive training and mood management. There is scientific evidence of a relationship between cognition and the autonomic nervous system (ANS), as reflected in heart rate variability (HRV). This paper explores changes in ANS activity, measured through HRV, among elderly people with normal and impaired cognition. Forty-eight subjects were classified into two groups: normal cognition (NC) (n = 24) and mild cognitive impairment (MCI) (n = 24). The subjects went through the following experiment flow: rest for 3 min (Rest 1), play a cognitive aptitude game (Game 1), rest for another 3 min (Rest 2), then play two reaction-time games (Games 2&3). Ten HRV features were extracted from the measured electrocardiography (ECG) signals. Statistical analysis showed no significant difference in HRV between the two groups, but the experiment session had a significant effect, and there was no significant interaction between session and cognitive status. This implies that HRV does not differ significantly between the two groups and that both groups experience similar changes in HRV regardless of cognitive status. In contrast, game performance differed significantly between the two groups. The Tree-based Pipeline Optimization Tool (TPOT) was used to generate a machine learning pipeline for classification. A classification accuracy of 68.75% was achieved using HRV features, whereas higher accuracies of 83.33% and 81.25% were achieved using game performance features alone and both HRV and game performance features, respectively. These results show that HRV has potential for detecting mild cognitive impairment, but game performance yields better accuracy. Thus, serious games have the potential to be used for assessing cognitive decline among the elderly.


Introduction
The proportion of older adults in the world's population has been gradually increasing. The World Health Organization (WHO) estimates that by 2050 the world's population of people aged 60 and above will reach 2 billion [1]. Contributing factors include the steady increase in worldwide life expectancy and decreasing fertility rates. Taiwan is currently an aged society, with elderly people making up more than 14% of its population. If this trend continues, the percentage may exceed 20% by 2026, making Taiwan a super-aged society [2].
Cognitive decline is one of the key problems among the elderly, as brain structure and function change with age. Numerous factors, such as alcohol, head trauma or injury, excessive stress, and degenerative diseases such as Alzheimer's disease (AD), can cause cumulative damage to the brain over the years, leading to cognitive impairment. The prevalence of AD increases with age, and AD is the primary cause of cognitive deterioration in older persons [3]. According to a 2020 study, the global prevalence of cognitive impairment in older adults could range from 5.1% to as high as 41% [4]. While cognitive decline cannot be completely reversed, it can be delayed or even avoided through interventions such as exercise, cognitive training, nutritional counseling, and risk factor monitoring [5]. One way to deliver cognitive training is through serious games. Interventions benefit from early detection, but the onset of cognitive impairment can be very subtle and difficult to detect. Because progression is slow, close relatives and friends may notice it more readily than the affected individuals themselves [3].
Playing games requires cognitive abilities that depend on the game's objectives. Some games require problem solving, decision-making or quick reflexes, while others require memory and language skills. Depending on the game, games provide a wide range of stimuli capable of eliciting cognitive activity and emotion in the player [6]. For this reason, they could be used as an alternative means of evaluating a person's cognitive state. Common screening tools such as the Mini-Mental State Examination (MMSE) and the Montreal Cognitive Assessment (MoCA) are effective at detecting cognitive impairment, but their results can also be affected by the person's educational background [7].
As in the case of video games, different game genres and styles of gameplay can affect autonomic nervous system (ANS) activity and, consequently, heart rate variability (HRV) [8].
The field of serious games is growing rapidly in industry as well as in academic research. Serious games are games created for a purpose other than pure entertainment [9]. Health-focused serious games have been used for health monitoring [10][11][12], detection and management of health conditions [13][14][15][16], and therapy and rehabilitation [17][18][19]. Serious games for the elderly have been developed over the years with a variety of goals, ranging from physical therapy and balance training to mood or stress management [20][21][22]. While these games had a positive effect on the elderly, they did not consider physiological measures. Because physiological signals are difficult to influence consciously, they are considered more objective measures [23]. By recording and analyzing these signals, games and other applications for the elderly can be better tailored to their needs.
Several studies have investigated the relationship between cognition and HRV features, and their results indicate that the ANS influences cognition. HRV is generally defined as the variation in the time interval between consecutive heartbeats. R-to-R (RR) intervals are calculated as the time between successive R waves of the QRS complexes in the ECG signal. Normal-to-normal (NN) intervals are calculated in the same way, but only between normal beats, after artifacts, ectopic beats and noise have been removed. Most HRV features are derived from RR or NN intervals [24]. Several HRV features have been shown to correlate with a person's cognitive abilities, with high resting HRV correlating with better performance in different cognitive domains. Notable features include high-frequency (HF) power, low-frequency (LF) power, the standard deviation of NN intervals (SDNN) and the root mean square of successive RR interval differences (RMSSD) [25]. These results suggest that autonomic markers such as HRV features can be used as biomarkers for cognitive impairment [25,26]. However, not all studies observed such a relationship; for some of the cognitive domains and tasks examined, no correlation was found between the subjects' cognitive performance and their HRV.
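As a minimal sketch of the RR and NN definitions above, assuming R peaks have already been located as sample indices in an ECG sampled at 500 Hz (the rate used in this study), the intervals follow from successive differences. The beat-quality mask `is_normal` is a hypothetical input standing in for artifact review; this illustrates the definitions only and is not the authors' processing chain.

```python
import numpy as np

FS_HZ = 500  # ECG sampling rate used in this study


def rr_intervals_ms(r_peak_samples, fs=FS_HZ):
    """Convert R-peak sample indices into R-R intervals in milliseconds."""
    peaks = np.asarray(r_peak_samples, dtype=float)
    return np.diff(peaks) * 1000.0 / fs


def nn_intervals_ms(rr_ms, is_normal):
    """Keep only intervals whose two bounding beats are both normal.

    `is_normal` (hypothetical) holds one boolean per R peak, so it is
    one element longer than `rr_ms`.
    """
    is_normal = np.asarray(is_normal, dtype=bool)
    both_normal = is_normal[:-1] & is_normal[1:]
    return np.asarray(rr_ms, dtype=float)[both_normal]


rr = rr_intervals_ms([0, 400, 800, 1250])            # [800., 800., 900.]
nn = nn_intervals_ms(rr, [True, True, False, True])  # [800.]
```

Dropping intervals is the simplest way to express the NN definition; the study itself corrected artifacts by interpolation instead.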
In this study, we investigated whether ANS activity, as measured through HRV, differs significantly between cognitively normal and cognitively impaired elderly subjects while they play serious games. These serious games were designed for the elderly: one game tests cognitive aptitude, and two games test processing speed and reaction time [27,28]. The purpose of investigating the HRV response is to understand how these serious games affect elderly people with different cognitive status. In addition, we compared the cognitive function of the two groups through their game performance. We also explored the use of HRV and game performance features for the automated detection of cognitive impairment using machine learning (ML).
The subject criteria, experimental design, physiological measurement, signal preprocessing, feature extraction, statistical analysis, feature selection and classification are described in detail in Section 2. The results from the experiment and data analysis are presented in Section 3. The discussion and interpretation of these results are in Section 4. Finally, conclusions and future work are presented in Section 5.

Materials and Methods
This section highlights the materials and methods used to achieve the aim and purpose of this study. It contains the details for subject criteria, experimental design, physiological measurement, signal processing, feature extraction, statistical analysis and classification.

Subjects
The subjects were selected according to the following criteria. They had to be aged 50 years or older, and they had to be able to understand the purpose, process, risks, rights and interests of the research and sign the consent form. The MoCA was used to evaluate the subjects' cognitive abilities, as it has been shown to be effective at detecting cognitive impairment, especially mild cognitive impairment (MCI) [29]. The cut-off score used in this study is 23/24, which takes the age and education of the subjects into consideration [30,31]. The scores were also evaluated and verified by hospital doctors and psychologists specializing in elderly care.
People with neuropsychiatric conditions such as alcohol or other substance abuse, Parkinson's disease, epilepsy and similar conditions were excluded from participating. People with substance abuse problems were excluded because alcohol and some drugs can influence heart rate variability [32]. People with psychiatric disorders, such as anxiety, panic attacks, posttraumatic stress disorder, anorexia, borderline personality disorder, and depression, have been found to have lower HRV [33] and were also excluded. Parkinson's disease and epilepsy may impair the subject's hand and walking function, for example through tremor, stiffness and slow walking. Severe vision or hearing loss, any major medical illness (cancer, uremia, anemia or thyroid dysfunction), or other obstacles that could hinder the person's ability to cooperate with the researchers were also grounds for exclusion.
Subjects could withdraw from the experiment at any point, even during the experimental process. Subjects who were unable to complete a trial, including the clinical scale evaluation and the board game interactive system, were also excluded. This experiment was approved by the Institutional Review Board (Ethics Committee) of Taichung Veterans General Hospital, Taiwan (protocol code SF18297A-1; 16 January 2019). A total of 48 subjects were selected, of whom 24 had normal cognition and 24 had mild cognitive impairment.

Experimental Design
The experiment was conducted at the Chiayi Branch of Taichung Veterans General Hospital. The experiment room had two testing areas where the subjects played the games. Since all of the subjects had lived in Taiwan for most of their lives, the instructions and game content were in Traditional Chinese. The games used in this study are serious games, since they were designed primarily for the cognitive assessment and training of older adults rather than just for entertainment.
The flow of the experiment is shown in Figure 1. The subjects first rested outside the testing room for 3 min, during which they were asked to sit, relax and close their eyes. This is shorter than the recommended 5 min, but according to [34], the minimum recording time required for the HRV features used in this study is only 2 min. Therefore, 3 min is sufficient, and it reduces the experiment time for the subjects. After this initial resting period, the subjects entered the testing room, walked to the area where Game 1 was set up, and were given instructions on how to play the game. Game 1 is a cognitive aptitude game based on nostalgia theory and themes related to Taiwanese culture, including food, clothing, transportation, housing and entertainment [27]. The gameplay time ranged from 8 to 10 min. The game ran on a personal computer, and its user interface was projected onto a wall. The game's questions involve memory, attention, executive function and language; speed is not a priority. In this game, the subjects have to identify everyday objects, places and events. They have to select the appropriate clothes, make arithmetic calculations, and decide which directions to take depending on the described scenarios. They are also asked about their favorite festivals and have to select the scenes or items related to that event. A sample of the user interface is shown in Figure 2, with the categories labelled with their respective English translations.
After completing Game 1, the subjects rested for another 3 min (the second resting period). They then proceeded to another area to play two reaction-time games in succession, without a resting period in between. These games were played using a desktop personal computer and a specialized button board, and were projected onto a large screen so that the elderly subjects could see the games better.
Games 2 and 3 are called Whack-a-Mole and Hit-the-Ball, respectively; Whack-a-Mole was played first, followed by Hit-the-Ball. These two games were used to test the subjects' reaction time and processing speed [28]. In Whack-a-Mole, the subjects have to hit the correct hole where the mole pops up, as shown in Figure 3. There are three speed levels: slow (level 1), medium (level 2) and fast (level 3). In Hit-the-Ball, a single ball, either a football or a spiked ball, rolls along one of three fairways. The subject has to choose the fairway along which the football passes and press the correct button when the football reaches the target area, as shown in Figure 4. Spiked balls should be avoided, meaning that the subject should not press any of the buttons on the board. Like Whack-a-Mole, Hit-the-Ball has three speed levels. On the surface, the two games are similar: as reaction-time games, both test how fast the subjects can react to the games' stimuli. However, Hit-the-Ball adds a layer of difficulty, since the subjects have to identify the type of ball moving across the field and act accordingly. The total gameplay time for the two games in sequence ranged from 4 to 5 min. After playing, the experiment ended and the subjects were asked to return to the resting area outside the testing room. ECG was recorded during the resting periods and while the subjects were playing the games. All games used in this study were developed in-house and are free.

Physiological Measurement and Signal Preprocessing
The ECG signals were measured using a wearable device developed in-house, as described in [35]. Ag/AgCl electrode patches were placed below the subject's left and right collarbones and on the lower left rib in a lead II configuration. The ECG signals were sampled at 500 Hz. The measured signals were divided into four time sections: Rest 1, Game 1, Rest 2 and Games 2&3. The recordings for Games 2 and 3 were combined because the games were played in succession with no resting period in between.
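Splitting a continuous recording into the four analysis sections can be sketched as follows. The session boundaries below are hypothetical (in practice they would come from the experiment log) and only mirror the durations described above; the function name is illustrative.

```python
import numpy as np

FS_HZ = 500  # ECG sampling rate used in this study


def split_sessions(ecg, boundaries_s, fs=FS_HZ):
    """Slice a continuous ECG array into named sessions.

    boundaries_s maps a session name to (start, end) times in seconds.
    """
    segments = {}
    for name, (start_s, end_s) in boundaries_s.items():
        start, end = int(round(start_s * fs)), int(round(end_s * fs))
        segments[name] = ecg[start:end]
    return segments


# Hypothetical timeline: 3 min rest, 9 min Game 1, 3 min rest, then
# Games 2 and 3 back to back (4.5 min), kept as one "Games 2&3" segment.
boundaries = {
    "Rest 1": (0, 180),
    "Game 1": (180, 720),
    "Rest 2": (720, 900),
    "Games 2&3": (900, 1170),
}
ecg = np.zeros(1170 * FS_HZ)  # placeholder signal
segments = split_sessions(ecg, boundaries)
```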
To ensure signal quality, the ECG was preprocessed before feature extraction. The raw ECG signal was detrended to remove the linear trend, then filtered using a Butterworth band-pass filter with cutoff frequencies of 5 and 40 Hz to remove baseline drift, white noise and motion noise.
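The preprocessing step can be sketched with SciPy. The study specifies only a Butterworth band-pass filter with 5 to 40 Hz cutoffs; the filter order (4) and the zero-phase forward-backward filtering used here are assumptions, chosen so that R-peak timing is not shifted by filter delay.

```python
import numpy as np
from scipy.signal import butter, detrend, sosfiltfilt

FS_HZ = 500  # ECG sampling rate used in this study


def preprocess_ecg(ecg, fs=FS_HZ, low_hz=5.0, high_hz=40.0, order=4):
    """Remove the linear trend, then apply a zero-phase Butterworth band-pass.

    `order` is an assumed value; second-order sections keep the filter stable.
    """
    ecg = detrend(np.asarray(ecg, dtype=float), type="linear")
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, ecg)


# Synthetic check: a 10 Hz in-band component plus linear and 0.3 Hz drift.
t = np.arange(0, 10, 1 / FS_HZ)
in_band = np.sin(2 * np.pi * 10 * t)
drift = 0.5 * t + 2.0 * np.sin(2 * np.pi * 0.3 * t)
cleaned = preprocess_ecg(in_band + drift)
```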
Artifact removal is vital in HRV calculation, as even a minor artifact can severely skew the extracted HRV features. Artifacts such as ectopic beats, arrhythmic events, missing data and other noise can affect the calculation of the power spectral density (PSD) of HRV. The software ARTiiFact [36] was used to remove these artifacts. It first extracts the interbeat intervals (IBIs), i.e., the R-R intervals, from the ECG. Artifacts are then detected using either or both of two algorithms: median absolute deviation (MAD) detection and an algorithm proposed by Berntson et al. [37]. The software also allows manual artifact detection, in which the user visually inspects the ECG signal. To ensure the accuracy of the IBI calculations, the automated methods were applied first, and their results were then verified through manual inspection. The detected artifacts were then corrected by interpolation, using two methods: cubic spline and linear interpolation. Each method has pros and cons. Linear interpolation is simple, but it is ineffective for nonlinear data. Cubic spline interpolation produces smooth interpolants, but it does not work well when closely spaced sample points have large variations in value.
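A simplified sketch of MAD-based detection followed by interpolation is shown below. It is not ARTiiFact's implementation: the robust z-score threshold of 3 is a hypothetical choice, the Berntson et al. algorithm is not reproduced, and interpolation is done over beat index for simplicity.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def mad_outliers(ibi_ms, threshold=3.0):
    """Flag IBIs whose robust z-score exceeds `threshold` (hypothetical default)."""
    ibi = np.asarray(ibi_ms, dtype=float)
    med = np.median(ibi)
    mad = np.median(np.abs(ibi - med))
    if mad == 0:
        return np.zeros(ibi.size, dtype=bool)
    # 1.4826 * MAD estimates the standard deviation for normally distributed data
    robust_z = np.abs(ibi - med) / (1.4826 * mad)
    return robust_z > threshold


def interpolate_artifacts(ibi_ms, is_artifact, method="linear"):
    """Replace flagged IBIs by interpolating over the beat index."""
    ibi = np.asarray(ibi_ms, dtype=float).copy()
    bad = np.asarray(is_artifact, dtype=bool)
    idx = np.arange(ibi.size)
    if method == "linear":
        ibi[bad] = np.interp(idx[bad], idx[~bad], ibi[~bad])
    elif method == "cubic":
        ibi[bad] = CubicSpline(idx[~bad], ibi[~bad])(idx[bad])
    else:
        raise ValueError("method must be 'linear' or 'cubic'")
    return ibi


ibi = [800, 810, 790, 805, 1600, 795, 800]
flags = mad_outliers(ibi)                  # only the 1600 ms interval is flagged
fixed = interpolate_artifacts(ibi, flags)  # 1600 is replaced by 800
```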

Feature Extraction
After artifact removal, HRV was calculated using the Heart Rate Variability Analysis Software (HRVAS) [38], an open-source program that calculates HRV features. Ten HRV features were used: 5 time-domain features, 1 nonlinear feature, and 4 frequency-domain features derived from the PSD.
Time-domain features are derived from the R-R intervals obtained by detecting the QRS complexes in the ECG signal. Each R wave was detected, and the R-R intervals were obtained along with the instantaneous heart rate. From the R-R intervals, the mean R-R interval (RRI mean), heart rate (HR), standard deviation of NN intervals (SDNN), root mean square of successive differences of NN intervals (RMSSD) and percentage of differences between adjacent RR intervals greater than 50 ms (pNN50) can be derived. SDNN was calculated using normal beats only. RMSSD and pNN50 correlate strongly with parasympathetic activity [34].
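The five time-domain features can be sketched directly from an NN series in milliseconds. This follows the standard definitions rather than HRVAS's code; in particular, HR is computed here as 60000 divided by the mean NN interval, whereas some tools average the instantaneous heart rate instead, and SDNN uses the sample standard deviation (an assumption).

```python
import numpy as np


def time_domain_features(nn_ms):
    """Standard time-domain HRV features from NN intervals in milliseconds."""
    nn = np.asarray(nn_ms, dtype=float)
    diffs = np.diff(nn)  # successive differences
    return {
        "RRI_mean": nn.mean(),
        "HR": 60000.0 / nn.mean(),                       # beats per minute
        "SDNN": nn.std(ddof=1),                          # sample SD
        "RMSSD": np.sqrt(np.mean(diffs ** 2)),
        "pNN50": 100.0 * np.mean(np.abs(diffs) > 50.0),  # percent
    }


features = time_domain_features([800, 820, 900, 880])
```

For this toy series the successive differences are 20, 80 and -20 ms, so only one of three exceeds 50 ms and pNN50 is 33.3%.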
Sample entropy (SampEn) is a nonlinear measure often used to quantify the complexity of a time series. In HRV analysis, SampEn is used to measure the entropy rate of short-term NN interval series [39]. High SampEn values may indicate a high degree of complexity and low predictability, whereas lower values may imply a lack of complexity and a high degree of regularity [34].
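A direct sketch of SampEn follows: it counts pairs of templates of length m and m + 1 that match within a tolerance r (Chebyshev distance, self-matches excluded) and returns the negative log of their ratio. The parameters m = 2 and r = 0.2 times the standard deviation are common choices but are assumptions here, since the study does not state the values HRVAS used.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn = -ln(A / B), where B and A count template matches of length m and m + 1.

    m and r_factor are assumed defaults; r uses the population SD (ddof=0).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * x.std()

    def count_matches(length):
        # Same number of templates (n - m) for both lengths, as in the usual definition
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return np.inf  # undefined for very short or very irregular series
    return -np.log(a / b)


regular = np.tile([1.0, 2.0, 3.0, 4.0], 50)              # perfectly periodic
irregular = np.random.default_rng(0).normal(size=500)   # white noise
```

A perfectly periodic series gives SampEn of 0, since every match of length m extends to length m + 1, while white noise gives a much larger value.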
Frequency-domain features are derived from the PSD. Of the numerous methods for calculating the PSD, this study used a parametric approach called autoregressive (AR) power spectral analysis. Parametric approaches offer smoother spectral components and an accurate PSD estimate even from fewer samples. Low frequency (LF) and high frequency (HF) are the two major spectral components considered in this investigation. The LF band, which spans 0.04 to 0.15 Hz, is more closely linked to sympathetic activity. The HF band, on the other hand, spans 0.15 to 0.4 Hz and is associated with parasympathetic activity [40].
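An AR spectrum and the LF/HF band powers can be sketched as below. The NN series is interpolated onto a uniform grid, AR coefficients are estimated with the Yule-Walker equations, and the band powers are integrated from the resulting spectrum. The 4 Hz resampling rate, cubic-spline interpolation and model order of 16 are assumptions and may differ from the HRVAS settings used in the study.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.interpolate import CubicSpline
from scipy.linalg import solve_toeplitz

FS_RESAMPLE_HZ = 4.0  # assumed resampling rate for the NN series
AR_ORDER = 16         # assumed AR model order


def resample_nn(nn_ms, fs=FS_RESAMPLE_HZ):
    """Interpolate the unevenly spaced NN series onto a uniform time grid."""
    nn_ms = np.asarray(nn_ms, dtype=float)
    beat_times_s = np.cumsum(nn_ms) / 1000.0
    grid = np.arange(beat_times_s[0], beat_times_s[-1], 1.0 / fs)
    return CubicSpline(beat_times_s, nn_ms)(grid)


def ar_psd(x, order=AR_ORDER, fs=FS_RESAMPLE_HZ, n_freqs=2048):
    """One-sided AR power spectrum from the Yule-Walker equations."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    a = solve_toeplitz(r[:order], r[1:])  # AR coefficients a_1..a_p
    sigma2 = r[0] - np.dot(a, r[1:])      # driving white-noise variance
    freqs = np.linspace(0.0, fs / 2.0, n_freqs)
    lags = np.arange(1, order + 1)
    transfer = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, lags) / fs) @ a
    return freqs, 2.0 * sigma2 / (fs * np.abs(transfer) ** 2)


def band_power(freqs, psd, lo_hz, hi_hz):
    """Integrate the PSD over [lo_hz, hi_hz)."""
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return trapezoid(psd[mask], freqs[mask])


def lf_hf(nn_ms):
    """Return LF power, HF power and the LF/HF ratio."""
    freqs, psd = ar_psd(resample_nn(nn_ms))
    lf = band_power(freqs, psd, 0.04, 0.15)
    hf = band_power(freqs, psd, 0.15, 0.40)
    return lf, hf, lf / hf
```

Because the NN series is in milliseconds, the band powers come out in ms², the usual unit for LF and HF power.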

Statistical Analysis
Statistical analyses were performed using SPSS 25 [41] and R 3.6.1 [42]. Age, years of education, and MoCA scores were compared between the NC and MCI groups using the Mann-Whitney U test, as these variables were not normally distributed. Gender was compared between the groups using the chi-square test. The ECG features were tested for normality using the Shapiro-Wilk test. A robust mixed-model ANOVA was used to test the effects of group and session, since the majority of the features were not normally distributed. The variable "group" refers to the two groups of elderly people in this study, NC and MCI. The variable "session" refers to the rest and game sessions of the experiment: Rest 1, Game 1, Rest 2, and Games 2&3. This robust method was implemented using the R package WRS2 [43]. Significant main effects were followed by post-hoc comparisons between the sessions, and the resulting p-values were adjusted using the Bonferroni correction. To compare the game performance of the two groups, the Mann-Whitney U test was used, as the data were not normally distributed and the groups are independent. The significance level for all statistical analyses was 0.05.
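The nonparametric tests named above have SciPy equivalents, sketched below. This is an illustration, not the SPSS or R analysis used in the study. The robust mixed-model ANOVA from WRS2 is not reproduced. The post-hoc step uses paired Wilcoxon signed-rank tests between sessions with a Bonferroni adjustment over the six session pairs. Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default.

```python
from itertools import combinations

import numpy as np
from scipy import stats

SESSIONS = ["Rest 1", "Game 1", "Rest 2", "Games 2&3"]


def pairwise_wilcoxon_bonferroni(data):
    """data maps session -> 1-D array, same subjects in the same order."""
    pairs = list(combinations(SESSIONS, 2))
    adjusted = {}
    for a, b in pairs:
        p = stats.wilcoxon(data[a], data[b]).pvalue
        adjusted[(a, b)] = min(1.0, p * len(pairs))  # Bonferroni adjustment
    return adjusted


def compare_groups(nc, mci):
    """Mann-Whitney U test for two independent groups (e.g. game scores)."""
    return stats.mannwhitneyu(nc, mci, alternative="two-sided").pvalue


def compare_gender(table_2x2):
    """Chi-square test on [[NC_female, NC_male], [MCI_female, MCI_male]]."""
    chi2, p, dof, expected = stats.chi2_contingency(table_2x2)
    return p


def is_normal(x, alpha=0.05):
    """Shapiro-Wilk normality check at significance level alpha."""
    return stats.shapiro(x).pvalue >= alpha
```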

Classification
Three datasets were used for machine learning: the first used only HRV features, the second used only game performance features, and the third used both HRV and game performance features. A leave-one-out (LOO) procedure was used to split the data and evaluate the performance of the classifier. The size of each dataset is presented in Table 1. A typical machine learning pipeline is shown in Figure 5; it involves processes such as data retrieval, data preparation, modeling, model evaluation and tuning, and deployment and monitoring. Because building a pipeline by hand is tedious and time-consuming, automated machine learning (AutoML) was used to reduce the time needed to build a pipeline while maintaining model quality [44]. The machine learning pipeline for each dataset was generated using an AutoML library called the Tree-based Pipeline Optimization Tool (TPOT). TPOT uses genetic programming to search over tree-based pipeline structures for predictive modeling, covering data preparation, modeling algorithms and model hyperparameters [45][46][47]. The performance of each generated pipeline was measured using accuracy, sensitivity and specificity. TPOT was used to find a well-suited pipeline for each dataset, so that the performance of each feature set is assessed with a pipeline optimized for that data.
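LOO evaluation of a fixed scikit-learn model can be sketched as follows, with MCI coded as the positive class (1). The TPOT search itself is not shown. A plain `DecisionTreeClassifier` and the synthetic data stand in for whichever pipeline and features TPOT produced, and the function name is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def loo_evaluate(model, X, y):
    """Accuracy, sensitivity and specificity under leave-one-out CV (MCI = 1)."""
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / len(y),
        "sensitivity": tp / (tp + fn),  # recall for MCI
        "specificity": tn / (tn + fp),  # recall for NC
    }


# Synthetic, well-separated placeholder data: 24 NC (0) and 24 MCI (1) subjects.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (24, 3)), rng.normal(6, 1, (24, 3))])
y = np.array([0] * 24 + [1] * 24)
scores = loo_evaluate(DecisionTreeClassifier(random_state=0), X, y)
```

With 48 subjects, each LOO fold holds out one subject, so every reported accuracy is a multiple of 1/48.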

Results
This section presents the main results of the study: the demographics and characteristics of the subjects, which can affect the measured HRV; the statistical analysis of the HRV and game performance features; and the classification results using machine learning.

Demography and Characteristics of Subjects
The subjects in this study comprised 24 cognitively normal subjects and 24 subjects with mild cognitive impairment. The descriptive statistics are listed in Table 2. No significant difference in gender was observed between the two groups. Although females made up the larger proportion of both groups, gender-related variation in HRV features decreases after the age of 55. Since the subjects were aged above 60 years, gender-related differences that could affect the results of this study are also diminished [48]. The NC group was significantly younger than the MCI group, which matters because HRV is affected by age, with older adults having lower HRV. The NC group also had significantly more years of education than the MCI group, and the MCI group had significantly lower MoCA scores than the NC group. Since the two groups differed significantly in age and years of education, the MoCA cutoff score was adjusted based on [31]. The cutoff score used is 23/24, which provides both high sensitivity and high specificity. The subjects were also further assessed by geriatric doctors and psychologists. The extracted ECG features are shown in Table 3 as medians and interquartile ranges, since most of them were not normally distributed according to the normality test.

Statistical Analysis Results
The data obtained from the experiment were statistically analyzed using nonparametric statistical methods since most of the data were not normally distributed. Statistical analysis was used to investigate the relationship between the variables group, session and the ECG features. The scores and reaction times of the subjects were also analyzed to determine if there were significant differences between the two groups.

Measured HRV Features
The influence of group and session on the ECG features was examined to understand the effects of cognitive impairment and of the experiment on the subjects' ANS activity. To investigate these effects, as well as the interaction between the two factors, a robust mixed-model ANOVA was used. As mentioned, robust methods were used because most of the ECG features violated the normality assumption. The results of this test are shown in Table 4. For some of the ECG features (HR, RRI, RMSSD, HF, and LF/HF), there was a significant main effect of session. There was no significant effect of group, and the interaction between group and session was also not significant. Since there were no significant interactions, the main effects of group and session were interpreted separately. Because session had a significant main effect on some of the ECG features, it was followed up with pairwise comparisons. The results of the pairwise Wilcoxon signed-rank tests are shown in Table 5. The HRV measured during the cognitive aptitude game (Game 1) was significantly different from that during Rest 1, Rest 2 and Games 2&3. During this session, subjects had higher heart rates and LF/HF ratios and lower RRI, RMSSD, and HF, indicating sympathetic activation. The lack of a significant difference between Rest 1 and Rest 2 indicates that the HRV features returned to baseline.

Game Performance
The subjects' scores were recorded and tallied after each game session. For the first game, the maximum score is 25 points, with each category having a different number of items; one point is awarded for each correct answer, and no points are awarded for wrong answers. For the second and third games, the scores are based on hit ratio and reaction time. Table 6 shows the scores, hit ratios and reaction times of the subjects by cognitive status, together with the statistical comparison of the two groups' game performance. There was a significant difference between the two groups in the Game 1 scores and in the hit ratios and reaction times from Games 2 and 3. The MCI group had significantly lower scores and longer reaction times than the cognitively normal group. These results are similar to those of a previous study [28], suggesting that the serious games could be used to evaluate cognitive impairment among elderly people.

Classification Results
There were three sets of features used for classification of NC and MCI subjects. The first set only used HRV features, the second only used game performance features, and the third set used both HRV and game performance features. For all sets, TPOT was used for generating the machine learning pipeline, and LOO was implemented for cross-validation of the generated pipeline.

HRV Features Only
This set contained 40 features: the 10 HRV features measured in each of the 4 sessions (10 HRV features × 4 sessions). TPOT constructed a pipeline consisting of Recursive Feature Elimination (RFE), a Binarizer and a Decision Tree Classifier. RFE is a feature selection method that recursively eliminates features by ranking them by importance [49,50]. The Binarizer transforms each feature value to 0 or 1 according to a threshold. The decision tree classifier is a supervised learning algorithm that can be used for classification and regression. As its name suggests, it splits the data into branches based on feature thresholds until a stopping criterion is reached. A decision tree is composed of a root node, child (internal) nodes, and leaf nodes. A decision tree classifier has the following advantages: (1) the data do not need to be normalized or scaled, (2) it is fast and simple compared to other classifiers, and (3) it can handle both categorical and continuous data [51]. With LOO cross-validation, an accuracy of 68.75% was achieved, with a sensitivity of 66.67% and a specificity of 70.83% for MCI. The confusion matrix, with recall and precision rates, is shown in Table 7.
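The structure of the HRV-only pipeline (RFE, then Binarizer, then Decision Tree Classifier) can be sketched in scikit-learn. All hyperparameters below are hypothetical, since the selected values are not reported. The RFE estimator is an `ExtraTreesClassifier` because TPOT's default configuration commonly pairs RFE with one, but that choice is also an assumption.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Binarizer
from sklearn.tree import DecisionTreeClassifier


def build_hrv_pipeline(n_features_to_select=10, step=0.05, threshold=0.0,
                       max_depth=3, random_state=0):
    """RFE -> Binarizer -> DecisionTree; every hyperparameter is hypothetical."""
    return make_pipeline(
        RFE(
            estimator=ExtraTreesClassifier(n_estimators=100, random_state=random_state),
            n_features_to_select=n_features_to_select,
            step=step,  # fraction of remaining features removed per iteration
        ),
        Binarizer(threshold=threshold),  # maps each selected feature to 0 or 1
        DecisionTreeClassifier(max_depth=max_depth, random_state=random_state),
    )


# Placeholder data shaped like the HRV set: 48 subjects x 40 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 40))
y = np.array([0, 1] * 24)
pipe = build_hrv_pipeline().fit(X, y)
```

Because the Binarizer acts on raw feature values, a useful threshold depends on the scale of the HRV features; on real data it would be tuned (as TPOT does) rather than fixed at 0.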

Game Performance Features Only
This dataset consists of 13 game performance features: the response times and hit ratios for each of the three levels of the Whack-a-Mole and Hit-the-Ball games, and the game score from the cognitive aptitude game.
TPOT generated a pipeline consisting of an RBFSampler and a Decision Tree Classifier. The RBFSampler does not directly use Radial Basis Functions (RBF); instead, it uses a variant of Random Kitchen Sinks to approximate the RBF kernel [52]. As stated in [53], it approximates an RBF kernel using a Monte Carlo approximation of the kernel's Fourier transform. As in the previous set, the Decision Tree Classifier was used for classification. After cross-validation, the accuracy was 83.33%, with a sensitivity of 79.17% and a specificity of 87.50%, as shown in Table 8.
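The kernel approximation described above can be checked numerically. After the RBFSampler mapping, inner products of the random Fourier features approach the exact RBF kernel as the number of components grows. The data, `gamma` and `n_components` below are hypothetical values chosen only to illustrate this.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 13))  # 20 hypothetical subjects x 13 game-performance features
gamma = 0.05                   # hypothetical kernel width

sampler = RBFSampler(gamma=gamma, n_components=5000, random_state=0)
Z = sampler.fit_transform(X)   # random Fourier features

approx = Z @ Z.T               # Monte Carlo estimate of the kernel matrix
exact = rbf_kernel(X, gamma=gamma)
max_error = np.abs(approx - exact).max()
```

In the generated pipeline, a decision tree is then trained on `Z` rather than on the raw features.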

Combination of HRV and Game Performance
Combining the HRV and game performance features gives a total of 53 features (40 HRV and 13 game performance features). TPOT selected the Decision Tree Classifier and achieved an accuracy of 81.25%. The sensitivity and specificity were 87.50% and 75%, respectively, as shown in Table 9.
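With 24 subjects per group, each reported sensitivity and specificity corresponds to whole-number counts. The accuracy follows as (TP + TN) / 48. The counts below are inferred from the reported percentages rather than taken from Tables 7 to 9. For example, 87.50% sensitivity and 75% specificity correspond to 21 + 18 = 39 correct classifications, or 39/48 = 81.25%.

```python
def accuracy_from_counts(tp, fn, tn, fp):
    """Overall accuracy from confusion-matrix counts (MCI = positive class)."""
    return (tp + tn) / (tp + fn + tn + fp)


# (TP, FN, TN, FP) inferred from the reported sensitivity and specificity, 24 per group
inferred_counts = {
    "HRV only": (16, 8, 17, 7),                # 16/24 = 66.67%, 17/24 = 70.83%
    "Game performance only": (19, 5, 21, 3),   # 19/24 = 79.17%, 21/24 = 87.50%
    "HRV + game performance": (21, 3, 18, 6),  # 21/24 = 87.50%, 18/24 = 75.00%
}

for name, (tp, fn, tn, fp) in inferred_counts.items():
    acc = accuracy_from_counts(tp, fn, tn, fp)
    print(f"{name}: sensitivity={tp / (tp + fn):.2%} "
          f"specificity={tn / (tn + fp):.2%} accuracy={acc:.2%}")
```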

Discussion
The experimental protocol in this study induced multiple alterations in the subjects' ANS activity, as evident from the changes in the HRV features. The significant increase in heart rate, along with the significant decrease in RRI, while playing Game 1 suggests that the game induced sympathetic activity. This is further corroborated by the other features: a significant decrease in RMSSD and HF and a significant increase in the LF/HF ratio. The subjects recovered during Rest 2, as indicated by the significant differences in HR, RRI, HF and LF/HF between Rest 2 and Game 1, and by the lack of a significant difference between Rest 1 and Rest 2.
However, according to the robust mixed-model ANOVA, the cognitive status of the subjects had no significant effect on HRV. This differs from what we expected, as the NC group was significantly younger than the MCI group, and age can affect the measured HRV. The changes in the HRV features were instead driven by the experiment session in which the subject was engaged. There was also no interaction between cognitive status and experiment session in their effects on HRV. Thus, elderly people can play these games and experience similar changes in their HRV regardless of cognitive status.
In terms of game performance, the cognitively normal group had better scores and shorter response times than the group with cognitive impairment. These results suggest that the serious games used in this study can be used for the cognitive evaluation of the elderly, which is also supported by the high accuracy achieved when classifying the subjects using game performance features.
Some of the results presented in this study differ from the literature on cognition and HRV. As mentioned, cognitive status did not have a significant effect on HRV, whereas some studies found a relationship between certain HRV features and cognitive function. This could be because not all cognitive domains are directly correlated with HRV. According to two review papers, 8 of 10 studies that used global cognitive tests found a significant positive relationship between HRV parameters and global cognition, and for executive function, 14 of 26 studies confirmed a relationship [25,26]. Lower HRV predicts poorer performance, but this relationship also depends on the cognitive task [26]. For memory, three studies found a positive correlation [54][55][56], while seven studies did not [57][58][59][60][61][62][63]. Two studies found a relationship between HRV and language [55,64], but another two did not [57,60]. Six studies investigating the relationship between HRV and processing speed did not find a correlation with HRV features [62,63,65-68], while only one study showed a correlation [59].
Using machine learning to classify the cognitive status of the subjects yielded moderate to high accuracy, sensitivity and specificity. This shows the potential of using changes in HRV to evaluate the cognitive status of the subjects, although game performance features gave better results. However, these results need to be confirmed in other experiments with larger sample sizes. Cognitive impairment can exhibit multiple sources of variability, so a larger sample size is essential for obtaining reliable results.
In addition to the limitations due to sample size, the experiment was carried out in a fixed sequential order. The order of the games was not counterbalanced, so there could be order effects. However, the HRV features did not differ significantly between the two resting periods, suggesting that HRV returned to baseline.

Conclusions and Future Work
In this study, we investigated the HRV of two groups of elderly people: cognitively normal and cognitively impaired. The changes in HRV features were significantly affected by the experiment sessions but not by the cognitive status of the subjects, and no significant interaction between cognitive status and session was observed. Thus, the two groups had similar HRV responses while playing the serious games, meaning that elderly people can experience similar changes in their HRV regardless of their cognitive status. Measuring HRV to assess MCI may therefore not be as crucial as we thought.
Game performance differed significantly between the two groups, with the cognitively normal subjects performing better than the cognitively impaired subjects. The serious games presented in this study can therefore be useful in assessing cognitive decline in the elderly. This is further corroborated by the classification of the two groups using game performance features, which achieved an accuracy of 83.33%, compared to 68.75% using HRV features. While HRV has potential for classifying cognitive status, game performance can be more accurate. Screening tools such as the MMSE and MoCA are easy to administer but can be affected by the patient's age and educational and cultural background [7,69]. The ML classification results suggest that serious games could serve as an alternative method for detecting cognitive decline, alongside questionnaires and clinical interviews. This result warrants further development of serious games designed for assessing cognitive impairment.
For future studies, a larger sample size for each group is recommended to confirm or refute the findings presented in this study. Other serious games designed for the elderly could also be used to verify or challenge these findings. In addition, the relationships between specific cognitive domains and ANS activity and other biomarkers could be explored.
Data Availability Statement:
The data are available upon request. They are not publicly available due to ethical and privacy concerns for the subjects.

Conflicts of Interest:
The authors declare no conflict of interest.