Effectiveness of a Serious Game Design and Game Mechanic Factors for Attention and Executive Function Improvement in the Elderly: A Pretest-Posttest Study

Attention allows us to focus on and process information from our environment, and executive function enables us to plan, work, and manage our daily lives. Both of these cognitive abilities decline as individuals grow older, so it is important for the elderly to engage in more cognitive exercise. Previous studies have shown that arithmetic calculations require attention span and that playing video games requires executive function. Therefore, we developed a serious game involving mental arithmetic calculations specifically to improve attention span and executive function. Our objectives were to analyze the effectiveness of the game and the efficacy of the game's mechanic factors affecting attention span and executive function in the elderly. Forty elderly volunteers over 60 years of age were invited, through an elderly social welfare center, to join an eight-week cognitive training program. They were instructed to play the game for at least 15 min per day, 5 days per week, for a total of 8 weeks. Four assessment tests were administered as a pre-test and a post-test, before and after the training period: D-CAT and SAT were used to screen attention span, and TMT-A and TMT-B were used to screen executive function. There were three independent variables (difficulty, pressure, and competition), each with two possible settings. A paired-sample t-test comparing pre-test and post-test scores showed significant improvements in attention span and executive function. Mixed repeated-measures ANOVA and MANCOVA results showed that two game mechanic factors (difficulty and pressure) had a significant effect and an interaction effect, whereas the third factor (competition) had no significant effect.
In conclusion, the game significantly enhanced both attention span and executive function after training; the difficulty and pressure factors had an effect, whereas the competition factor had no effect.


Elderly
The global population is rapidly aging, which has become an important issue in all countries. The United Nations (UN) and the World Health Organization (WHO) define older persons as those aged 60 years or over [1,2]. Data from the United Nations report "World Population Prospects 2019" show that, by 2050, 1 in 6 individuals worldwide (16%) will be aged 65 or over, compared with 1 in 11 (9%) in 2019. By 2050, 1 in 4 individuals in Europe and North America will be 65 years of age or older. In 2018, the number of individuals aged 65 or over globally surpassed the number of individuals under the age of 5. In addition, the population aged 80 or older is projected to rise from 143 million in 2019 to 426 million in 2050 [3]. Additionally, 80% of the elderly individuals will be living putation, mental computation, and written computation [17]. Star [18], in 2005, stated that an in-depth understanding of mathematics is not possible without the acquisition of basic facts and procedural computational skills.

Serial Sevens
Serial Sevens is a well-known attention assessment. The process of the test is a mental arithmetic calculation [19]. It is a screening test for checking the attention span of individuals of different ages. The test is applied in popular global cognitive assessments and cognitive screening instruments (MMSE and MoCA) [20,21]. MMSE and MoCA are widely used in checking a person's cognitive level at different ages. They are used for screening healthy elderly persons and those with mild cognitive impairment (MCI) and dementia [22][23][24][25].
Serial subtraction/addition tasks are serial processing calculations: they consist of processing stages organized in series, where each stage always precedes or follows another from the beginning to the end of the task (in this case, repeatedly subtracting from or adding to a running total). They are performed as mental calculations. For example, in the Serial Sevens test, participants repeatedly subtract 7, starting from 100. Previous studies have shown that both serial subtraction tasks and serial addition tasks require attention span [26][27][28].
Besides Serial Sevens (7's), previous studies have also used Serial Threes (3's) and Serial Thirteens (13's) for attention screening; the three variants differ in difficulty. In the 3's, participants repeatedly subtract 3, starting from 100. When subtracting or adding 3, the ones digit changes at every step, but the tens digit changes only every 3rd or 4th step (for example, 79 − 3 = 76; 76 − 3 = 73; 73 − 3 = 70), so this is easier than the 7's and 13's. In the 7's, participants repeatedly subtract 7, starting from 100. When subtracting or adding 7, the ones digit changes at every step, and the tens digit changes every 1st or 2nd step (for example, 86 − 7 = 79; 79 − 7 = 72; 72 − 7 = 65), so it is more difficult than the 3's but easier than the 13's. In the 13's, participants repeatedly subtract 13, starting from 100. When subtracting or adding 13, both the ones digit and the tens digit change at every step (for example, 87 − 13 = 74; 74 − 13 = 61; 61 − 13 = 48), which makes it more difficult than the 3's and 7's [29][30][31].
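The tens-digit pattern described above can be checked mechanically. The sketch below (helper names are illustrative, not from the paper) generates a serial subtraction sequence from 100 and counts how often the tens digit changes; note that the first step from 100 always counts as a change because 100 has a tens digit of 0.

```python
def serial_sequence(start: int, step: int, steps: int, subtract: bool = True) -> list[int]:
    """Return the start value followed by `steps` serial results."""
    values = [start]
    for _ in range(steps):
        values.append(values[-1] - step if subtract else values[-1] + step)
    return values


def tens_digit(n: int) -> int:
    """Tens place of a non-negative integer (e.g. 100 -> 0, 97 -> 9)."""
    return (n // 10) % 10


def tens_changes(values: list[int]) -> list[bool]:
    """For each step, True if the tens digit differs from the previous value."""
    return [tens_digit(a) != tens_digit(b) for a, b in zip(values, values[1:])]


for step in (3, 7, 13):
    seq = serial_sequence(100, step, 6)
    changes = tens_changes(seq)
    print(step, seq, f"{sum(changes)}/{len(changes)} tens-digit changes")
```

Over the first six steps from 100, this reports 2/6 tens-digit changes for the 3's, 5/6 for the 7's, and 6/6 for the 13's, in line with the ordering of difficulty described above.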

Attention
William James [32] in 1890 wrote the following: "Attention is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence".
There are three components of human attention: alertness, selectivity, and processing capacity [33]. Previous studies have shown that simple arithmetic activates the frontal, temporal, and parietal cortices of the brain during each task [34], and these brain regions are related to attention control [35,36]. Some studies show that solving simple arithmetic calculations can improve attention span [37][38][39]. Because serial subtraction and serial addition tasks require arithmetic and mental calculation, they may also improve attention span.

Executive Function
Executive function is a mental process that enables us to plan, work, and manage our daily lives. It is a combination of cognitive skills. The foundational components are inhibition, working memory, and shifting [40]. Such functions allow individuals to perform goal-directed problem solving, and this depends upon neural networks involving the prefrontal cortex in the brain [41]. As individuals become older, the capacity for executive function declines [42], but it can be enhanced by doing cognitive training exercises [43,44].
Previous studies have shown that smartphone games or video games can improve executive function, so they could be used as cognitive training tools to enhance executive function [45][46][47][48][49].

Game Design
In the book "Rules of Play: Game Design Fundamentals", Tekinbas [50] organizes the primary game design schemas into three areas: rules, play, and culture. Rules are the indispensable logical and mathematical structures of a game; play refers to the player's interaction with the game and with other players; and culture refers to the larger cultural context in which the game is situated.
Another widely used framework is MDA (mechanics, dynamics, and aesthetics) [51]. Mechanics refers to the components of the game, i.e., its rules; dynamics refers to the run-time behavior of the game, which represents the game as a system; and aesthetics refers to the player's experience of interacting with the game system, representing "fun" or "gameplay".
One study describes the game design elements in five parts: the game interface design pattern, the game design pattern and mechanics, the game design principles and heuristics, the game models, and the game design methods [52].
These previous studies share similar conceptions and representations of game design, which we summarize in three main parts: game mechanics, which represents the rules, methods, and patterns of the game; gameplay, which represents the interaction with the game; and game culture, which represents the principles and system of the game.
Game mechanics is central to game design because it can be directly designed and controlled in the game; it includes the rules, methods, and patterns [53,54]. Three important game mechanic factors can affect a player's performance: difficulty, pressure, and competition.
The effect of a video game's difficulty on player performance is grounded in the flow theory proposed by the psychologist Mihály Csíkszentmihályi [55] in 1975. In a flow state, a person is fully focused on, and enjoys, the process of an activity. Flow can be encouraged by setting a moderate difficulty level, which can lead to a high level of concentration during the task [56]. Therefore, difficulty is an essential game mechanic factor. Nakamura [57] wrote, in 2014: "Consciousness is the complex system that has evolved in humans for selecting information from this profusion, processing it, and storing it. Information appears in consciousness through the selective investment of attention". Attention is therefore related to the flow state, which in turn is affected by the difficulty level. Flow theory can be described with the straight-line graph shown in Figure 1, in which the vertical axis represents challenge (from low to high), the horizontal axis represents skill (from low to high), and two parallel lines slope upward. The area between the two lines (Area 2) represents a moderate, balanced level of challenge and skill, i.e., the flow state. The upper-left area (Area 1) represents high challenge and low skill, i.e., anxiety. The lower-right area (Area 3) represents low challenge and high skill, i.e., boredom [58].
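To make the three regions concrete, the following sketch classifies a (challenge, skill) pair into the areas of Figure 1. It assumes both quantities are normalized to 0..1 and that the flow channel is a band of hypothetical half-width `band` around the diagonal; neither the scale nor the band width comes from the paper.

```python
def flow_zone(challenge: float, skill: float, band: float = 0.2) -> str:
    """Classify a (challenge, skill) point, both assumed to lie on a 0..1 scale.

    Assumption: the flow channel is |challenge - skill| <= band, i.e. the region
    between two parallel lines of slope 1 (Area 2 in Figure 1).
    """
    gap = challenge - skill
    if gap > band:
        return "anxiety"   # Area 1: challenge well above skill
    if gap < -band:
        return "boredom"   # Area 3: skill well above challenge
    return "flow"          # Area 2: challenge matched to skill


for point in [(0.9, 0.3), (0.5, 0.55), (0.1, 0.8)]:
    print(point, flow_zone(*point))
```

In the game, raising the difficulty level as a participant's skill grows corresponds to moving the point back into the band instead of letting it drift toward boredom.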
Pressure is a major game mechanic factor that affects a player's performance. The relationship between pressure and performance is described by the Yerkes-Dodson law, proposed by Robert Yerkes and John Dillingham Dodson [59] in 1908. The Yerkes-Dodson bell curve is the most popular way to describe this relationship. As shown in Figure 2, the curve plots performance on the vertical axis (from weak to strong) against pressure on the horizontal axis (from low to high), with a rounded peak that tapers away at each end, like a normal distribution. The peak of the curve (Area 2) represents optimal pressure and optimal performance. The left side of the curve (Area 1) represents low pressure and low performance, where increasing pressure raises attention and interest. The right side of the curve (Area 3) represents high pressure and low performance, where strong pressure impairs performance [60]. Time pressure, or a time limit, is commonly used in game design: it pushes the player to finish a task or reach a goal within a limited amount of time, and it is related to performance [61][62][63]. Therefore, a suitable time pressure in the game may enhance performance and thus increase attention span and executive function.
Competition is another main game mechanic factor that affects a player's performance. Competition indicates the presence of one or more competitors who compete with the player. Previous studies have shown that competition can increase physical effort and motivation [64], so there is a relationship between competition and performance. Performance can increase cognitive function and enhance attention [65].
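The inverted-U (Yerkes-Dodson) relationship can be sketched with a simple quadratic stand-in. The function below is hypothetical: the optimum and width are arbitrary and not fitted to any data. It only illustrates why a moderate time pressure, rather than none or a very strong one, is expected to give the best performance.

```python
def performance(pressure: float, optimum: float = 0.5, width: float = 0.5) -> float:
    """Hypothetical inverted-U: 1.0 at the optimum, 0.0 at or beyond optimum +/- width."""
    x = (pressure - optimum) / width
    return max(0.0, 1.0 - x * x)


# Pressure levels from none (0.0) to very strong (1.0),
# e.g. from an unlimited time limit to a very short one.
levels = [i / 10 for i in range(11)]
best = max(levels, key=performance)
print(f"best pressure level: {best}")
```

Performance is symmetric around the optimum here (for example, 0.25 and 0.75 score the same); real data need not be symmetric, which is one reason the game lets the time limit be tuned per participant.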

Serious Game
A serious game is a game designed for a specific purpose rather than for pure entertainment. The concept was first introduced by Clark Abt [66] in the 1970s. Nowadays, because of advances in video game and smartphone technology, serious games have become more popular, especially in training and education. They can be used for cognitive training, and previous studies have reported various kinds of cognitive function training that employ serious games [67][68][69][70].
Most individuals have a smartphone for daily communication or entertainment [71], and installing serious games turns the smartphone into a training tool as well, allowing players to do cognitive training wherever they choose. Previous studies have shown that smartphone applications can serve as effective cognitive training tools [72,73]. Therefore, we developed a serious game involving mental arithmetic calculations specifically for cognitive training. Most previous research used a variety of tasks or exercises in cognitive training to enhance multiple cognitive domains [74][75][76]. This serious game design is unique: to our knowledge, no previous research has focused solely on computerized serial calculation tasks, in particular for attention span and executive function training.
Based on this literature review and its key points, serious games should be designed to help the elderly maintain or improve their cognitive ability. Our aim was to develop an effective serious game and to identify the game mechanic factors that can affect and enhance attention span and executive function in the elderly.

Materials and Methods
This study was approved by the Ethics Committee of National Taiwan University and conducted in accordance with the good practice procedures of the Declaration of Helsinki.

Cleverbrain Game Design
We developed a game called Cleverbrain, which is a mental arithmetic calculation game. The game design is based on the features of a cognitive assessment, i.e., the Serial Sevens.
Serial Sevens is a widely used screening test for attention span; it has also been used as a cognitive screening tool. During the test, participants need to mentally calculate questions, i.e., the subtraction of 7 from 100. After providing the correct answer, participants need to continue to subtract seven from the previous answer in a serial fashion. Serial threes and serial thirteens are also screening tests for attention span with the same rule but with different numbers. We also applied them in this game.
Cleverbrain is a serious game designed for smartphones, and its appearance is similar to a card game. A set of 10 cards numbered 0 to 9 is located at the bottom of the screen. Players use their fingers to move the cards on the touch screen: they drag the cards for the correct answer into the designated area (dotted-line area) and then release their finger to complete the answer. The internal system then detects that a card has been placed and automatically deals a replacement card to the bottom area.
As in Serial Sevens, players need to mentally calculate and answer questions serially. Each card contains a single digit, and each answer is a two-digit number from 0 to 99. Players can move only one card at a time, so they need to complete a pair of cards for each answer. At the beginning of each game set, the internal system first deals a pair of random cards representing the starting number, and players start the game from this number.
The goal of the game is to complete six pairs of cards in each set, and there are two sets for each round. The first set is a serial subtraction task, and the second set is a serial addition task. Players need to provide the correct answer in the targeted area (dotted-line area) by moving the cards from the bottom area. For the serial subtraction process, the designated area (dotted-line area) is shown on the left side of the cards. For the serial addition process, the designated area (dotted-line area) is shown on the right side of the cards. For each play, if the answer is correct, the next target area (dotted-line area) will pop up for players to place the next answer; if the answer is wrong, the selected pair of cards will be destroyed, and the player needs to redo it. The internal system inside the smartphone continuously checks the answer during the game.
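The answer-checking loop described above can be sketched as follows. This is not the game's actual Construct 3/JavaScript code: the function names are hypothetical, and the rule for choosing a starting number (so that all six answers stay within 0..99) is an assumption, since the text only says the starting pair is random.

```python
import random

PAIRS_PER_SET = 6  # from the text: each set ends after six correct card pairs


def random_start(step: int, subtract: bool, rng: random.Random) -> int:
    """Pick a starting number (hypothetical rule) so all six answers stay in 0..99."""
    span = step * PAIRS_PER_SET
    return rng.randint(span, 99) if subtract else rng.randint(0, 99 - span)


def expected_answers(start: int, step: int, subtract: bool) -> list[int]:
    """The six answers the player must place, in order."""
    sign = -1 if subtract else 1
    return [start + sign * step * i for i in range(1, PAIRS_PER_SET + 1)]


def play_set(start: int, step: int, subtract: bool, placements) -> tuple[int, int]:
    """Replay a set from (tens_card, ones_card) placements; return (correct, wrong)."""
    correct = wrong = 0
    moves = iter(placements)
    for target in expected_answers(start, step, subtract):
        for tens, ones in moves:
            if tens * 10 + ones == target:
                correct += 1
                break          # correct pair: the next target area appears
            wrong += 1         # wrong pair is discarded and the player retries
        else:
            break              # the player stopped before finishing the set
    return correct, wrong


rng = random.Random(7)
start = random_start(7, subtract=True, rng=rng)
print("Minus 7 from", start, "->", expected_answers(start, 7, True))
```

Replaying placements rather than reading live touch events keeps the sketch testable; in the game, each pair would be checked as soon as the second card is released.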
The game user interface and game process are shown in Figure 3. The top left corner shows the player's name and avatar, which can be set in the game menu, and a coin icon with a number showing the current game score. The top center area shows the round number. The top right corner shows the time limit (in seconds) for each set, and a settings button provides other functional controls, such as turning music on/off and pausing the game. The middle area is the gaming area. After the game starts, a pair of random cards is dealt in the middle; the targeted area (dotted-line area) then appears on the left side of the cards for the serial subtraction task and on the right side for the serial addition task. The bottom area shows a set of 10 cards numbered 0 to 9. Players select a card from the bottom area and move it to the target area (dotted-line area) to answer.
Figure 3. Game user interface and game process. The 1st set is the serial subtraction task, and the 2nd set is the serial addition task. (A) Before the game starts, the screen shows the game rules and details in the middle (for example, this set is Minus 7, a serial subtraction task). (B) After the game starts, a pair of random cards is dealt to the middle area by the internal system; this is the starting number. Players select a card from the bottom area and move it to the targeted area (dotted-line area) on the left side. (C) Players keep answering serially (left side) until six pairs of cards have been played. (D) After the 1st set is completed, the 2nd set starts, and the screen shows the game details in the middle (for example, this set is Add 7, a serial addition task). (E) After the game starts, a pair of random cards is dealt to the middle area by the internal system; this is the starting number. Players select a card from the bottom area and move it to the targeted area (dotted-line area) on the right side. (F) Players keep answering serially (right side) until six pairs of cards have been played.
We designed a feedback and reward system based on the player's performance. After each round, the internal system calculates the score, the accuracy percentage, and the total game time, allowing players to monitor and track their own performance. As rewards, if players complete the game with high accuracy, prizes (trophies and stars) appear on the screen after the game; higher accuracy earns more prizes, and lower accuracy earns fewer. Moreover, the history of all rewards (total trophies) and feedback (total scores) is recorded and shown in the top center area of the screen. The appearance and layout of the feedback are shown in Figure 4. The feedback and reward system is designed to motivate and encourage players, creating a sense of joy, success, and satisfaction after a game.
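A minimal sketch of the per-round feedback, assuming accuracy is correct answers divided by all submitted answers. The class name and the star thresholds are hypothetical; the text only states that higher accuracy yields more prizes.

```python
from dataclasses import dataclass


@dataclass
class RoundFeedback:
    correct: int      # correct card pairs in the round (two sets of six)
    attempts: int     # all submitted pairs, correct or not
    seconds: float    # total game time for the round

    @property
    def accuracy(self) -> float:
        """Accuracy percentage shown after the round."""
        return 100.0 * self.correct / self.attempts if self.attempts else 0.0


def stars(accuracy: float, thresholds: tuple[float, ...] = (60.0, 80.0, 95.0)) -> int:
    """Hypothetical prize rule: one star per accuracy threshold reached."""
    return sum(accuracy >= t for t in thresholds)


fb = RoundFeedback(correct=12, attempts=15, seconds=95.0)
print(f"{fb.accuracy:.1f}% in {fb.seconds:.0f} s -> {stars(fb.accuracy)} star(s)")
```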

Game Mechanic Factor Design
In Cleverbrain, there are three independent variables, difficulty, pressure, and competition, which are the game mechanic factors. Players follow the instructions to select the game settings corresponding to their group number in Table 1.

2. Pressure: Previous studies have shown that pressure can enhance a player's performance, and time pressure (a time limit) is an important factor in game design, so a suitable time limit is important. In this game, players can choose a suitable time limit (in seconds) or keep the default setting (unlimited time).

3. Competition: Previous studies have shown that competition can help individuals improve performance. Therefore, enabling competition (multiplayer mode) and setting the number of competitors may enhance the player's performance. In this game, players can choose multiplayer mode (competition) with 1-3 competitors, or keep the default setting (single-player mode), in which there are no competitors.

Experimental Design
Forty elderly volunteers were invited to join an eight-week cognitive training program through an elderly social welfare center, a local community center for older adults in Hong Kong. All volunteers live in the same local area, and they can gather, meet, and interact with one another in the center, which also runs many group and community activities. The center's staff invited elderly residents of the local community to join this program. Participants were assigned to play the game for at least 15 min per day, 5 days per week, for a total of 8 weeks. To enroll in this study, each participant needed an MMSE score of at least 25 and had to be over 60 years old.
Participants needed to do a pretest (before training) and a posttest (after 8 weeks of training). There were four assessments: two for testing the attention span and two for testing executive function.
Before training started, there was a preparation process. We helped all participants install the software on their smartphone or open the website version in their smartphone's browser. We taught them how to play the game, including its rules, methods, and touch controls. We also selected the game settings beforehand for each participant and then let them familiarize themselves with the game. After the preparation process was finished, participants could begin training.

Game Mechanic Factors
The game mechanic factors (difficulty, pressure, and competition) had two possible settings that could be selected (with or without). Forty participants were randomly divided into eight groups. Each group had five participants, and the settings of each group are shown in Table 1.
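The eight groups correspond to the 2 × 2 × 2 combinations of the three factors. The sketch below enumerates those combinations and randomly allocates 40 participants into groups of five; the group numbering it produces is hypothetical, since the actual mapping of group numbers to settings is given in Table 1.

```python
import itertools
import random

FACTORS = ("difficulty", "pressure", "competition")

# All 2 x 2 x 2 = 8 combinations of "with"/"without".
# Hypothetical ordering: the paper's group numbering is defined in Table 1.
CONDITIONS = [dict(zip(FACTORS, combo))
              for combo in itertools.product(("with", "without"), repeat=len(FACTORS))]


def assign_groups(participant_ids, per_group=5, seed=None):
    """Shuffle participants and deal them into equal-sized groups, one per condition."""
    ids = list(participant_ids)
    if len(ids) != per_group * len(CONDITIONS):
        raise ValueError("expected exactly per_group participants per condition")
    random.Random(seed).shuffle(ids)
    return {
        g + 1: {"settings": CONDITIONS[g],
                "participants": ids[g * per_group:(g + 1) * per_group]}
        for g in range(len(CONDITIONS))
    }


groups = assign_groups(range(1, 41), seed=2021)
for number, group in groups.items():
    print(number, group["settings"], group["participants"])
```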

1. Difficulty: There are two difficulty settings: "with" or "without". The setting is selected before training. If the game is "with" a difficulty setting, a moderate difficulty level (neither too easy nor too hard) is set for the player, which should enhance performance. Therefore, before training, the difficulty level was set based on each participant's evaluation of their skill and of how acceptable they found the difficulty level. Because a participant's skill might increase during training, the difficulty level can be adjusted or gradually increased, so what counts as a suitable difficulty level can change over time. The literature shows that players can improve attention span and executive function through adjusted difficulty levels [77][78][79]. If the game is "without" a difficulty setting, a default setting (the easy level) is selected and kept fixed during the training period.

2. Pressure: There are two pressure settings: "with" and "without". The setting is selected before training. If the game is "with" a pressure setting, each game has a time limit. A moderate time limit (neither too short nor too long) was selected for each participant to enhance performance. Therefore, before training, the time limit (in seconds) was selected based on each participant's evaluation of their skill and of how acceptable they found the time pressure. Because a participant's skill might increase, the time limit can be adjusted or gradually decreased during training, so what counts as a suitable time pressure can change over time. The literature shows that players can improve attention span and executive function through time limits [46,80,81]. If the game is "without" pressure, a default setting (unlimited time) is selected: participants can take unlimited time to complete the questions in each game, and this setting is kept fixed during the training period.

3. Competition: There are two competition settings: "with" and "without". The setting is selected before training. If the game is "with" competition, each game has at least one competitor (multiplayer mode), and a suitable number of competitors, from 1 to 3, can be selected. Because a participant's skill might increase during training, the number of competitors can be adjusted or gradually increased over time. The competitor is a computer player (NPC, non-player character): a character controlled by the computer that acts as another player in multiplayer mode and is programmed to mimic a human player. The literature shows that players can improve attention span and executive function by interacting in the game [82,83]. If the game is "without" competition, a default setting (no competitor) is selected; the game is then in single-player mode, and this setting is kept fixed during the training period.
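The text says enabled factors are tuned over time (harder levels, shorter time limits, more competitors) while "without" factors stay at their defaults, but it does not give an update rule. The sketch below is one hypothetical progression rule under stated assumptions: difficulty levels are taken to be the serial 3's, 7's, and 13's, and the accuracy threshold, time step, and minimum time are made-up parameters.

```python
from dataclasses import dataclass, replace
from typing import Optional

STEP_SIZES = (3, 7, 13)  # assumption: difficulty levels map to serial 3's, 7's, 13's
MAX_COMPETITORS = 3      # from the text: 1 to 3 NPC competitors


@dataclass(frozen=True)
class Settings:
    level: int = 0                      # index into STEP_SIZES; 0 = default easy level
    time_limit: Optional[float] = None  # seconds; None = default unlimited time
    competitors: int = 0                # 0 = default single-player mode


def adjust(s: Settings, accuracy: float, difficulty_on: bool, pressure_on: bool,
           competition_on: bool, promote_at: float = 0.9, time_step: float = 5.0,
           min_time: float = 20.0) -> Settings:
    """Hypothetical rule: after high accuracy, tighten only the enabled factors.

    Factors set to "without" are never changed, matching the fixed defaults in the text.
    """
    if accuracy < promote_at:
        return s
    level = min(s.level + 1, len(STEP_SIZES) - 1) if difficulty_on else s.level
    time_limit = s.time_limit
    if pressure_on and time_limit is not None:
        time_limit = max(time_limit - time_step, min_time)
    competitors = min(s.competitors + 1, MAX_COMPETITORS) if competition_on else s.competitors
    return replace(s, level=level, time_limit=time_limit, competitors=competitors)


start = Settings(level=0, time_limit=60.0, competitors=1)  # a "with/with/with" participant
print(adjust(start, accuracy=0.95, difficulty_on=True, pressure_on=True, competition_on=True))
```

In the study, adjustments were made by the researchers in consultation with each participant rather than by an automatic rule, so this should be read as an illustration of the direction of change only.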

Software and Game Engine
We used JavaScript as the programming language and used the Construct 3 game engine for development. JavaScript is a high-level programming language and can run in Construct 3. Construct 3 is a user-friendly game engine that allows users to export the game to a range of platforms, such as the web (HTML5), Android, iOS (via Cordova), and desktop apps (via Windows/macOS wrappers or NW.js). Thus, installation and application are convenient for users. We also provided a website version for users who prefer to play the game through their smartphone web browser.
During the eight weeks of cognitive training, we contacted each participant to check their game status and see whether they needed support.

Cognitive Screening Assessment Test
A cognitive screening assessment test (MMSE) [20] was used to screen the participants before training. Only participants who scored 25 or higher could join the experiment.
The MMSE (Mini-Mental State Examination) is a widely used assessment for global cognitive screening. It is a 30-point questionnaire that measures cognitive impairment. A score of 25 or higher is considered normal, whereas a score of 24 or lower is abnormal and indicates possible cognitive impairment.

Attention Assessment Test
Two attention assessment tests (D-CAT and SAT) were used in this experiment to screen attention span. The D-CAT [84] (Digit Cancellation Test) is an attention assessment consisting of three 10 × 13 lists of digits, each with 130 digits, for a total of 390 digits drawn from 0 to 9. In the first list, subjects need to find one preset digit (5); in the second list, two preset digits (2 and 6); and in the third list, three preset digits (1, 4, and 9). Subjects need to examine every digit in each list and decide whether it is a target. The primary measure of this test is the number of correct responses, so the maximum score is 390.
The SAT [85,86] (sustained attention task) is another attention assessment. It consists of 29 digits from 1 to 9, including 11 instances of the digit 1. Target detection is done by tapping: when subjects hear the digit 1, they tap on the table; for any other digit, they should not respond. The primary measure of this test is the number of correct taps, so the maximum score is 11.
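A maximum of 390 suggests that every digit is scored, i.e., a response counts as correct when a target digit is crossed out or a non-target digit is left alone. The sketch below scores one list under that assumption; the function and argument names are illustrative, and the grid is synthetic rather than the real test sheet.

```python
def dcat_score(grid, targets, marked):
    """Count digits classified correctly: targets crossed out plus non-targets left alone.

    grid: list of rows of digits; targets: set of target digits;
    marked: set of (row, col) positions the subject crossed out.
    Assumption: hit + correct-rejection scoring, inferred from the 390-point maximum.
    """
    score = 0
    for r, row in enumerate(grid):
        for c, digit in enumerate(row):
            is_target = digit in targets
            was_marked = (r, c) in marked
            score += (is_target == was_marked)
    return score


# Synthetic 10 x 13 list (not the real test sheet), first-list target {5}
grid = [[(r * 13 + c) % 10 for c in range(13)] for r in range(10)]
perfect = {(r, c) for r, row in enumerate(grid) for c, d in enumerate(row) if d == 5}
print(dcat_score(grid, {5}, perfect))
```

A perfect response scores 130 per list, so three lists give the stated maximum of 390.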

Executive Function Assessment Test
Two executive function assessments (TMT-A and TMT-B) were used in this experiment. They are well-known tests for screening executive function.
TMT-A [87][88][89] (Trail Making Test A) is an executive function test. Twenty-five numbers are placed irregularly on a sheet of paper. Subjects connect the numbers from 1 to 25 in sequence, and the examiner measures the time taken to connect all 25 numbers, starting from the number 1. The primary measure of this test is completion time, and there is no time limit.
TMT-B [89,90] (Trail Making Test B) is another executive function test. The original English version requires connecting numbers and letters alternately in sequential order (1, A, 2, B, 3, C, 4, D, . . .), for a total of 12 numbers and 12 English letters placed irregularly on a sheet of paper. However, all participants in this study are native Chinese speakers. Previous studies have shown that a person's language abilities can affect the result of this test, so the Chinese version of Trail Making Test B is more suitable for native Chinese speakers and produces more accurate results. We therefore used the Chinese version, in which Chinese characters representing sequential numbers replace the English letters; subjects connect numbers and Chinese characters alternately in sequential order, for a total of 12 numbers and 12 Chinese characters. The primary measure of this test is completion time, and there is no time limit.
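The alternating target order of TMT-B can be generated as below. The Chinese version's specific characters are not listed in the text, so the sketch takes the symbol list as a parameter and demonstrates only the English letters; `first_error` is a hypothetical helper for locating where a subject's path departs from the correct order.

```python
def tmt_b_order(numbers, symbols):
    """Interleave numbers and symbols into the TMT-B target order: 1, A, 2, B, ..."""
    numbers, symbols = list(numbers), list(symbols)
    if len(numbers) != len(symbols):
        raise ValueError("TMT-B alternates two sequences of equal length")
    order = []
    for n, s in zip(numbers, symbols):
        order.extend([str(n), s])
    return order


def first_error(expected, produced):
    """Index of the first wrong connection, or None if `produced` follows the order."""
    for i, (e, p) in enumerate(zip(expected, produced)):
        if e != p:
            return i
    return None


english = tmt_b_order(range(1, 13), [chr(ord("A") + i) for i in range(12)])
print(english[:8])  # ['1', 'A', '2', 'B', '3', 'C', '4', 'D']
```

Passing a list of 12 Chinese characters as `symbols` would produce the Chinese-version order in the same way.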

Statistical Methods
First, a paired-sample t-test was used to compare the pre-test and post-test scores of the four assessments. This analysis shows whether there are significant differences between pre-test and post-test scores, and hence how effective the game design was over the 8-week training period for elderly participants.
Second, a mixed repeated-measures ANOVA was used to analyze the pre-test and post-test scores with three independent variables (difficulty, pressure, and competition), each with two possible settings (with and without). This analysis shows whether these three independent variables have a significant effect or an interaction effect.
Third, graphs were plotted to show the relationships and interaction effects among the variables, visualizing how the interactions affect the participants' performance.
Additionally, we conducted a multivariate analysis of covariance (MANCOVA) on the post-test scores of the four assessment tests (attention span and executive function). The post-test scores on all measures were the dependent variables, the game mechanic factors (difficulty, pressure, and competition) were the independent variables, and the pre-test scores were the covariates, so that the effect of the independent variables on the dependent variables could be assessed while controlling for baseline differences.
Finally, we used test-retest reliability to measure the consistency between pre-test and post-test scores on the four assessment tests (D-CAT, SAT, TMT-A, and TMT-B) in the different groups. There were eight groups of participants with different game settings in the serious game (Table 1).
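For reference, the paired-sample t statistic is the mean of the within-subject differences divided by their standard error, with n − 1 degrees of freedom. The sketch below computes it with the standard library only; the scores in the example are made up for illustration and are not study data. `scipy.stats.ttest_rel(post, pre)` returns the same statistic together with a p-value.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired-sample t statistic and degrees of freedom for post - pre differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1


pre = [1, 2, 3, 4]    # illustrative scores only
post = [2, 4, 5, 7]
t, df = paired_t(pre, post)
print(f"t({df}) = {t:.3f}")  # t(3) = 4.899
```

For TMT-A and TMT-B, where lower completion times are better, an improvement appears as a negative t under this post - pre convention.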
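Non-parallel lines in a pre/post interaction plot correspond to a non-zero difference in pre-to-post gains between factor levels. The sketch below computes these descriptive contrasts from (pre, post) score pairs; the helper names are hypothetical, and a non-zero sample contrast on its own is not evidence of significance, which is what the ANOVA tests.

```python
def gain(cell):
    """Mean pre-to-post change for one group, given (pre, post) score pairs."""
    return sum(post - pre for pre, post in cell) / len(cell)


def two_way_contrast(with_level, without_level):
    """Gain under "with" minus gain under "without"; 0 means parallel lines."""
    return gain(with_level) - gain(without_level)


def three_way_contrast(with_pressure, without_pressure):
    """Each argument is a (with_difficulty, without_difficulty) pair of cells.

    A non-zero value means the difficulty interaction differs between pressure levels.
    """
    return two_way_contrast(*with_pressure) - two_way_contrast(*without_pressure)


steep = [(10, 14), (11, 15)]  # made-up cell: mean gain 4
flat = [(10, 11), (12, 13)]   # made-up cell: mean gain 1
print(two_way_contrast(steep, flat), three_way_contrast((steep, flat), (flat, flat)))
```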

Results
There were 40 participants enrolled in this study, including 24 females and 16 males. The mean age was 65.18 years (SD = 2.07), the mean length of education was 7.78 years (SD = 1.85), and the mean MMSE score was 26.83 (SD = 1.06) (Table 2).

D-CAT Assessment Test Results
The pre-test mean was 373.63 (SD = 2.35; SE = 0.37), and the post-test mean was 382.30 (SD = 5.06; SE = 0.80) (Table 3; Figure 5). The paired-sample correlation was significant (p < 0.001) (Table 4). The paired-sample t-test showed a significant difference between pre-test and post-test scores (p < 0.001). Table 7 shows the mean scores and other details for the three independent variables (difficulty, pressure, and pre/post-test), each with two levels, in the D-CAT. Figure 6A shows two separate lines that are not parallel because their slopes differ; the blue line is steeper than the green line, indicating an exponential interaction of pre/post-test * difficulty in the D-CAT. Figure 6B likewise shows two separate, non-parallel lines, with the blue line steeper than the green line, indicating an exponential interaction of pre/post-test * pressure in the D-CAT. For the interaction of pre/post-test * difficulty under the with-pressure condition (Figure 7A), the two lines intersect and are not parallel; the blue line is steeper than the green line, indicating an exponential interaction of pre/post-test * difficulty in the D-CAT. For the interaction of pre/post-test * difficulty under the without-pressure condition (Figure 7B), the two lines are separate and not parallel; the blue line is steeper than the green line, again indicating an exponential interaction of pre/post-test * difficulty in the D-CAT.
However, the forms of these two interaction graphs were different from each other (with and without pressure), indicating that the interaction of pre/post-test * difficulty had different effects on the dependent variable at different levels of the pressure factor, which showed that there was a three-factor interaction effect.
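The reading of the interaction plots can be made concrete with contrasts on cell means. In a 2 × 2 plot of pre/post-test by difficulty, non-parallel lines mean that the pre-to-post change differs between the two difficulty levels; the three-factor interaction asks whether that difference itself changes between the two pressure levels. The Python sketch below computes both contrasts and labels the two-way pattern with the text's own terms ("exponential" when both changes go the same way with different steepness, "antagonistic" when they go in opposite directions). The cell means are made-up illustrative numbers, not values from Tables 7, 12, 17, or 22, and a nonzero sample contrast is only descriptive: significance comes from the ANOVA F-tests.

```python
from typing import Tuple

# (pre, post) means with difficulty, then (pre, post) means without difficulty
Cells = Tuple[float, float, float, float]

def two_way_contrast(cells: Cells) -> float:
    """Difference between the pre->post change with and without difficulty."""
    pre_on, post_on, pre_off, post_off = cells
    return (post_on - pre_on) - (post_off - pre_off)

def describe(cells: Cells, tol: float = 1e-9) -> str:
    """Label the plot pattern using the paper's descriptive terms."""
    pre_on, post_on, pre_off, post_off = cells
    change_on, change_off = post_on - pre_on, post_off - pre_off
    if abs(change_on - change_off) <= tol:
        return "parallel lines (no interaction)"
    if change_on * change_off < 0:
        return "antagonistic (changes in opposite directions)"
    return "exponential (same direction or one flat, different steepness)"

def three_way_contrast(with_pressure: Cells, without_pressure: Cells) -> float:
    """How much the difficulty contrast differs between the two pressure levels."""
    return two_way_contrast(with_pressure) - two_way_contrast(without_pressure)

# Illustrative, made-up cell means (NOT the study's data)
with_pressure = (370.0, 385.0, 372.0, 378.0)     # changes: +15 vs +6
without_pressure = (374.0, 380.0, 373.0, 375.0)  # changes: +6 vs +2

print(describe(with_pressure), two_way_contrast(with_pressure))
print(describe(without_pressure), two_way_contrast(without_pressure))
print("three-way contrast:", three_way_contrast(with_pressure, without_pressure))
```

A three-way contrast of zero would mean that the difficulty pattern is the same with and without pressure; the text's conclusion corresponds to that contrast differing from zero.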
In conclusion, the D-CAT results showed a significant improvement in attention span. The difficulty factor and pressure factor had a significant effect and an interaction effect.
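The standard errors reported for each assessment follow from the standard deviations by SE = SD/√n with n = 40 participants. The short Python check below, a sketch that assumes every participant contributed both a pre-test and a post-test score, recomputes each reported SE from its SD for all four tests.

```python
import math

N_PARTICIPANTS = 40  # assumes all 40 participants completed both tests

# (reported SD, reported SE) copied from the text for each test and phase
REPORTED = {
    "D-CAT pre": (2.35, 0.37), "D-CAT post": (5.06, 0.80),
    "SAT pre": (0.78, 0.12), "SAT post": (0.99, 0.16),
    "TMT-A pre": (2.25, 0.36), "TMT-A post": (3.77, 0.60),
    "TMT-B pre": (1.65, 0.26), "TMT-B post": (2.55, 0.40),
}

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

for label, (sd, reported_se) in REPORTED.items():
    computed = standard_error(sd, N_PARTICIPANTS)
    print(f"{label:10s} computed SE = {computed:.2f}  reported SE = {reported_se:.2f}")
```

Each computed value rounds to the reported one, which is consistent with the full sample of 40 being used in every descriptive table.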

SAT Assessment Test Results
For the SAT, the pre-test mean was 8.53 (standard deviation 0.78, standard error 0.12), and the post-test mean was 9.68 (standard deviation 0.99, standard error 0.16) (Table 8) (Figure 5). The paired sample correlation between pre-test and post-test scores was significant (p < 0.05) (Table 9). The paired sample t-test showed a significant difference between pre-test and post-test scores (p < 0.001). Table 12 shows the mean scores and other details for the three independent variables (difficulty, pressure, and pre/post-test), each with two levels, in the SAT. Figure 8A shows two separate lines that are not parallel because their slopes differ: the blue line is steeper than the green line, indicating an exponential interaction of pre/post-test * difficulty in the SAT. Figure 8B shows two intersecting lines that are not parallel, with the blue line steeper than the green line, indicating an exponential interaction of pre/post-test * pressure. For the interaction of pre/post-test * difficulty under the pressure condition (Figure 9A), the two lines are separate and not parallel, with the blue line steeper than the green line, indicating an exponential interaction. Without the pressure condition (Figure 9B), the two lines intersect and are not parallel because they run in different directions, indicating an antagonistic interaction of pre/post-test * difficulty in the SAT.
However, the forms of these two interaction graphs were different from each other (with and without pressure), indicating that the interaction of pre/post-test * difficulty had different effects on the dependent variable at different levels of the pressure factor, which showed that there was a three-factor interaction effect.
In conclusion, SAT results showed a significant improvement in attention span. The difficulty factor and pressure factor had a significant effect and an interaction effect.
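The paired sample t-test used for each assessment works on the per-participant differences between post-test and pre-test scores. A minimal standard-library Python sketch of the statistic is shown below; it assumes complete pairs and returns only t and its degrees of freedom. SciPy's `scipy.stats.ttest_rel(post, pre)` returns the same t together with a p-value. The sample numbers are invented for illustration and are not study data.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

def paired_t(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, int]:
    """Paired-sample t statistic and degrees of freedom for post - pre."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    se_diff = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if se_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / se_diff, n - 1

# Invented scores for three participants (not study data)
t, df = paired_t([10.0, 10.0, 10.0], [11.0, 12.0, 13.0])
print(f"t({df}) = {t:.3f}")
```

With n = 40 participants, as in this study, each test has 39 degrees of freedom.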

TMT-A Assessment Test Results
For the TMT-A, the pre-test mean was 61.15 (standard deviation 2.25, standard error 0.36), and the post-test mean was 56.53 (standard deviation 3.77, standard error 0.60) (Table 13) (Figure 10); because TMT scores are completion times, the lower post-test mean indicates better performance. The paired sample correlation between pre-test and post-test scores was significant (p < 0.05) (Table 14). The paired sample t-test showed a significant difference between pre-test and post-test scores (p < 0.001). Table 17 shows the mean scores and other details for the three independent variables (difficulty, pressure, and pre/post-test), each with two levels, in the TMT-A. Figure 11A shows two intersecting lines that are not parallel because their slopes differ: the blue line is steeper than the green line, indicating an exponential interaction of pre/post-test * difficulty in the TMT-A. Figure 11B shows two separate, non-parallel lines, with the blue line steeper than the green line, indicating an exponential interaction of pre/post-test * pressure. For the interaction of pre/post-test * difficulty under the pressure condition (Figure 12A), the two lines intersect and are not parallel, with the blue line steeper than the green line, indicating an exponential interaction. Without the pressure condition (Figure 12B), the two lines also intersect and are not parallel, with the blue line steeper than the green line, again indicating an exponential interaction of pre/post-test * difficulty in the TMT-A.
However, the forms of these two interaction graphs were different from each other (with and without pressure), indicating that the interaction of pre/post-test * difficulty had different effects on the dependent variable at different levels of the pressure factor, which showed that there was a three-factor interaction effect.
In conclusion, the TMT-A results showed a significant enhancement of executive function. The difficulty factor and pressure factor had a significant effect and an interaction effect.

TMT-B Assessment Test Results
For the TMT-B, the pre-test mean was 69.43 (standard deviation 1.65, standard error 0.26), and the post-test mean was 66.25 (standard deviation 2.55, standard error 0.40) (Table 18) (Figure 10). The paired sample correlation between pre-test and post-test scores was significant (p < 0.05) (Table 19). The paired sample t-test showed a significant difference between pre-test and post-test scores (p < 0.001). Table 22 shows the mean scores and other details for the three independent variables (difficulty, pressure, and pre/post-test), each with two levels, in the TMT-B. Figure 13A shows two intersecting lines that are not parallel because their slopes differ: the blue line is steeper than the green line, indicating an exponential interaction of pre/post-test * difficulty in the TMT-B. Figure 13B also shows two intersecting, non-parallel lines, with the blue line steeper than the green line, indicating an exponential interaction of pre/post-test * pressure. For the interaction of pre/post-test * difficulty under the pressure condition (Figure 14A), the two lines intersect and are not parallel, with the blue line steeper than the green line, indicating an exponential interaction. Without the pressure condition (Figure 14B), the two lines are separate and not parallel because they run in different directions, indicating an antagonistic interaction of pre/post-test * difficulty in the TMT-B.
However, the forms of these two interaction graphs were different from each other (with and without pressure), indicating that the interaction of pre/post-test * difficulty had different effects on the dependent variable at different levels of the pressure factor, which showed that there was a three-factor interaction effect.
In conclusion, the TMT-B results showed a significant enhancement of executive function. The difficulty factor and pressure factor had a significant effect and an interaction effect.

MANCOVA Results
The MANCOVA results showed significant multivariate effects on attention span and executive function.
A 2 × 2 × 2 between-subjects multivariate analysis of covariance (MANCOVA) was performed on four dependent variables: the post-test scores of the D-CAT and SAT (attention span) and of the TMT-A and TMT-B (executive function), controlling for pre-test scores. The independent variables were difficulty (with and without), pressure (with and without), and competition (with and without).
The assumptions of normality and homogeneity of variance-covariance matrices were evaluated. Box's M was 159.77 and was not significant (F(70, 1401.94) = 1.25, p = 0.084), so homogeneity of covariance matrices across groups could be assumed. Linearity and multicollinearity were satisfactory.
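The 2 × 2 × 2 between-subjects design yields eight cells, one per combination of difficulty, pressure, and competition, which matches the eight groups reported in the test-retest analysis. With g = 8 groups and p = 4 dependent variables, Box's M compares eight 4 × 4 covariance matrices, and the numerator degrees of freedom of its F approximation are (g − 1)p(p + 1)/2 = 70, matching the reported F(70, 1401.94). The Python sketch below enumerates the cells and computes that df; the order in which it lists cells is arbitrary and is not assumed to match the group numbering in Table 26.

```python
from itertools import product

FACTORS = ("difficulty", "pressure", "competition")

# Every on/off combination of the three game mechanic factors
cells = [dict(zip(FACTORS, levels))
         for levels in product((True, False), repeat=len(FACTORS))]

def box_m_df1(groups: int, n_dvs: int) -> int:
    """Numerator df of Box's M F approximation: (g - 1) * p * (p + 1) / 2."""
    return (groups - 1) * n_dvs * (n_dvs + 1) // 2

for i, cell in enumerate(cells, start=1):
    print(i, cell)
print("Box's M df1:", box_m_df1(len(cells), 4))
```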
For the multivariate tests, after controlling for pre-test scores and using Wilks' Lambda criterion, the combined dependent variables differed significantly across levels of difficulty (Wilks' Λ = 0.06, F(4, 25) = 106.95, p < 0.001, partial η2 = 0.95) and pressure (Wilks' Λ = 0.07, F(4, 25) = 85.29, p < 0.001, partial η2 = 0.93), but not across levels of competition (Wilks' Λ = 0.81, F(4, 25) = 1.44, p > 0.05, partial η2 = 0.19). There was also a significant interaction effect of difficulty and pressure (Wilks' Λ = 0.22, F(4, 25) = 22.77, p < 0.001, partial η2 = 0.79) (Table 23). Levene's test of equality of error variances was not significant (p > 0.05) for any of the four post-test scores, indicating that the assumption of homogeneity of variance was met (Table 24). The tests of between-subjects effects showed that difficulty, pressure, and the difficulty * pressure interaction were statistically significant (p < 0.05) for all four dependent variables (D-CAT, SAT, TMT-A, and TMT-B post-test scores). In addition, the three-way interaction of difficulty, pressure, and competition was statistically significant (p < 0.05) for one dependent variable (SAT) but not for the other assessment tests (D-CAT, TMT-A, and TMT-B) (Table 25).
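Each factor has two levels, so every effect in the multivariate tests has one degree of freedom (s = 1). In that case Wilks' Λ converts exactly to F as F = ((1 − Λ)/Λ)(df2/df1), with df1 = p = 4 dependent variables and df2 = df_error − p + 1, and the multivariate partial η² equals 1 − Λ. The Python sketch below applies these identities; it assumes that the four pre-test scores entered as covariates, which gives df_error = 40 − 8 − 4 = 28 and reproduces the reported denominator df of 25.

```python
def error_df(n_subjects: int, n_cells: int, n_covariates: int) -> int:
    """Within-cells error df for a between-subjects MANCOVA."""
    return n_subjects - n_cells - n_covariates

def wilks_df2(df_error: int, n_dvs: int) -> int:
    """Denominator df of the exact F for Wilks' lambda when s = 1."""
    return df_error - n_dvs + 1

def f_from_wilks(lam: float, df1: int, df2: int) -> float:
    """Exact F from Wilks' lambda for a 1-df effect."""
    return (1 - lam) / lam * df2 / df1

def wilks_from_f(f: float, df1: int, df2: int) -> float:
    """Inverse of f_from_wilks."""
    return 1 / (1 + f * df1 / df2)

def partial_eta_sq(f: float, df1: int, df2: int) -> float:
    """Multivariate partial eta squared; equals 1 - lambda when s = 1."""
    return f * df1 / (f * df1 + df2)

P = 4                                   # D-CAT, SAT, TMT-A, TMT-B post-test scores
DF2 = wilks_df2(error_df(40, 8, 4), P)  # assumes four pre-test covariates -> 25

for effect, f in [("difficulty", 106.95), ("pressure", 85.29),
                  ("competition", 1.44), ("difficulty * pressure", 22.77)]:
    lam = wilks_from_f(f, P, DF2)
    print(f"{effect:22s} F(4, {DF2}) = {f:6.2f}  Wilks' lambda = {lam:.2f}  "
          f"partial eta^2 = {partial_eta_sq(f, P, DF2):.2f}")
```

The Λ values recovered from the reported F statistics round to those in the text (0.06, 0.07, 0.81, and 0.22), and the recomputed partial η² values agree with the reported ones to within about 0.01.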

Test-Retest Reliability Results
The results of test-retest reliability (intraclass correlation coefficient) for the different groups are shown in Table 26. We used test-retest reliability to measure the consistency between the pre-test and post-test scores of the four assessment tests (D-CAT, SAT, TMT-A, and TMT-B) within each group. There were eight groups of participants with different game settings in the serious game (Table 26). The D-CAT results showed a high correlation coefficient in groups 1-6 and 8 (good reliability, ≥0.8 and <1.0) and a medium correlation coefficient in group 7 (acceptable reliability, ≥0.6 and <0.8). The SAT results showed a high correlation coefficient in groups 1-3, 5, 7, and 8 (good reliability, ≥0.8 and <1.0), a medium correlation coefficient in group 6 (acceptable reliability, ≥0.6 and <0.8), and a low correlation coefficient in group 4 (poor reliability, ≥0.4 and <0.6). The TMT-A results showed a high correlation coefficient in groups 2 and 5-7 (good reliability, ≥0.8 and <1.0), a medium correlation coefficient in groups 1, 3, and 4 (acceptable reliability, ≥0.6 and <0.8), and a low correlation coefficient in group 8 (poor reliability, ≥0.4 and <0.6). The TMT-B results showed a high correlation coefficient in groups 3, 5, 6, and 8 (good reliability, ≥0.8 and <1.0) and a medium correlation coefficient in groups 1, 2, 4, and 7 (acceptable reliability, ≥0.6 and <0.8).
In conclusion, there were 32 results in total (four assessment tests × eight groups): 21 displayed good reliability (≥0.8 and <1.0), nine displayed acceptable reliability (≥0.6 and <0.8), and two displayed poor reliability (≥0.4 and <0.6).
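The summary counts can be cross-checked against the group lists for each test. The Python sketch below encodes the reported group-to-band assignments, classifies ICC values with the same cut-offs (good ≥0.8 and <1.0, acceptable ≥0.6 and <0.8, poor ≥0.4 and <0.6), and tallies the 32 results. It reads the D-CAT good-reliability groups as 1-6 and 8, since the study had only eight groups.

```python
from collections import Counter

def reliability_band(icc: float) -> str:
    """Map an ICC to the reliability labels used in the text."""
    if 0.8 <= icc < 1.0:
        return "good"
    if 0.6 <= icc < 0.8:
        return "acceptable"
    if 0.4 <= icc < 0.6:
        return "poor"
    return "outside the reported bands"

# Groups per band for each assessment, as listed in the text
REPORTED = {
    "D-CAT": {"good": [1, 2, 3, 4, 5, 6, 8], "acceptable": [7], "poor": []},
    "SAT":   {"good": [1, 2, 3, 5, 7, 8], "acceptable": [6], "poor": [4]},
    "TMT-A": {"good": [2, 5, 6, 7], "acceptable": [1, 3, 4], "poor": [8]},
    "TMT-B": {"good": [3, 5, 6, 8], "acceptable": [1, 2, 4, 7], "poor": []},
}

totals = Counter()
for test, bands in REPORTED.items():
    # every test should assign each of the eight groups to exactly one band
    groups = sorted(g for members in bands.values() for g in members)
    assert groups == list(range(1, 9)), f"{test} does not cover groups 1-8 exactly once"
    for band, members in bands.items():
        totals[band] += len(members)

print(dict(totals), "total:", sum(totals.values()))
```

The tally gives 21 good, 9 acceptable, and 2 poor results out of 32, matching the summary.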

Discussion
Our objective of the study was to analyze the effectiveness of the game and the efficacy of game mechanic factors affecting attention span and executive function in the elderly.
Regarding attention span, we used a paired sample t-test to compare the pre-test and post-test results of the game training on the two attention assessments (D-CAT and SAT). Post-test scores differed significantly from pre-test scores on both the D-CAT (p < 0.001) and the SAT (p < 0.001), which indicates that the game is effective. With current mobile technology, individuals can train their attention span with the serious mental calculation game used in this study. This also suggests that the game design successfully turns traditional mental calculation into a smartphone game with attention-span-improvement outcomes. Moreover, because many individuals own smartphones and use them daily, cognitive training on mobile devices is convenient for them.
Regarding executive function, other studies have shown that traditional arithmetic calculation does not enhance executive function [91,92]. In this study, however, participants who trained with the serious smartphone mental calculation game improved both their attention span and their executive function. On the two executive function assessments (TMT-A and TMT-B), the paired sample t-test comparing pre-test and post-test scores showed significant differences for both the TMT-A (p < 0.001) and the TMT-B (p < 0.001). This might be because traditional arithmetic calculation training is straightforward: subjects either answer an examiner verbally or write the answer on paper. The game, however, requires both mental effort and movement of the hands and fingers. Players need to mentally calculate the answer, choose cards, select them with their fingers, and then execute commands by moving the correct answer into the target position. Finally, players remove their finger from the screen to complete the answering process. This combination of mental and physical activity on a smartphone is more complex than traditional arithmetic calculation, which could be the reason that the game improves both attention span and executive function.
Difficulty is an important game mechanic factor. The mixed repeated-measure ANOVA showed that difficulty had a significant effect on the D-CAT (p < 0.001), SAT (p < 0.001), TMT-A (p < 0.001), and TMT-B (p < 0.001). According to the theory of flow, a moderate difficulty level can increase performance; a suitable level of difficulty means that a game is moderately difficult, which is consistent with the effective result in this experiment. The difficulty level might provide a challenge that individuals want to overcome in order to feel successful. This makes the game more attractive and keeps players focused on it, thus improving attention span and executive function. On the other hand, if the game is too easy, players might feel insufficiently challenged and become bored, which would cause them to focus less on the game and thus not improve their cognitive function. Conversely, if the game is too difficult, players might feel that the goal is unreachable, that the chance of winning is low, and that the chance of disappointment is high, which would decrease concentration and again lead to no improvement in cognitive function [55,93,94]. Moreover, players can adjust or gradually increase the difficulty level during the game. Because their skill might have increased during the training period, players who found the original level too easy could raise it step by step. Such adjustments might be what effectively led to improvements in attention span and executive function.
Pressure is another important game mechanic factor. The mixed repeated-measure ANOVA showed that pressure had a significant effect on the D-CAT (p < 0.001), SAT (p < 0.001), TMT-A (p < 0.05), and TMT-B (p < 0.001). The Yerkes-Dodson law states that an appropriate level of pressure can increase performance, which may in turn improve attention span and executive function. When time is limited in the game, players must complete the goal of each round within the time limit, and a suitable amount of time pressure can make them focus more on the game. The Yerkes-Dodson bell curve describes the relationship between pressure and performance as an inverted U: the best performance occurs at the peak of the curve, which corresponds to a moderate amount of pressure [60,95-97]. The time limit in this game might therefore have provided a suitable level of time pressure. The time limit (in seconds) can be adjusted or gradually decreased during the game. Because participants' skill might have increased during the training period, those who felt that the original time limit was too long could shorten it, so that the time limit fit each participant during training. These adjustments might be what effectively led to improvements in attention span and executive function.
Competitive interaction is another important game mechanic factor. The mixed repeated-measure ANOVA showed that competition had no significant effect on the D-CAT (p > 0.05), SAT (p > 0.05), TMT-A (p > 0.05), or TMT-B (p > 0.05). Previous studies have shown that competition can increase performance in young people and adults, including improvements in attention, physical effort, and motivation [65,98,99]. However, not all digital games are suitable for the elderly; a specially designed competitive game is needed that provides a positive experience and suits older players [100]. In this experiment, competition had a non-significant effect, and there are two possible reasons for this. Firstly, we used computer players (NPCs, non-player characters) as competitors. Because the competitors were controlled by the computer rather than by humans, the interaction differed from human-to-human interaction: players might feel and act differently when they know that their competitors are computer players, which might reduce the effectiveness of the competition factor [101-103]. Secondly, players could not communicate with teammates or other competitors during the game [104,105]. In real competitions, teammates communicate, teamwork and inspiration are important, and players can chat with other competitors during a game to exchange information. These two reasons might explain why the competition factor had a non-significant effect in game training.
There was an interaction effect between the difficulty factor and the pressure factor. The mixed repeated-measure ANOVA showed that the difficulty * pressure interaction was significant for the D-CAT (p < 0.001), SAT (p < 0.01), TMT-A (p < 0.01), and TMT-B (p < 0.05). This interaction indicates that the effect of one factor depended on the level of the other, and that both factors were essential to the game's training effect. One reason might be that both factors can improve attention span and executive function. Moreover, according to the theory of flow and the Yerkes-Dodson law, both factors share a similar feature: each needs a suitable amount of input to obtain the best performance, and a moderate amount leads to a more effective result. The similarity of these two game mechanic factors might therefore explain the interaction effect.
The MANCOVA results were similar to those of the mixed repeated-measure ANOVA. In the multivariate tests, after controlling for pre-test scores and using Wilks' Lambda criterion, the combined dependent variables differed significantly across the levels of difficulty and pressure, and the interaction effect of difficulty and pressure was also significant. The tests of between-subjects effects showed that the main effects of difficulty and pressure, as well as the difficulty * pressure interaction, were significant in all four assessment tests (D-CAT, SAT, TMT-A, and TMT-B). Additionally, the three-way interaction of difficulty, pressure, and competition was significant in the SAT but not significant in the other assessment tests (D-CAT, TMT-A, and TMT-B).
Test-retest reliability measured the consistency of the tests over time in this study: we compared the results of the same test administered to the same group of people at the pre-test and the post-test. The results showed the consistency of pre-test and post-test scores for the four assessment tests (D-CAT, SAT, TMT-A, and TMT-B) in eight different groups. Most of the results showed good reliability (≥0.8 and <1.0), some showed acceptable reliability (≥0.6 and <0.8), and two showed poor reliability (≥0.4 and <0.6).
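The text does not state which form of the intraclass correlation coefficient was used. As an illustration only, the Python sketch below computes two common single-measure forms from a two-way ANOVA on a participants × sessions (pre-test, post-test) table: ICC(3,1), which measures consistency and ignores a uniform shift between sessions, and ICC(2,1), which measures absolute agreement and is lowered by such a shift. Which of these, if either, matches Table 26 is an assumption, and the scores in the example are invented.

```python
from statistics import mean
from typing import Sequence, Tuple

def _mean_squares(pre: Sequence[float],
                  post: Sequence[float]) -> Tuple[int, int, float, float, float]:
    """Two-way ANOVA mean squares for an n-participants x 2-sessions table."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired scores for at least two participants")
    rows = list(zip(pre, post))
    n, k = len(rows), 2
    grand = mean([x for row in rows for x in row])
    row_means = [mean(row) for row in rows]
    col_means = [mean(pre), mean(post)]
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_error = ss_total - ss_rows - ss_cols                   # residual
    return n, k, ss_rows / (n - 1), ss_cols / (k - 1), ss_error / ((n - 1) * (k - 1))

def icc_3_1(pre: Sequence[float], post: Sequence[float]) -> float:
    """Two-way mixed, consistency, single measure."""
    _, k, ms_rows, _, ms_error = _mean_squares(pre, post)
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

def icc_2_1(pre: Sequence[float], post: Sequence[float]) -> float:
    """Two-way random, absolute agreement, single measure."""
    n, k, ms_rows, ms_cols, ms_error = _mean_squares(pre, post)
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

# Invented scores: every participant improves by exactly 5 points
pre = [10.0, 12.0, 14.0, 16.0]
post = [x + 5 for x in pre]
print(f"ICC(3,1) = {icc_3_1(pre, post):.3f}")  # 1.000
print(f"ICC(2,1) = {icc_2_1(pre, post):.3f}")  # 0.348
```

The example shows why the choice matters after a training intervention: a uniform improvement leaves the consistency form at 1.0 but pulls the absolute-agreement form down.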

Conclusions
Attention ability and executive function gradually decline as individuals age, so it is essential that the elderly do more cognitive exercise. This study concerns a serious game designed for attention span and executive function training in the elderly. We developed a smartphone mental arithmetic calculation game based on the Serial Sevens attention assessment that combines serial subtraction tasks and serial addition tasks. The results show that participants improved their attention span and executive function after training. The difficulty and pressure game mechanic factors showed a significant effect and an interaction effect.
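The game's core task builds on Serial Sevens, in which a person repeatedly subtracts 7 starting from 100, extended here to serial subtraction and serial addition. The Python sketch below is a hypothetical illustration, not the game's actual code: `serial_sequence` generates the expected answers for a given start value and step, and `is_correct` shows one way the optional time limit of the pressure condition could be applied. The function names, parameters, and scoring rule are all assumptions.

```python
from typing import List, Optional

def serial_sequence(start: int, step: int, count: int) -> List[int]:
    """Expected answers of a serial task; step=-7 from 100 is Serial Sevens."""
    return [start + step * i for i in range(1, count + 1)]

def is_correct(answer: int, expected: int, elapsed_s: float,
               time_limit_s: Optional[float] = None) -> bool:
    """Hypothetical scoring: a late answer counts as a miss when a time limit is set."""
    if time_limit_s is not None and elapsed_s > time_limit_s:
        return False
    return answer == expected

print(serial_sequence(100, -7, 5))  # serial subtraction: [93, 86, 79, 72, 65]
print(serial_sequence(3, 4, 5))     # serial addition: [7, 11, 15, 19, 23]
print(is_correct(93, 93, elapsed_s=6.0, time_limit_s=5.0))  # False: too slow
```

Without a time limit, as in this sketch's no-pressure case, `is_correct(93, 93, elapsed_s=6.0)` returns True.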
This study makes some important contributions. First of all, the elderly are normally unfamiliar with ICT (information and communication technologies), and most do not play video games or smartphone games. This study shows that the elderly can be trained with a serious smartphone game and that it is suitable for them. Nowadays, with the advancement of technology, a smartphone is not only a communication tool but can also be a cognitive training tool for the elderly.
Furthermore, previous studies have shown that traditional arithmetic calculation requires a long attention span and that attention can only be enhanced through training. We adapted mental calculation into a new game designed for smartphones to provide cognitive training. The results show that participants not only significantly improved their attention span but also effectively improved their executive function.
Finally, the difficulty and pressure game mechanic factors are important elements in serious game design. The results suggest that moderate levels of both factors can lead to better performance. These are the key contributions of this study.

Future Recommendation
This research can act as a reference for game developers designing a game for cognitive training in the future. They can apply this design and the game mechanic factors to other games designed to increase cognitive function.
Further research could analyze the effectiveness of serious games involving mental arithmetic calculations in other age groups, such as children, teenagers, and adults. Because individuals of different ages have different levels of cognitive function, comparing the pre-test and post-test scores of training in each group would show whether the game has significant effects on attention span and executive functions at different ages. Moreover, analyzing the effectiveness of the game mechanic factors (difficulty, pressure, and competition) in different age groups is another suggestion for future research.