Serious Games and Mathematical Fluency: A Study from the Gender Perspective in Primary Education

Abstract: In recent years, serious games have offered great opportunities for learning processes at schools. However, it is unclear whether this type of proposal produces different outcomes among students according to their gender. In this context, the aim of this paper is to identify the differences that occur in primary school classrooms according to gender when using serious games designed for the development of mathematical fluency, and to examine to what extent these games contribute to overall school performance. We carried out a quasi-experimental study, including pretest and posttest, without a control group and with several experimental groups, with the participation of 284 students from first to fourth grade. The results show that the software benefits boys and girls equally, compared to the previously followed methodology, which benefited boys. A clear relation between the results achieved and the students' overall grades has also been observed. The conclusions show the potential of serious games in school settings, and the opportunity to address performance differences based on gender.


Introduction
The possibilities offered by serious games have been an area of increasing interest in the field of education over the past years. Unlike video games, which are created for entertainment purposes, serious games focus on the educational aspect [1] as one of the possible developments that this type of software is able to offer [2]. Serious games recover characteristic elements of the students' leisure time and take them to the classroom, generating an experience that promotes learning.
The presence of video games has progressively ousted other forms of entertainment. Data provided by the Spanish Video Game Association [3] show very high average weekly consumption figures: 6.7 h a week on average in Spain (in countries like the UK the figure almost doubles, with 11.6 h) and a male profile in 58% of the cases. This should be added to a larger presence in the lower age ranges: 75% of the boys and girls aged between 6 and 10 years fit this profile, thus reducing the percentage as we include higher ages. With these consumption levels, we find studies that point out the benefits of video games [4], also shared with serious games [5]. However, they are not exempt from possible negative assessments either. Research exists [6] showing that the use of video games during school days is related to a worse academic performance. Moreover, intensive video gamers have worse academic results in a research line that relates excess time dedicated to this type of leisure with problems in school performance [7], a situation occurring twice as often among boys than among girls [8].
However, the use of serious games offers an opportunity, because students showing difficulties working in ordinary situations at school increase their engagement in this type of proposals [9]. We do also have to be aware of possible improvements in learning processes, which are highly dependent on the contexts [10] and the design of the game experience [11].

Serious Games and Mathematics
In primary education schools, serious games can help to improve the learning of the subjects, and, among them, the field of mathematics is a recurrent space of interest. It is also a place of research due to the gender gaps that affect achievements, attitudes and relationship with this field of knowledge [29]. Research shows that these gaps change based on cultural variations and opportunities for girls and women in aspects such as equity in school enrolment or women's participation in research work [30]. However, at the primary education level there are no differences between boys and girls in mathematics performance, a situation that changes in favor of boys in secondary school and university [31]. What is identified in this type of content is that the gender difference in the performance in activities involving competitiveness among students is not the same as that produced in non-competitive tasks. This competitiveness, when it enters the picture, could be affecting the size of the gender gap in mathematics test scores, potentially exaggerating the mathematical advantage of men over women when learning is similar [32]. All of these elements globally influence the performance of boys and girls and explain, together with factors such as the socioeconomic level, the significant differences shown in international studies for the Spanish context compared to other countries [33].
One way to see the progress in the first levels of primary education in mathematics is through the concept of mathematical fluency [34]. The measure of this element can work as a good indicator of school improvement with the use of serious games. Recent research relates the use of serious games with the improvement of mathematical fluency [35], being also a benefit that is projected in the set of key learning in schools [36].
In order to probe into the possible improvements in mathematical fluency due to the use of serious games, we have recently carried out an investigation in this area with the ReflexMath software. It is a program frequently used in schools in the United States that focuses on fluency in arithmetic calculation (Figure 1), with a curricular structure that follows the indications of the Council of Chief State School Officers [37]. The software generates fully individualized learning pathways from the massive collection of user data, mainly based on response times and the error rate. A virtual assistant, in the form of a child character, provides help when the calculation processes do not improve. In serious games, the activation of the assistant is triggered by errors, which are considered an opportunity for learning and not a penalty. This character, following the indicated curricular proposal, offers specific strategies for the calculation operation detected as especially difficult for the student. Calculation practice is systematized through video games in which the user interacts with the characters and the proposed situations by solving calculation operations adapted in real time to their profile. In addition to the serious game, the software has an integrated gamification system that generates rewards through points exchangeable for improvements to the game's avatar, awards in the form of diplomas, and similar items, akin to a micro form of digital badging. Teachers can monitor the process through a specific dashboard that offers information through tools based on learning analytics (Figure 2).
The data from the research carried out with this software in a primary education center show that, in a context such as Spain and with a curriculum structured differently from the American one, the software leads to a statistically significant and large improvement in mathematical fluency among ordinary schooling students in the first four levels of primary education. The data are consistent with previous studies with the same software with much smaller samples [38] and also with students with learning difficulties [39].
However, surprisingly, the differences in performance shown in the pretest segmented by gender turn out to be very favorable towards boys. The pretest data also showed that, with the methodology used in the school, these differences did not decrease over the years. In addition, this was an aspect about which the educational center showed concern, because it had never been assessed and there was no awareness of it. It is an emerging element that requires analysis and that could potentially be addressed by means of an alternative methodology through a serious game.
For all of these reasons, the objectives of this research work are, on the one hand, to analyze from a gender perspective the performance of students in mathematical fluency before and after using a serious game, and the use that they have made of the serious game in terms of the number of days they have used it and the number of activities resolved; and, on the other hand, to study from a gender perspective the role that classroom work with the serious game plays in the general school performance of students.

Design
To achieve the intended objectives, a quasi-experimental design with pretest-posttest was proposed, without a control group and with several experimental groups. The study was carried out in a standard school setting, where the classroom-groups were already established (the assignment of the subjects to the groups was not random). Each group worked completely independently, without instructor-initiated intervention, using the ReflexMath educational software. A mathematical fluency test was administered before and after its use. The mathematics teachers were in charge of putting the proposal into practice in each classroom, and all of them were given the same indications on how to use the possibilities offered by the system. The educational center gave its informed consent for the study.

Participants
The research involved 12 primary education classrooms between the first and fourth grade levels, belonging to a private (concerted) educational center located in an urban context of the Autonomous Community of Galicia. Specifically, 284 students participated, of which 54.2% were boys and 45.8% were girls. As regards the academic year, the sample was configured as follows: 24.3% first grade; 25.4% second grade; 25.7% third grade; and 24.6% fourth grade. First, the proposal was presented to the school leadership team and the 25 teachers who teach in the primary education stage. The invitation to participate in the research was received positively, especially by the mathematics teachers.

Instrument
The assessment of calculation learning was carried out through the Basic Math Operations Task (BMOT) [40], which was taken from a later publication [39] and fully translated into Spanish, in both its pretest and posttest versions. This instrument includes combined operations for calculating additions, subtractions, multiplications and divisions, which the students must answer in a maximum time of one minute. The test is entirely suited to the curricular level of third and fourth grades of primary education, while for the first and second grades it was adapted to include only addition and subtraction. Scoring the test yields an individual performance indicator based on the total rate of operations that are answered correctly. This indicator matches the skill targeted by the software, which justifies its relevance.

Procedure
The study was carried out between September 2019 (application of the pretest) and December 2019 (application of the posttest). The software was used during classroom work time in the math classes, at the rate of three sessions a week, as recommended by the manufacturer. Data were collected on the gender of the students and their academic performance, specifically the grades of all the knowledge areas assessed at the end of the first quarter of the school year, coinciding with the posttest. It was also possible to obtain information on the number of days of use of the program and the volume of activities resolved in each case, through a specific tool that allows access to the system database. Moreover, as regards the teaching staff, data were collected on years of teaching experience.

Data Analysis
The software used for data analysis was IBM SPSS Statistics, version 25. Regarding the first objective, measures of central tendency and dispersion were calculated and Student's t-test for independent samples was applied to find out whether there were statistically significant differences based on gender. For the comparison between the pretest and the posttest within the same group of male or female students, the t-test for related samples was applied. For these statistical hypothesis tests, a significance level of 0.05 was established. In addition, to estimate the effect size, coefficients from the Cohen's d family were used, specifically the formulas corresponding to d_p, in the case of the independent groups (boys vs. girls), and to d_D, in the case of the related pretest-posttest measures within the same group [41]. The resulting values were interpreted according to conventional criteria [42]: 0.2 implies a small effect size; 0.5, medium; and 0.8, large. In relation to the second objective, simple regression analysis was used, given the nature of the criterion variable (general academic grade) and attending to a single explanatory variable at a time (the posttest score, the number of days of use and the number of activities performed with ReflexMath were each taken separately).
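As a minimal sketch of the two effect sizes described above, the following Python functions compute a pooled-standard-deviation d for independent groups and a difference-score d for related measures. The function names and the sample scores are illustrative assumptions, not the study's data or its SPSS procedure, and the "negligible" label for values under 0.2 is an added convention that the text does not state.

```python
import math
from statistics import mean, stdev


def cohen_d_pooled(group_a, group_b):
    """d for two independent groups, standardized by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (mean(group_a) - mean(group_b)) / pooled_sd


def cohen_d_differences(pre, post):
    """d for related measures, standardized by the SD of the differences."""
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / stdev(diffs)


def label(d):
    """Conventional reading: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumed label, not part of the cited criteria


# Hypothetical scores, for illustration only.
d_groups = cohen_d_pooled([2, 4, 6], [1, 3, 5])
d_paired = cohen_d_differences([1, 2, 3], [2, 4, 3])
print(d_groups, label(d_groups))  # 0.5 medium
print(d_paired, label(d_paired))  # 1.0 large
```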

Results
First, the results in mathematical fluency and use of the serious game are presented according to gender. Next, we examine the different educational levels and look at specific classrooms to analyze in detail the differences between male and female students. Finally, we investigate to what extent the serious game contributes to the general school performance of students.

Mathematical Fluency in Primary Education: Differences Based on Gender
The results obtained by primary school students, segmented according to gender, show an unbalanced situation (Table 1), with a significant difference of one and a half points in favor of boys in mathematical fluency in the pretest (t(282) = 2.153, p = 0.032, d = 0.26). In the posttest, after the use of the serious game, the differences remain (t(282) = 2.496, p = 0.013, d = 0.30). This situation offers revealing elements: the methodology used in the school favors boys in general, but the serious game produces an improvement for boys and girls alike, since gender differences do not change after the intervention. However, there are also apparently contradictory data: girls achieve an improvement similar to that of boys while doing fewer activities. This difference is significant, but of low magnitude (t(282) = 2.291, p = 0.023, d = 0.27), and gives girls an advantage in performance per activity compared to boys, who do more activities to obtain the same level. If we analyze how much improvement is achieved with the use of the serious game in each group by gender, at the intragroup level, we see that both boys (t(153) = −19.340, p < 0.001, d = 1.56) and girls (t(129) = −17.650, p < 0.001, d = 1.55) make great progress between pretest and posttest thanks to the serious game, but the gains do not differ between them (t(282) = 1.052, p = 0.294, d = 0.13). Figure 3 reflects the distributions of the scores before and after the use of the serious game, showing the important progress, as well as a greater dispersion in the posttest.
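The effect sizes reported in this section can be cross-checked from the t statistics alone. The sketch below assumes group sizes of 154 boys and 130 girls (inferred from the degrees of freedom, 153 and 129), a pooled-variance independent-samples test, and a difference-score d for the related-samples tests. The function names are illustrative.

```python
import math


def d_from_independent_t(t, n_a, n_b):
    """Pooled-SD Cohen's d implied by an independent-samples t statistic."""
    return t * math.sqrt(1 / n_a + 1 / n_b)


def d_from_paired_t(t, n):
    """Difference-score Cohen's d implied by a related-samples t statistic."""
    return abs(t) / math.sqrt(n)


# Group sizes inferred from the reported degrees of freedom (assumption).
N_BOYS, N_GIRLS = 154, 130

# Boys vs. girls: pretest, posttest, activities completed, gain.
for t in (2.153, 2.496, 2.291, 1.052):
    print(round(d_from_independent_t(t, N_BOYS, N_GIRLS), 2))
# prints 0.26, 0.3, 0.27, 0.13

# Pretest vs. posttest within each group.
print(round(d_from_paired_t(-19.340, N_BOYS), 2))   # 1.56
print(round(d_from_paired_t(-17.650, N_GIRLS), 2))  # 1.55
```

Under these assumptions, each recomputed value matches the d reported alongside its t statistic, which suggests the reported figures are internally consistent.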

Mathematical Fluency, Gender and Educational Level
If the data are broken down by educational level and by gender (Table 2), grouping the participating students with similar curricular requirements, we observe that the data by grade favor in the posttest those who performed better in the pretest. Between boys and girls, the differences become significant in the second grade, both in the pretest (t(70) = 3.345, p = 0.001, d = 0.80) and in the posttest (t(70) = 2.939, p = 0.004, d = 0.71).
What is evident at all levels is that girls do fewer activities than boys. Again, it is in second grade where the differences become statistically significant at a level of p < 0.1, with an effect size very close to a medium value (t(70) = 1.870, p = 0.066, d = 0.45), with boys completing, on average, 1731 more activities than girls.
As in the overall results, the differences between the pretest and posttest scores within the group of male students and within the group of female students are statistically significant and substantial in all the analyzed grades (Table 3). These data once again support the value of the serious game proposal compared to the work that is usually done in the school for this type of content. All students progress, and the improvement experienced is fundamentally similar for male and female students, except again for the situation in second grade at a level of p < 0.

Gender Differences in the Classroom: The Role of Teachers
A more detailed analysis of the situation in second grade shows, in the pretest-posttest results, a classroom where the performance of the girls clearly differs from the rest of the groups, specifically classroom 6, as illustrated in Figure 4. It is a classroom where the situation of the female students in comparison with their peers (Table 4) However, in this classroom the gender gap is accentuated further, as statistically significant differences of great magnitude are obtained in the number of solved activities (t(22) = 2.246, p = 0.035, d = 0.92). This situation becomes even more evident in the progress in mathematical fluency (t(22) = 2.557, p = 0.018, d = 1.05). Boys improve by an average of 4.5 points more than girls.
The analysis of other parameters calls for a reconsideration of the situation and opens the way to future research. The teacher of classroom 6 is a novice teacher with less than five years of experience and a recent university degree. If the results obtained by the female students of this novice teacher (classroom 6) are compared with those of the more senior teachers (more than 30 years of experience) responsible for classrooms 4 and 5, it is observed that the differences, despite not being statistically significant, are relatively important (medium effect size) in relation to the pretest (t (27)

Contribution of Serious Games to Academic Performance
Considering the sample as a whole, the simple regression model developed for the global academic grade with the posttest score as the independent variable (Figure 5) explains 14% of the variance (adjusted R² = 0.14, F = 48.421, p < 0.001). It is also observed that the days of use of the software (adjusted R² = 0.07, F = 21.321, p < 0.001, y = 7.02 + 0.03 * x) and the number of resolved activities (adjusted R² = 0.08, F = 25.977, p < 0.001, y = 7.41 + 7.156 × 10⁻⁵ * x) contribute significantly to the general academic average of the students. These data highlight the importance of these types of improvements in mathematical fluency and the potential that the use of serious games has for the development of general academic performance.
When calculating regression models by gender, it is noted that the posttest score explains, in the case of boys, 22% of the variance of the overall academic grade (adjusted R² = 0.22, F = 43.75, p < 0.001, y = 6.75 + 0.05 * x), while for girls this proportion is reduced to 11% (adjusted R² = 0.11, F = 16.466, p < 0.001, y = 7.36 + 0.04 * x). Therefore, it is observed that mathematical fluency is, in general, a better predictor of academic performance for boys than for girls.
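For a simple regression, the adjusted R² can be recovered from the F statistic and the sample size, which offers a quick consistency check on the values above. The sketch assumes that every student contributed a grade (n = 284 overall, 154 boys and 130 girls, the latter two inferred from the reported degrees of freedom). The function names are illustrative.

```python
def r2_from_f(f_stat, n, k=1):
    """R² implied by the overall F test of a model with k predictors."""
    df_resid = n - k - 1
    return (k * f_stat) / (k * f_stat + df_resid)


def adjusted_r2(r2, n, k=1):
    """Adjusted R² for a model with k predictors fitted on n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# (description, reported F, assumed n)
models = [
    ("all students, posttest score", 48.421, 284),
    ("all students, days of use", 21.321, 284),
    ("all students, activities", 25.977, 284),
    ("boys, posttest score", 43.75, 154),
    ("girls, posttest score", 16.466, 130),
]

for name, f_stat, n in models:
    adj = adjusted_r2(r2_from_f(f_stat, n), n)
    print(f"{name}: adjusted R² = {adj:.2f}")
# prints 0.14, 0.07, 0.08, 0.22 and 0.11, in that order
```

Under these assumptions the recomputed values agree with the adjusted R² reported for each model.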
However, the days of use of the software and the number of activities resolved have greater explanatory power for the school performance of girls than for that of boys. The days of use explain 14% of the variance of the general academic mean of the female students (adjusted R² = 0.14, F = 22.000, p < 0.001, y = 7.36 + 0.04 * x) and 3% in the case of male students (adjusted R² = 0.03, F = 5.098, p = 0.025, y = 6.83 + 0.02 * x). Likewise, the activities carried out with the software explain 19% of the variance in the girls' performance (adjusted R² = 0.19, F = 31.055, p < 0.001, y = 7.34 + 1.18 × 10⁻⁴ * x) compared to 5% in the case of boys (adjusted R² = 0.05, F = 9.506, p = 0.002, y = 7.35 + 5.645 × 10⁻⁵ * x).
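To make the size of the fitted slopes concrete, the short sketch below computes the change in predicted overall grade associated with a given number of additional resolved activities, using the by-gender equations above. The figure of 1000 extra activities is a hypothetical input chosen for illustration, and the result describes an association in the fitted lines, not a causal effect.

```python
# Slopes (grade points per resolved activity) from the by-gender models.
SLOPE_GIRLS = 1.18e-4
SLOPE_BOYS = 5.645e-5


def predicted_grade_change(slope, extra_activities):
    """Difference in predicted grade for a given increase in activities."""
    return slope * extra_activities


EXTRA_ACTIVITIES = 1000  # hypothetical value, for illustration only

for group, slope in (("girls", SLOPE_GIRLS), ("boys", SLOPE_BOYS)):
    change = predicted_grade_change(slope, EXTRA_ACTIVITIES)
    print(f"{group}: {change:.3f} grade points")
# girls: 0.118 grade points
# boys: 0.056 grade points
```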
In order to better understand the role of mathematical fluency in the school performance of boys and girls, the results were broken down by academic level (Table 5). Contrary to what is seen in general, the resulting coefficients of determination indicate that mathematical fluency explains the overall academic grade of girls to a greater extent than that of boys, except in fourth grade, where the proportion of variance explained for male students is double the lower value obtained for female students. That is, the explanatory capacity of mathematical fluency for the school performance of boys and girls varies according to the grade analyzed. In any case, the estimated models have greater explanatory power in first and second grade than in third and fourth grade, in line with the greater curricular load of this type of content in the first two levels of primary school.

Discussion and Conclusions
This research has analyzed, from a gender perspective, the improvement of mathematical fluency with serious games and its relationship with school performance. Empirical evidence on the improvement that occurs in both boys and girls is provided, in line with previous research with similar software proposals [15][16][17].
The analysis of the data disaggregated by gender indicates a starting situation, with the methodology used in the school, that clearly benefits boys. However, after working with the serious game, the data show that the improvement achieved by girls is similar to that of boys. Both improve in an equivalent way, and this is in itself a fact to be highlighted. It should also be taken into account that this process is not enough to reduce the disadvantage in mathematical fluency that girls showed in the pretest. These data are consistent with previous research in which this type of proposal benefits girls equally [24,25]. In any case, we should bear in mind that work styles differ, since in the case of girls their response is based on their concern for academic achievement [26]. This trend, as reflected in other studies [28], implies a greater acceptance of the proposal by girls, compared to a more competitive position in the case of boys. This helps to explain the greater number of activities carried out by boys, and is in line with findings from previous works in which gender-related motivation, shaped by cognitive and personal factors [19], influences engagement in the proposal [20].
The data suggest a use of the BMOT instrument [40] for a purpose not contemplated in its design: the rapid identification of possible gender biases in mathematics. Given the difference in performance observed in the pretest-posttest contrast with the use of the serious game, there is an opportunity to equalize performance by gender, with a view to achieving results closer to those obtained in research on this type of content, since at these educational levels no gender differences should be found in learning [31]. The data confirm the influence that contexts have on the use of serious games [10], in addition to their design as a learning experience [11], in offering the same opportunities to both genders. In this sense, the identification of a classroom where girls showed clearly inferior performance, unlike the rest of the center, underlines the importance of not losing sight of the role of teachers and of arranging the elements so as to favor the same learning opportunities for boys and girls, regardless of the software and the proposal. It is essential to train and prepare teachers so that their actions avoid an unbalanced situation from a gender perspective. It is also necessary to offer them diagnostic tools, since in many cases their perceptions may differ from the reality of their classrooms [27]. In any case, it is advisable to monitor the experience that the serious game offers, since the greater competitiveness of boys can give a false appearance of learning [32], made visible here in a greater number of tasks that does not translate into better performance in their mathematical fluency.
For its part, the regression analysis shows the importance of the proposal for learning as a whole, highlighting the relevance of this type of skill for understanding the general school success of students [36] at an age where the curricular weight of these contents is much higher than in higher grades. Here the data do show differences depending on the indicators considered, apparently influenced by cognitive styles [28]. We should recall that boys and girls show similar levels of improvement in mathematical fluency. However, in the case of girls, their general academic level is more closely related to the time of use of the proposal and the number of activities carried out. It appears that here, too, differentiating characteristics of boys and girls are reflected in the styles with which they participate in the serious game [19].
Finally, it is worth considering some limitations of this research. Above all, although the number of subjects is sufficient, data are only available from one educational center, so the results should be interpreted with caution and not generalized. The factors that influenced the group in which gender differences even increased with the serious game are also unknown. All this offers clear lines of future research, especially on the role of teachers, not contemplated in this research, which would help to identify elements to take into account in initial and ongoing teacher training.