This section presents the data collected during the experiment and the statistical analysis of the data.
6.1. Pre- and Post-Tests
The pre- and post-tests were evaluated in Microsoft Forms, which collected the responses and scored them against the pre-set correct answers. The resulting data sets were imported separately into Microsoft Excel (Microsoft Corporation, Redmond, WA, USA) and then merged. The statistical calculator DATAtab (DATAtab e.U., Graz, Austria) was used for data analysis. The 279 participants in the experimental group completed pre- and post-tests; descriptive statistics for these are presented in Table 8. Both tests consisted of six questions, each worth one point.
The mean score increased from 4.38 to 5.82, indicating that the overall performance of the participants improved considerably after the GBL session. This positive trend is supported by the median, which rose from 5 to 6. The standard deviation decreased from 1.78 to 0.49, demonstrating that the scores became less varied, i.e., the results of the participants were more closely clustered around the mean post-intervention. The minimum score increased from 0 in the pre-test to 3 in the post-test, which is particularly noteworthy, as it shows that the lowest post-test score was well above the lowest pre-test score. The maximum score remained constant at 6, indicating that high achievers maintained their performance level. The reduction in the median absolute deviation (MAD) from 1 to 0 suggests that the participants became more homogeneous in terms of their scores. Overall, the descriptive statistics indicate a clear improvement in performance from the pre-test to the post-test, which suggests that the DETerminator GBL intervention had a positive and measurable effect on student learning outcomes.
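As a minimal sketch of how the descriptive measures discussed above are defined, the following Python snippet computes them for two small, made-up score lists. The data and the function names are illustrative assumptions rather than the study data, and the sample standard deviation is assumed.

```python
from statistics import mean, median, stdev


def median_absolute_deviation(scores):
    """Median of the absolute deviations from the median."""
    centre = median(scores)
    return median([abs(s - centre) for s in scores])


def describe(scores):
    """Collect the descriptive measures used in this section."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "sd": stdev(scores),  # sample standard deviation (assumption)
        "min": min(scores),
        "max": max(scores),
        "mad": median_absolute_deviation(scores),
    }


# Hypothetical scores on a six-point test (NOT the study data).
pre = [0, 2, 3, 4, 5, 5, 6, 6, 6]
post = [3, 5, 6, 6, 6, 6, 6, 6, 6]

pre_stats = describe(pre)
post_stats = describe(post)
print(pre_stats)
print(post_stats)
```

When most post-test scores sit at the maximum, the MAD collapses to 0 even though the standard deviation stays positive, which mirrors the pattern described above.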
Hypothesis testing was used to determine if there is a statistically significant difference between the results of the pre-test and the post-test. To answer the research question RQ1, the following hypotheses were formulated:
The histograms of the pre-test and post-test data are shown in Figure 5. The left histogram represents the pre-test results, while the right histogram shows the post-test results, illustrating the changes in distribution and central tendency following the GBL intervention.
We checked whether the normality condition was met using four different statistical tests (see Table 9). Low p-values, that is, values less than 0.05, indicate that the data differ significantly from a normal distribution. In our case, all four tests indicate that the data differ significantly from the normal distribution. This means that we should proceed with statistical methods for hypothesis testing that do not assume the normality of the data.
Because the normality assumption was not met, we used the Wilcoxon signed-rank test to compare the pre-test and post-test results. This test is a suitable choice because it is a non-parametric method, i.e., it does not assume that the data follow a normal distribution. It is designed for paired data, which is relevant in this context since we are analysing the differences between the same participants’ pre-test and post-test scores. The test accounts for the individual paired differences, providing a robust analysis even in the presence of outliers (Sheskin, 2011). By utilising the Wilcoxon signed-rank test, we statistically evaluate whether the differences observed between the pre-test and post-test scores are significant, testing our hypotheses.
We calculated the Wilcoxon test ranks for the pre-test and post-test datasets. The results are summarised in Table 10.
The Wilcoxon test ranks comparing pre-test and post-test scores show a clear trend in student performance following the DETerminator GBL session. A total of 279 participants were evaluated. Among them, 151 exhibited negative ranks (Pre-test < Post-test), indicating that their post-test scores were higher than their pre-test scores. Only 10 participants demonstrated positive ranks (Pre-test > Post-test), meaning their post-test scores were lower than their pre-test scores. The mean rank for those who improved was 84.5, while the mean rank for those who declined was considerably lower at 28.15, so the declines were also smaller in magnitude than the improvements. Additionally, 118 participants obtained equal scores on both assessments (Pre-test = Post-test), indicating no change in their performance. The disparity between the number of negative and positive ranks points to a generally favourable impact of the instructional strategy: more than half of the students (151 of 279) improved after the GBL intervention.
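The rank figures above can be cross-checked with simple arithmetic: the ranks of the n non-zero differences must add up to n(n + 1)/2. The sketch below performs this check in Python. The function name is a hypothetical helper, and the check assumes that the reported mean ranks are not heavily rounded.

```python
def rank_sums(n_neg, mean_rank_neg, n_pos, mean_rank_pos):
    """Recover rank sums from group sizes and mean ranks.

    Returns the negative rank sum, the positive rank sum and the total
    that the ranks 1..n must add up to, where n counts only the
    non-zero differences (ties between pre- and post-test are dropped).
    """
    sum_neg = n_neg * mean_rank_neg
    sum_pos = n_pos * mean_rank_pos
    n = n_neg + n_pos
    expected_total = n * (n + 1) / 2
    return sum_neg, sum_pos, expected_total


# Values reported in the text: 151 negative ranks (mean rank 84.5)
# and 10 positive ranks (mean rank 28.15).
sum_neg, sum_pos, expected_total = rank_sums(151, 84.5, 10, 28.15)
print(sum_neg, sum_pos, sum_neg + sum_pos, expected_total)
```

The two rank sums add up to the expected total for 161 non-zero differences, so the reported counts and mean ranks are mutually consistent.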
Table 11 presents a more detailed statistical analysis of these results. The Wilcoxon test statistic (W) is 281.5, which is the sum of the ranks associated with the positive differences. The standardised z-score of −10.6 and the p-value of less than 0.001 show that the change in scores is statistically significant, leading us to reject the null hypothesis. Finally, the effect size (r) of 0.63 reflects a large effect, suggesting that the GBL intervention had a meaningful overall impact on the participants’ learning outcomes. Note that the effect size is considered large if r is 0.5 or above (Steyn, 2020).
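To illustrate how the z-score and the effect size relate to the test statistic, the following sketch applies the textbook normal approximation for the Wilcoxon signed-rank test. It is an approximation under stated assumptions, not a reproduction of the DATAtab computation: no tie or continuity correction is applied, and the effect size is assumed to be r = |z| / √N with N = 279 (all participants).

```python
import math


def wilcoxon_z(w, n_nonzero):
    """Normal approximation for the signed-rank statistic W.

    Assumption: no correction for tied ranks and no continuity
    correction, so the result can differ slightly from software output.
    """
    mu = n_nonzero * (n_nonzero + 1) / 4
    sigma = math.sqrt(n_nonzero * (n_nonzero + 1) * (2 * n_nonzero + 1) / 24)
    return (w - mu) / sigma


def effect_size_r(z, n_total):
    """Effect size r = |z| / sqrt(N) (assumed definition)."""
    return abs(z) / math.sqrt(n_total)


# W = 281.5 with 151 + 10 = 161 non-zero differences (from the text).
z_approx = wilcoxon_z(281.5, 161)
# Effect size from the reported z-score and all 279 participants.
r_reported = effect_size_r(-10.6, 279)
print(round(z_approx, 2), round(r_reported, 2))
```

The uncorrected approximation gives a z-score slightly smaller in magnitude than the reported −10.6, which is what one would expect if the software corrects for tied ranks. Dividing the reported z-score by √279 reproduces the effect size of 0.63.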
6.2. Midterm Exam
The midterm exam results were manually entered into a Microsoft Excel spreadsheet and then imported into the DATAtab application for data analysis. The scores of the tasks related to determinants in the midterm exam were collected and analysed for all 580 participants. Descriptive statistics are presented in Table 12, and Figure 6 illustrates the essential features of the midterm exam scores for both groups.
The mean score for the Control Group is 2.5, which is notably lower than the Experimental Group’s mean score of 3.14. Both groups share the same median score of 3 and the same mode of 3, so this score was the most common among participants in both datasets. However, the standard deviation reveals a difference: 1.8 for the Control Group compared with 1.62 for the Experimental Group. The higher standard deviation implies that the Control Group’s scores are more widely dispersed around the mean, whereas the Experimental Group’s scores are more consistent. Both groups have a minimum score of 0, signifying that some participants did not score any points. The maximum score for the Control Group is 6, whereas the Experimental Group reached a maximum score of 7, i.e., peak performance was higher in the Experimental Group. Furthermore, the median absolute deviation (MAD) is 2 for the Control Group and 1 for the Experimental Group. The scores in the Control Group therefore not only vary more widely around the mean but are also more dispersed around the median.
The descriptive statistics show that the experimental group performed somewhat better overall than the control group. Statistical testing is required to determine whether this difference is significant. The following hypotheses were formulated to answer research question RQ2:
Null hypothesis: There is no difference between the experimental and control groups with respect to the dependent variable.
Alternative hypothesis: There is a difference between the experimental and control groups regarding the dependent variable.
Neither sample is normally distributed (see Table 13), and the two samples are independent and of different sizes. Because of these conditions, a non-parametric statistical method, the Mann–Whitney U-test, was used to compare the performance of the experimental and control groups.
Table 14 shows the results of the Mann–Whitney U-test. The U statistic is 34,356. This number represents the number of times a score from one group ranks higher than a score from the other group. The z-statistic is −3.89; it expresses the deviation of the U statistic from its expected value under the null hypothesis in units of standard deviation. The asymptotic p-value is <0.001, which is well below the conventional alpha level of 0.05, providing strong evidence to reject the null hypothesis. Moreover, the exact p-value is also <0.001, and its consistency with the asymptotic p-value further supports the statistical significance of the findings. The effect size (r) is calculated to be 0.16, which is a small effect size (Steyn, 2020). So, the Mann–Whitney U-test results indicate a statistically significant, although small, difference between the experimental group and the control group at the 5% significance level.
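The relationship between the U statistic, the z-statistic and the effect size can be sketched with the standard normal approximation. The snippet below is illustrative only and rests on several assumptions: the control group size is taken to be 580 − 279 = 301 (it is not stated explicitly here), no tie correction is applied, and the effect size is assumed to be r = |z| / √N.

```python
import math


def mann_whitney_z(u, n1, n2):
    """Normal approximation for U (assumption: no tie correction)."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu) / sigma


n_total = 580
n_experimental = 279
n_control = n_total - n_experimental  # assumption: 301 participants

z_approx = mann_whitney_z(34356, n_experimental, n_control)
# Share of all experimental-control pairs counted by U.
pair_share = 34356 / (n_experimental * n_control)
# Effect size from the reported z-statistic (assumed r = |z| / sqrt(N)).
r_reported = abs(-3.89) / math.sqrt(n_total)
print(round(z_approx, 2), round(pair_share, 2), round(r_reported, 2))
```

Under these assumptions the uncorrected z-value is slightly smaller in magnitude than the reported −3.89, as expected for discrete scores with many ties, and the reported z-statistic reproduces the effect size of 0.16.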
6.3. Questionnaire
In the questionnaire, different categories of questions are distinguished to obtain a comprehensive picture of general mathematical attitudes, experiences with the DETerminator game, the benefits of the game, and the possibilities for development. In addition to collecting demographic data, Category I questions focused on participants’ attitudes towards mathematics learning (Q1–Q3) and game-based learning (Q4–Q6). Category II questions were specifically related to game-based learning with the DETerminator game (Q7–Q12) and opinions about the DETerminator card game (Q13–Q15). The questions and the percentage distribution of responses received from the 279 participants are presented in Table 15 and Table 16, together with descriptive statistical data.
The questions presented in Category II focus on students’ experiences and perceptions of the DETerminator card game as a learning tool for understanding the concept and properties of the determinant in mathematics. Each question aims to gauge various aspects of the learning process, including perceived difficulty, enjoyment, effectiveness, and preferences for future engagement with the game.
Table 15 provides a clear overview of the results from Category I of the questionnaire, showing the distribution of responses for each question alongside the mean (M) and standard deviation (SD). The data presented in Table 15 reflect the students’ responses regarding their perceptions of learning mathematics and the potential to incorporate game-based teaching methods into their education. Each question provides insight into the students’ attitudes toward mathematics, interest in innovative teaching methods, and overall engagement with the subject. The first question (Q1), which enquires about students’ enjoyment of learning mathematics, reveals a diverse range of opinions. With a mean score of 3.26, students are slightly inclined toward enjoying mathematics, but nearly 40% of respondents either disagree or express a neutral stance. This indicates that while some students find satisfaction in studying mathematics, a significant number do not, highlighting the necessity for educators to explore pedagogical strategies that foster enthusiasm and engagement in this subject. In evaluating the second question (Q2) concerning the interest in learning mathematics, the responses skew more positively, with a mean score of 3.64. Approximately 62.8% of students report some level of agreement with the statement that they find learning mathematics interesting. This suggests an optimistic attitude among students about the content, even if the interest varies among individuals. The results imply that while some strategies may work effectively for a majority, educators should remain mindful of those who may still struggle with engagement. The third question (Q3) addresses the perceived importance of mathematics in students’ lives or future careers, scoring a mean of 3.91. The high percentage of students recognising its importance reinforces the idea that students appreciate the subject’s relevance and are likely aware of its applications in various fields. This understanding presents an opportunity for educators to connect the material more directly to real-world applications, potentially enhancing the enthusiasm and perceived relevance of the subject.
Question four (Q4) yields a high mean score of 4.44 regarding students’ interest in game-based teaching methods, with 60.9% of respondents expressing strong approval. This indicates a strong inclination towards engaging teaching styles that involve gaming components. The positive reception of game-based learning methods suggests that students are receptive to innovative instructional approaches that could enhance their learning experiences. The fifth question (Q5) addresses students’ willingness to integrate game-based learning across various fields, yielding a high mean of 4.60; 71% of respondents strongly endorse this idea. Finally, the sixth question (Q6) in Category I, regarding the necessity of new teaching methods in mathematics, obtained a mean score of 4.42, further illustrating students’ eagerness for change. With over 60% of respondents agreeing that innovative teaching strategies are essential, it is evident that students recognise that traditional methods may not suffice to meet their learning needs effectively.
Overall, the data of Category I reveal that while students have a strong foundation of interest in and recognition of the importance of mathematics, they also clearly call for engaging, modernised instructional methods. The revealed preference for game-based teaching indicates that students are not just passive learners but active participants inclined to embrace innovative approaches.
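The mean (M) and standard deviation (SD) reported for each question follow directly from the percentage distribution of the five response options. As a hedged illustration, the sketch below computes both values from a made-up distribution; the percentages are hypothetical, the function name is an assumption, and the population formula for the SD is used (with 279 respondents the sample formula gives almost the same value).

```python
import math


def likert_summary(percentages):
    """Mean and SD of a 1..5 Likert item from its response percentages.

    `percentages` lists the share of responses for options 1 to 5.
    Assumption: population SD, i.e. division by the total share.
    """
    total = sum(percentages)
    options = range(1, len(percentages) + 1)
    mean = sum(k * p for k, p in zip(options, percentages)) / total
    variance = sum(p * (k - mean) ** 2 for k, p in zip(options, percentages)) / total
    return mean, math.sqrt(variance)


# Hypothetical distribution for options 1..5 (NOT taken from Table 15).
example_mean, example_sd = likert_summary([5, 5, 10, 20, 60])
print(round(example_mean, 2), round(example_sd, 2))
```

A distribution concentrated on option 5 produces both a high mean and a comparatively small SD, which is the pattern discussed for the game-based learning items.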
In Category II (see Table 16), the first two questions (Q7 and Q8) assessed students’ perceptions of the difficulty of the questions in the pre-test and the post-test. For Q7, regarding the pre-test, the mean score is 3.37, indicating that students found the pre-test questions moderately easy: 45.9% selected the neutral option (3), 25.1% felt that the questions were easy (4), and 14.3% strongly agreed (5) that they were easy, while the remaining respondents found them challenging. In Q8, the post-test received a higher mean score of 3.71, suggesting that students perceived these questions to be easier than those in the pre-test. This improvement may imply that the learning experience facilitated by the DETerminator game prepared students to face the questions more confidently. Question 9 addresses how game-like the learning process felt while using the DETerminator game. The mean score is 4.37, and 53.8% of participants felt that the learning process was very much like a game, with only a small percentage rating it below neutral. Question 10 evaluates how beneficial the learning phase between the two tests was, with a mean score of 4.41. The fact that 56.6% rated this phase as highly beneficial shows that the students appreciated it and recognised its role in consolidating their knowledge. The next question (Q11) investigates the perceived ease of learning the technique of calculating determinants through the DETerminator card game, with a mean score of 4.38. The majority (59.5%) strongly agreed that the game made mastering the calculation process easier. Question 12 asks students directly about their enjoyment while using the DETerminator game. With a mean score of 4.40, the results reveal a high level of enjoyment, suggesting that the game not only serves an educational purpose but also creates a positive atmosphere for learning mathematics, a subject often viewed as dry or daunting by students.
In Question 13, students evaluate the effectiveness of the determinants presented on the game cards for their learning. A mean of 4.44, with 62% affirming their effectiveness, further emphasises that the materials used within the game were well-received and believed to be effective learning aids. Question 14 assesses the game’s graphic design, obtaining another mean score of 4.44. A significant 59.9% rated the design highly, indicating that the visual aspects of the game contributed positively to their learning experience, potentially making the game more appealing and engaging. Lastly, Question 15 queries students’ interest in an online version of the DETerminator game for revision purposes before the midterm exam. With a mean score of 4.44 and 62.4% expressing a strong desire to use it, the results suggest that students are willing and enthusiastic about supplemental learning tools.
The standard deviation values in Table 16 suggest that the responses generally exhibit low variability, i.e., students had similar perceptions of the DETerminator game and their learning experiences, which points to a strong consensus regarding the game’s effectiveness as an educational tool. The data from Category II indicate that students perceive the DETerminator card game as an effective educational tool that enhances their enjoyment of learning and their comprehension of the mathematical concepts involved in calculating determinants. The high mean scores across the questions indicate that students found the game engaging, enjoyable, and beneficial for learning. Additionally, the positive feedback on the game’s structure, its perceived effectiveness, and students’ willingness to use it further suggest that GBL approaches can improve educational outcomes in university mathematics.
Table 17 summarises the item-total statistics, and Figure 7 shows the correlation heatmap.
The Cronbach’s alpha is 0.91. Generally, a Cronbach’s alpha above 0.7 is considered acceptable, above 0.8 good, and above 0.9 excellent. The value of 0.91 therefore falls into the excellent category, indicating that the items consistently measure the underlying construct. This level of reliability supports the robustness of the survey results. When interpreting a Cronbach’s alpha analysis, we are interested in the corrected item-total correlation and in the Cronbach’s alpha obtained when each item is omitted, that is, in the internal consistency of the scale and in how each item contributes to it. These values are reported in Table 17. A high corrected item-total correlation means that the item aligns well with the total score of the other items, contributing substantially to the scale’s internal consistency. Items Q3, Q5, Q6, Q9, Q10, Q11, Q12, Q13, and Q15 exhibit very strong correlations, indicating that they align particularly well with the overall measured construct and are reliable indicators of the underlying variable. Items Q1, Q2, Q4, and Q14 demonstrate strong positive correlations with the total score, implying that they contribute positively to the overall internal consistency of the scale. Items Q7 and Q8 have lower corrected item-total correlations than the others, suggesting that while they still contribute to the scale, their relationship with the total score is not as strong. This is not coincidental, as these two questions concern the difficulty of the pre- and post-test questions rather than game-based learning.
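For readers unfamiliar with the reliability measures above, the following sketch shows the usual definition of Cronbach’s alpha and of the alpha obtained when one item is omitted. The response data are hypothetical and the function names are assumptions; this is not the computation performed by DATAtab.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    `items` holds one list of respondent scores per question; all
    lists must have the same length and the same respondent order.
    """
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_deleted(items, index):
    """Alpha of the scale after omitting the item at `index`."""
    remaining = items[:index] + items[index + 1:]
    return cronbach_alpha(remaining)


# Hypothetical answers of five respondents to three items
# (NOT the questionnaire data).
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha_all = cronbach_alpha(items)
alpha_without_first = alpha_if_deleted(items, 0)
print(round(alpha_all, 3), round(alpha_without_first, 3))
```

If omitting an item raises alpha noticeably, that item fits the rest of the scale less well, which is the situation described for Q7 and Q8.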
The Pearson correlation analysis indicates several significant relationships between the questions. The Pearson correlation coefficient (r) describes the strength and direction of the linear relationship between two questions. We computed the correlation matrix, which includes both the correlation coefficients and the associated p-values for each pair of questions (Q1 to Q15). The p-values indicate whether the correlation coefficients are statistically significant; a p-value below 0.05 typically indicates a statistically significant correlation. In our case, most of the correlations have p-values less than 0.001, indicating strong statistical significance. The analysis identified 5 strong and 16 medium correlations, as seen in Figure 7. The correlation between Q11 and Q13 (r = 0.85) is remarkably high, indicating a strong positive relationship. Similarly, Q12 and Q13 (r = 0.80) demonstrate another strong positive correlation. The correlation between Q12 and Q11 (r = 0.76) reflects a strong positive relationship between these two survey questions. The correlation between Q2 and Q3 (r = 0.79, p < 0.001) shows that these questions are also strongly related. Lastly, the correlation between Q2 and Q1 (r = 0.77, p < 0.001) is also strong.
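As a brief illustration of the quantities used in this analysis, the sketch below computes a Pearson correlation coefficient and the t-statistic commonly used to test whether it differs from zero. The sample answers are hypothetical, the function names are assumptions, and the t-values assume that all 279 participants answered both questions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t-statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical answers of six respondents to two questions.
q_a = [3, 4, 5, 5, 2, 4]
q_b = [3, 5, 5, 4, 2, 4]
r_example = pearson_r(q_a, q_b)

# t-values for the largest and smallest of the five strong correlations,
# assuming n = 279 complete responses.
t_strongest = correlation_t(0.85, 279)
t_weakest = correlation_t(0.76, 279)
print(round(r_example, 2), round(t_strongest, 1), round(t_weakest, 1))
```

With 277 degrees of freedom, t-values of this size correspond to p-values far below 0.001, in line with the significance levels reported above.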