1. Introduction
Testing scenarios have received little attention in special education, especially for children with learning disabilities (LD) and test anxiety. Few studies have evaluated the effects of test anxiety on students with LD. One study by Lufi, Okasha, and Cohen found that test anxiety affects people in every aspect of their lives, especially since men and women of all ages are evaluated, assessed, and graded on their abilities, achievements, or interests [1]. Their study concluded that test anxiety causes students with LD to perform worse academically and to show higher levels of psychopathology. Students with LD have more difficulty in test-taking situations than students without LD [2]. Students with LD show symptoms of test anxiety, such as stress, nervousness, frustration, and helplessness, that are associated with poor performance. Heiman and Precel concluded that college students with LD have lower grades, lower test scores, and a lower perception of their academic abilities [2]. Taken together, these findings suggest that testing situations may have a physiological component that affects all students and that may be especially pronounced in children with LD.
Numerous stressors can lead to test anxiety, and these can affect how a student receives, stores, and understands information [3]. As a result, a mental “block” can occur during testing, and students may not perform well on the test. Stress triggers a physiological response involving the hypothalamic–pituitary–adrenal (HPA) axis [4], which originates in the brain but produces whole-body responses ranging from a rapid heart rate to long-term weight gain and diabetes [4,5,6]. Factors that contribute to an individual’s cognitive performance include the method of information delivery, prior knowledge, emotional state (psychological), glycemic state (physiological), and the accuracy with which information is retained [5,6,7]. Test anxiety may contribute to poor test performance and is a known problem among students in grades four through twelve [6,8]. Approximately one-third of all students experience test anxiety, regardless of their ability level [6,9]. A student’s emotional state is crucial when recalling information: physiological stressors acting on the brain may hinder cognitive abilities and lead to test anxiety [6,10]. It is essential to include individuals from neurodiverse backgrounds when considering testing constructs [11].
Here, we examined whether stress during testing affected students’ test performance in three different testing scenarios. We set out to comprehensively map heart rate responses to testing in children with LD. We found that test anxiety in children with LD decreased and math performance increased when they took an examination in a small group setting using a digital testing platform on a computer, and that this was directly related to the students’ stress responses. We developed a new method of analyzing physiological responses, which we termed “stress-performance evaluation” (SPE), that combines heart rate responses with examination performance. Taken together, our findings indicate that the SPE could be an essential measure for evaluating testing parameters for children and that it is especially sensitive in students with LD.
3. Results
The focus of the study was to determine whether large or small group settings, using pencil and paper (P&P) or a computer, are interventions that help students with LD reduce stress and test anxiety and improve math test and quiz scores. The study used a changing-conditions design to identify the intervention that best reduces test anxiety and improves math quiz and test scores. The test contained 100 math questions to be answered within 60 s. Answering all of the questions in this time is considered impossible, but the format provides a reasonable quantification of how each student responds to testing stress: more correct answers in 60 s indicate better performance. The testing data consisted of the numbers of questions each student answered correctly and incorrectly, and correctness was expressed as a test percentage. The study was conducted remotely over Zoom video chat (to accommodate COVID-19 social distancing requirements), and Willful Fitness Tracker IP68 with Heart Rate Monitor wristbands were provided to quantify heart rate responses before (baseline) and after testing.
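To make the scoring of a single one-minute probe concrete, here is a minimal Python sketch of how answered, correct, and wrong counts could be tallied. The helper name `score_probe`, the encoding of an unreached question as `None`, and the choice to compute percent correct over answered questions (rather than over all 100) are assumptions for illustration, not the study’s documented procedure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProbeScore:
    answered: int
    correct: int
    wrong: int
    percent_correct: float  # assumption: correct / answered * 100


def score_probe(responses: Sequence[Optional[int]], answer_key: Sequence[int]) -> ProbeScore:
    """Tally one timed probe (hypothetical helper).

    responses[i] is the student's answer to question i, or None if the
    question was not reached before the 60 s limit.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    # Keep only the questions the student actually attempted.
    attempted = [(r, k) for r, k in zip(responses, answer_key) if r is not None]
    answered = len(attempted)
    correct = sum(1 for r, k in attempted if r == k)
    wrong = answered - correct
    percent = 100.0 * correct / answered if answered else 0.0
    return ProbeScore(answered, correct, wrong, percent)


# Example: a 100-question key where the student reached only the first three items.
key = list(range(100))
responses = [0, 1, 99] + [None] * 97  # the third answer is wrong
print(score_probe(responses, key))
```

Because the time limit rather than the question count bounds each probe, the `answered` total is itself an outcome, which is why the sketch reports it separately from `correct` and `wrong`.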
The second part of this study quantified the stress response of each student with a learning disability using the Willful Fitness Tracker IP68 heart rate monitor. The participants wore the monitor during the study and were asked to report their heart rate before the one-minute test began (baseline) and again after the test. According to the American Heart Association, a normal resting heart rate for adults ranges from 60 to 100 beats per minute. When a person is distressed, their heart rate typically rises above its resting level, and they may experience a racing heart, palpitations, pounding, or fluttering. Here, we used heart rate as a measure of the stress response.
Figure 1A shows Subject 1’s heart rate during the four testing parameters. All subjects started with the baseline phase, a large group setting with their typical peers doing mathematical calculations using P&P (Figure 1A(i)). During this phase, which lasted six consecutive days, heart rates were measured before and after the test to determine whether the physiological response changed significantly in this setting. Subject 1’s heart rate was around 62 beats/min before testing and 69 beats/min after testing; the change was not significant (p = 0.2036, R² = 0.1562). In the second phase, each subject moved to a small group setting and used P&P. Figure 1A(ii) shows heart rate and mathematical performance measures collected over 12 consecutive days in this phase. In the small group P&P setting, Subject 1’s heart rate was 52.5 beats/min before testing and rose to 55 beats/min after testing, a significant difference (p = 0.0010, R² = 0.3958). In Figure 1A(iii), Subject 1’s heart rate in the third parameter was quantified in a small group setting using a computer. In this intervention phase, ten data points were collected; the average heart rate was 53.6 beats/min before testing and increased to 57.3 beats/min after testing. The change from before to after testing was statistically significant (p < 0.001, R² = 0.8058). To confirm which intervention was the strongest, that intervention was retested. Figure 1A(iv) shows the retest of the strongest intervention, a small group setting using a computer. This phase consisted of six consecutive data points, with an average baseline of 54 beats/min and a median heart rate of 57.3 beats/min after testing. As with the third parameter, the retest confirmed significant changes for Subject 1 (p = 0.0005, R² = 0.7143).
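The before/after comparisons above report a p-value and an R² for each phase, but the section does not state which test produced them. As a hedged sketch, assuming a paired t test on the daily (after − before) differences and assuming the reported R² is the effect-size statistic t²/(t² + df) that some statistics packages print alongside a paired t test, the two quantities could be computed as follows. The function name is hypothetical and the input numbers are illustrative, not study data. For the p-value itself, `scipy.stats.ttest_rel(after, before)` returns the same t statistic together with a two-sided p-value.

```python
import math
import statistics
from typing import List, Tuple


def paired_t_r2(before: List[float], after: List[float]) -> Tuple[float, float]:
    """Paired t statistic for (after - before) and R^2 = t^2 / (t^2 + df).

    Assumption: this R^2 is the effect-size statistic some packages report
    with a paired t test; the source does not state how its R^2 was computed.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need equal-length lists with at least two pairs")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    return t, t * t / (t * t + df)


# Illustrative numbers only (not study data): four days of before/after readings.
t, r2 = paired_t_r2([60, 62, 61, 63], [66, 69, 65, 70])
print(f"t = {t:.3f}, R^2 = {r2:.3f}")
```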
Figure 1B shows the percent change in Subject 1’s heart rate during each phase (large group, small group P&P, and small group computer). For Subject 1, the median percent change in heart rate was 111.67% in the large group, 97.91% in the small group P&P setting, 107% in the small group computer setting, and 106.3% in the small group computer retest.
Figure 1C shows Subject 1’s testing evaluation in the large group setting using P&P, the small group setting using P&P, the small group setting using a computer, and the retest of the strongest intervention, the small group setting using a computer. The bar graph shows the total number of questions answered, the number answered correctly, and the number answered incorrectly. All participants were given 100 math questions with mixed operations (+, −, ×, and ÷) to answer in one minute. For Subject 1, in the large group setting using P&P, the average number of questions answered was 13.5 and the average number answered correctly was 12.3. In the small group setting using P&P, the average number of questions answered was 15, the average number correct was 14, and the average number wrong was 2. In the small group setting using a computer, Subject 1 completed an average of 17 math questions and answered an average of 16 correctly. In the third phase of the intervention, the strongest intervention (small group, computer) was revisited: Subject 1 increased their total answers to a median of 19 and their correct answers to a median of 19, and decreased their wrong answers to a median of 1.
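The phase summaries above mix averages and medians of the daily answered, correct, and wrong counts. A minimal sketch of computing both for one phase is shown below; the helper name `summarize_phase` and the (answered, correct, wrong) tuple layout are assumptions for illustration, and the three days of counts are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_phase(daily: List[Tuple[int, int, int]]) -> Dict[str, Dict[str, float]]:
    """Mean and median of (answered, correct, wrong) across the days of one phase."""
    if not daily:
        raise ValueError("a phase needs at least one testing day")
    summary = {}
    for index, name in enumerate(("answered", "correct", "wrong")):
        values = [day[index] for day in daily]
        summary[name] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
    return summary


# Hypothetical three-day phase: (answered, correct, wrong) per day.
print(summarize_phase([(10, 9, 1), (12, 12, 0), (20, 15, 5)]))
```

Because each day’s answered count equals correct plus wrong, the means always add up (14 = 12 + 2 in this example), but the medians need not (12 answered versus 12 correct plus 1 wrong), so the two kinds of summary should not be combined when comparing totals.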
Figure 1D shows the overall stress-performance evaluation (SPE) for Subject 1. The SPE combines the physiological and testing evaluations by dividing the number of correctly answered test questions by the percent change in the subject’s heart rate. The bar graph includes each phase of the study: large group P&P, small group P&P, and small group computer test. For Subject 1, the SPE did not differ significantly between the large group P&P and small group P&P settings (p = 0.4678). The change from the small group P&P setting to the small group computer setting was significant (p = 0.039), as was the change from the large group P&P setting to the small group computer setting (p < 0.0001). Overall, the SPE showed that Subject 1 performed best in a small group setting using a computer. The SPE is therefore helpful for identifying the testing conditions that reduce test anxiety and improve test performance.
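The SPE calculation described above can be sketched in Python as follows. The function name `daily_spe` and the per-day list layout are hypothetical; the formula follows the text (correct answers divided by the heart-rate percent change for that day). The sketch assumes the percent change is supplied as a percentage such as 107.0 rather than a fraction such as 1.07; because the two differ only by a constant factor of 100, either scale ranks testing conditions the same way. A higher SPE means more correct answers per unit of heart-rate response.

```python
import statistics
from typing import List


def daily_spe(correct: List[int], hr_percent_change: List[float]) -> List[float]:
    """Stress-performance evaluation per testing day.

    SPE = correct answers / heart-rate percent change for that day.
    Assumption: the percent change is passed as a percentage (e.g. 107.0),
    not a fraction (1.07); either scale ranks conditions the same way.
    """
    if len(correct) != len(hr_percent_change):
        raise ValueError("need one heart-rate value per testing day")
    if any(p <= 0 for p in hr_percent_change):
        raise ValueError("heart-rate percent change must be positive")
    return [c / p for c, p in zip(correct, hr_percent_change)]


# Hypothetical two-day phase: 10 correct at 100%, 20 correct at 80%.
scores = daily_spe([10, 20], [100.0, 80.0])
print(scores, statistics.median(scores))
```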
Figure 2A shows Subject 2’s heart rate during the four testing parameters. As with Subject 1, the baseline phase was a large group setting with typical peers doing mathematical calculations using P&P (Figure 2A(i)). During this phase, which lasted six consecutive days, heart rates were measured before and after the test to determine whether the physiological response changed significantly in this setting.
Figure 2A(i) shows Subject 2’s heart rates during the baseline phase in the large group P&P setting. The median heart rate was around 59 beats per minute before testing and around 71 beats per minute after testing; the change was not significant (p = 0.2387, R² = 0.1070).
Figure 2A(ii) shows Subject 2’s median heart rate in the small group P&P setting: around 61 beats per minute before testing and 70 beats per minute after. The change was significant (p = 0.0037, R² = 0.3506).
Figure 2A(iii) shows Subject 2 in intervention phase 2, using a computer in a small group setting. The average heart rate was around 65.3 beats per minute before the test, and the median was 57.4 after the test. The change from before to after testing was significant (p = 0.0005, R² = 0.5029).
Figure 2A(iv) shows the retest of the strongest intervention (small group, computer), with an average baseline of around 71.3 beats per minute and a median of 56.8 after the test. The change was significant (p = 0.0006, R² = 0.7064).
Figure 2B shows the percent change for Subject 2 in the large group P&P, small group P&P, and small group computer settings. The percent change was calculated by dividing the heart rate after testing by the baseline heart rate before testing and multiplying the result by 100, so values above 100% indicate that heart rate rose during the test. The median percent change was 120.8% in the large group, 115% in the small group P&P setting, 88.4% in the small group computer setting, and 80.67% in the small group computer retest.
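A minimal sketch of the heart-rate percent change, assuming the ratio is post-test heart rate over the pre-test baseline. That orientation matches the values reported in Figures 1B, 2B, and 3B: for example, Subject 2’s large group heart rate of about 59 beats/min before and 71 after corresponds to roughly 120%, close to the reported 120.8%. The function name is hypothetical, and the sketch computes one ratio per day; taking the median of those daily ratios can give a different number than dividing the median post-test rate by the median baseline.

```python
import statistics
from typing import List


def hr_percent_change(baseline: List[float], post_test: List[float]) -> List[float]:
    """Post-test heart rate as a percentage of the pre-test baseline, per day.

    Values above 100 mean the heart rate rose during the test; values
    below 100 mean it fell.
    """
    if len(baseline) != len(post_test):
        raise ValueError("need one post-test reading per baseline reading")
    if any(b <= 0 for b in baseline):
        raise ValueError("baseline heart rates must be positive")
    return [100.0 * post / base for base, post in zip(baseline, post_test)]


# Hypothetical two-day phase (beats/min): one rise, one fall.
changes = hr_percent_change([50, 80], [60, 72])
print(changes, statistics.median(changes))
```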
Figure 2C shows Subject 2’s testing evaluations in the large group setting using P&P, the small group setting using P&P, and the small group setting using a computer, including the total number of math questions answered, the number answered correctly, and the number answered incorrectly. In the large group setting using P&P (Figure 2C(i)), Subject 2 answered an average of 9 questions, with a median of 5 answered correctly and an average of 3.7 answered incorrectly. In the small group setting using P&P (Figure 2C(ii)), Subject 2 completed a median of 9.4 math questions, with a median of 8.5 correct and an average of 1 wrong. In Figure 2C(iii), in the small group setting using a computer, the median number of math questions answered was 16, and the average number answered correctly was 15.6. In Figure 2C(iv), when the strongest intervention (small group, computer) was retested, Subject 2 increased their total answered math problems to a median of 21.5 and their correct answers to a median of 21.3, and decreased their wrong answers to a median of 0.17.
Figure 2D shows the SPE for Subject 2 in each phase of the study: large group P&P, small group P&P, and small group computer test. Subject 2’s SPE did not differ significantly between large group P&P and small group P&P (p = 0.2881). It did differ significantly between small group P&P and small group computer (p < 0.0001) and between large group P&P and small group computer (p < 0.0001). Overall, the SPE shows that Subject 2 performs significantly better in a small group setting using a computer.
Figure 3A shows Subject 3’s heart rate during the four testing parameters. As for the other subjects, the baseline phase was a large group setting with typical peers doing mathematical calculations using P&P (Figure 3A(i)). During this phase, which lasted six consecutive days, heart rates were measured before and after the test to determine whether the physiological response changed significantly in this setting.
Figure 3A(i) shows Subject 3’s heart rates during the baseline phase in the large group setting using P&P. The median heart rate was around 60 beats per minute before testing and around 68 beats per minute after; the change was not significant (p = 0.1105, R² = 0.11984).
Figure 3A(ii) shows Subject 3’s median heart rate in the small group P&P setting: around 65 beats per minute before testing and 75 beats per minute after. The change was not significant (p = 0.1527, R² = 0.09949).
Figure 3A(iii) shows Subject 3 in intervention phase 2, using a computer in a small group setting. The average heart rate was around 58.2 beats per minute before the test, and the median was 69 after the test. The change from before to after testing was significant (p = 0.004, R² = 0.5085).
Figure 3A(iv) shows Subject 3 when retested in intervention phase 3, in a small group setting using a computer, with an average baseline heart rate of 62.8 beats per minute and a median of 74.3 beats per minute after the test. The change from pre- to post-test was significant (p = 0.0006, R² = 0.7073).
Figure 3B shows the percent change for Subject 3 in the large group P&P, small group P&P, and small group computer settings. The median percent change was 113% in the large group, 116.18% in the small group P&P setting, 119% in the small group computer setting, and 119% in the small group computer retest.
Figure 3C shows Subject 3’s testing evaluations in the large group setting using P&P, the small group setting using P&P, and the small group setting using a computer, including the total number of questions answered, the number answered correctly, and the number answered incorrectly. In the large group setting using P&P (Figure 3C(i)), the student answered an average of 5.5 problems, with approximately 4.8 answered correctly on average.
Figure 3C(ii) shows Subject 3 in a small group setting using P&P, where they answered an average of 9.6 problems, with an average of 8.6 answered correctly.
Figure 3C(iii) shows Subject 3 in a small group setting using a computer, where they answered an average of 22.5 problems, with an average of 19.5 answered correctly. Subject 3 thus increased their total answers from an average of 5.5 at baseline to 22.5 and their correct answers from an average of 4.8 to 19.5. However, the average number of wrong answers also increased, to 5.
Figure 3C(iv) shows that when retested in a small group setting using a computer, Subject 3 answered an average of 19.12 problems, with an average of 20.83 correct and 2 wrong. For Subject 3, the strongest intervention was the small group setting using a computer.
Figure 3D shows Subject 3’s SPE across the large group P&P, small group P&P, and small group computer settings. The SPE differed significantly between large group P&P and small group P&P (p = 0.0385), between small group P&P and small group computer (p < 0.0001), and between large group P&P and small group computer (p < 0.0001). Overall, the SPE showed that Subject 3 performs better in a small group setting using a computer.
4. Discussion
As testing parameters for children with learning disabilities continue to be developed, recent findings show that anxiety and stress contribute substantially to their poor performance and should also be considered. Here, we developed a new model that measures test stress in children and relates it to answer correctness and responsiveness as an SPE. To our knowledge, this is the first time these parameters and calculations have been used to assess children’s responsiveness and testing. This approach also accounts for physiological parameters, since the best testing scenario may differ from child to child, as our results suggest. Although the children differed in their responses, all three performed best with a small-group, computer-based testing strategy.
This study’s primary purpose was to determine how students with learning disabilities in mathematics react in different testing environments. Quantitative data were collected through questionnaires, heart rates, and math fluency probes. These data were used to test the hypothesis that children with LD may perform worse on standardized testing in a large group setting than in a small group setting because of physiological stress responses and anxiety. The small group condition was further divided into P&P and computer-based testing, and the latter showed the most significant results in this study. We further analyzed the SPE, which relates test correctness to the daily change in heart rate. These outcomes suggest that computer-based testing is the most favorable format for children with LD and that it should be used more widely for students with these disabilities. However, our study was conducted with a small sample, and more work is needed to confirm these conclusions. Below, we discuss the findings and possible reasons why computer-based testing may be more beneficial in school systems.
The examination consisted of 100 questions to be answered in 60 s (one minute), which makes it practically impossible to answer all of them in the given time. This design shows how each student responded to stress, with some answering more questions correctly in settings where they may have felt more comfortable. The overall concept is that more correct answers in 60 s reflect better performance and a less stressful scenario. The data were compared using scatter plots, figures, and tables for each subject across the baseline large group P&P setting, small group P&P, and small group computer testing. Overall, the data show the greatest improvement and significant differences among all participants when moving from large group P&P to small group computer testing. These results are consistent with the hypothesis, as Subjects 1, 2, and 3 all benefited from taking tests in a small group setting using a computer. The students may have performed better because of conditioned (psychological) responses or physiological responses. Regardless of the mechanism, all three performed best in the same environment, and this appeared to be related to their stress responses, as reflected in their SPE.
The SPE is a new measure that can help educators and others better quantify stressful testing scenarios and find the best intervention. The SPE is calculated by dividing the total number of test questions that the subject answered correctly by the percent change in their heart rate for that day (see Methods in Section 2). The same equation could be applied to other populations and settings, such as adults, students at the college level or higher, work settings, or younger children, such as preschoolers.
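As a sketch of how an educator might use daily SPE values to choose among testing conditions, the snippet below picks the condition with the highest median daily SPE. The function name `best_condition`, the dictionary layout, and the use of the median rather than the mean as the summary are assumptions for illustration, and the SPE values are hypothetical. A ranking like this does not replace the between-condition significance tests reported in the Results.

```python
import statistics
from typing import Dict, List


def best_condition(spe_by_condition: Dict[str, List[float]]) -> str:
    """Return the testing condition with the highest median daily SPE."""
    if not spe_by_condition:
        raise ValueError("no conditions supplied")
    for name, values in spe_by_condition.items():
        if not values:
            raise ValueError(f"condition {name!r} has no SPE values")
    return max(spe_by_condition, key=lambda name: statistics.median(spe_by_condition[name]))


# Hypothetical daily SPE values for one student (not study data).
example = {
    "large group P&P": [0.05, 0.06, 0.04],
    "small group P&P": [0.08, 0.10],
    "small group computer": [0.15, 0.20, 0.18],
}
print(best_condition(example))
```

Using the median keeps a single unusual testing day from dominating the choice, which matters when each condition has only six to twelve data points.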