Article
Peer-Review Record

Virtual Reality Nature Exposure and Test Anxiety

Multimodal Technol. Interact. 2020, 4(4), 75; https://doi.org/10.3390/mti4040075
by Alison O’Meara *, Marica Cassarino, Aaron Bolger and Annalisa Setti *
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 24 August 2020 / Revised: 24 September 2020 / Accepted: 20 October 2020 / Published: 22 October 2020
(This article belongs to the Special Issue 3D Human–Computer Interaction)

Round 1

Reviewer 1 Report

This is a very well-written manuscript on a topic that is timely and important. The literature review is superbly organized and addresses every aspect of the study. 

The purpose of the study is valid as a starting point in the investigation of using VR as an intervention for students with high test anxiety. 

The low number of participants is a huge obstacle in making generalizations from this study. However, as a pilot study, it is one of the best. 

Descriptions of the dependent variable instruments were good, although I would have liked to have seen more said about the selection of the Test Anxiety Questionnaire other than that it is brief and validated. Specifically, why did you choose this one over the others? This is important because it is such a predominant feature of the study. I have the same comment about the Nature Connection Index and the Motion Sickness Assessment Questionnaire, although motion sickness does not figure as prominently as the others in the overall scope of the study. 

Say a bit more about demographics and how the piloting group was selected. 

In section 2.3.7, you talk about how individuals "became restless." Can you please operationalize that a bit more? I am not certain what this would have looked like. Also, there is a grammatical inconsistency in line 223 of this section (p. 5): "immersion" needs to be "immersed."

In 2.4, there is a sentence on p. 6 (line 272) that needs to be re-worked so that it does not end in a preposition. 

Results are presented in an effective way and you drew conclusions from the statistics you ran. Findings were presented and deviations from earlier literature were adequately explained. 

Your discussion of results was thorough and the section on Limitations addressed the limitations that I discovered in the study. 

A very nice job, overall. I enjoyed the opportunity to review your work. 

Author Response

This is a very well-written manuscript on a topic that is timely and important. The literature review is superbly organized and addresses every aspect of the study. 

The purpose of the study is valid as a starting point in the investigation of using VR as an intervention for students with high test anxiety. 

We would like to thank the Reviewer for their appreciative comments on our work.

The low number of participants is a huge obstacle in making generalizations from this study. However, as a pilot study, it is one of the best. 

We agree with the Reviewer that not achieving the expected number of participants is a limitation of this work, and we appreciate their recognition that, as a pilot study, it can stimulate further work.

Descriptions of the dependent variable instruments were good, although I would have liked to have seen more said about the selection of the Test Anxiety Questionnaire other than that it is brief and validated. Specifically, why did you choose this one over the others? This is important because it is such a predominant feature of the study. I have the same comment about the Nature Connection Index and the Motion Sickness Assessment Questionnaire, although motion sickness does not figure as prominently as the others in the overall scope of the study. 

We thank the Reviewer for this comment. We have added to Supplementary File 2, which provides details about the measures used in the study, further information on the rationale for choosing the Test Anxiety Questionnaire, the NCI, and the Motion Sickness Questionnaire. The information is included hereafter. A reference to this has been included in the manuscript in the opening line of Section 2.3 (Materials).

The Test Anxiety Questionnaire was chosen specifically because it can capture test anxiety with only a few items, while focusing both on arousal and on cognitive aspects of the experience of test anxiety. There are other questionnaires available, such as the Revised Test Anxiety scale and its further versions [Benson, J., & El-Zahhar, N. (1994). Further refinement and validation of the revised test anxiety scale. Structural Equation Modeling: A Multidisciplinary Journal, 1(3), 203–221], which are lengthy and were therefore deemed not suitable in our experimental context.

Similarly, the NCI was chosen because it has been validated with large samples and it captures with only a few items how individuals experience nature, which is distinct from, and more appropriate to our aims than, other scales such as that of Mayer & Frantz (2004) [Mayer, F. S., & Frantz, C. M. (2004). The connectedness to nature scale: a measure of individuals’ feeling in community with nature. J. Environ. Psychol., 24, 503–515. doi: 10.1016/j.jenvp.2004.10.001], which is more reflective of kinship with nature.

The Motion Sickness Questionnaire was utilised in previous studies from our lab and it was chosen for convenience.

Say a bit more about demographics and how the piloting group was selected. 

We have included the recruitment method of the pilot group (‘Therefore, the VR green environment was piloted by 6 third-level students, who were recruited by convenience sampling among class-mates/acquaintances of the first author; they were of similar age to the experimental sample.’); other demographic characteristics such as gender were not recorded.

In section 2.3.7, you talk about how individuals "became restless." Can you please operationalize that a bit more? I am not certain what this would have looked like. Also there is a grammatical inconsistency in line 223 of this section (p.5). "immersion" needs to be "immersed."

We have now clarified what we mean by ‘restless’ and the text has been modified accordingly (‘Overall, individuals generally indicated that they were ready to stop watching the video’). We also changed ‘immersion’ to ‘immersed’; thank you for pointing this out.

In 2.4, there is a sentence that on p.6 (line 272) that needs to be re-worked so that it does not end in a preposition. 

Thank you, we have now amended the sentence so that it reads “participants were fully debriefed about the nature of the experiment and to what group they had been assigned”.

Results are presented in an effective way and you drew conclusions from the statistics you ran. Findings were presented and deviations from earlier literature were adequately explained. Your discussion of results was thorough and the section on Limitations addressed the limitations that I discovered in the study. A very nice job, overall. I enjoyed the opportunity to review your work. 

We would like to thank the Reviewer for their positive comments on the Results and Discussion sections of our manuscript.


Reviewer 2 Report

The purpose of this work is to explore whether delivering VR scenes of a green environment before an exam can reduce exam anxiety and how this compares to scenes of an urban environment.  Exam anxiety was measured as negative affect, and findings show that the nature scenes did reduce exam anxiety in the “high anxiety” group.  The paper is thorough in its explanation of the field and well written.  There are some lingering questions about the experimental design and statistical methods and some suggestions for how to improve clarity and readability.   

Overall comments:

  • The introduction is thorough and sets up the paper comprehensively. Two hypotheses are stated at the end of the introduction: one concerning negative affect and the other concerning exam performance scores.  Later in the paper, in the results section, changes in positive affect are discussed.  These results are interesting and would be better framed and understood if a third hypothesis regarding positive affect were stated in the introduction. 
  • The way in which subjects are recruited is very confusing and also brings into question the way in which individuals were assigned to groups. It is stated that the Test Anxiety Questionnaire was used to determine whether a participant belonged in the “high” or “low” anxiety group. This designation would be better understood by the reader if the questionnaire were explained in more detail, including the scoring system and thresholds for going into the two groups.  Also, is it possible with this questionnaire for someone to score as “no anxiety”?  Wouldn’t this have been a better control than “low anxiety”?  It would also be helpful to add the timescale associated with the questionnaire and whether it is a single measure of average behavior or a discrete measurement that can be repeated after a specific event.  If it is the latter, why was it not used to assess test anxiety during the experiment instead of using negative affect as a proxy?  Additionally, if negative affect is a proxy for test anxiety, is positive affect known to be a proxy for the lack of anxiety?
  • In Line 59 it is stated that only half of the desired number of the participants for this study were tested. Adding an explanation of the impact of this reduction of sample size on the power of the statistical results is important.
  • In the experimental design, it is stated that “participants were reminded that it was only a pseudo test and that it would not impact their academic grades or determine their abilities in any way”. Additionally, there was no time limit put in place during the experiment.  Was this a restriction to the simulation of real test anxiety put in place by the institution you work with?  A concern here is that this limited the level of test anxiety the participants were able to experience, making the changes between test conditions smaller than they might have been with a more realistic test.  This limitation is stated in the “limitations” section, but it would be helpful to add a brief explanation of why these decisions were made.
  • The selection of VR scenes could use further explanation. The green environment chosen is a pleasant natural scene with trees and a stream, while the urban scene is an alley.  Is this difference not more of a positive natural scene and a negative urban scene?  Why not include a negative natural scene and a positive urban scene as well?  It is recommended to include a screenshot of each of the scenes in the paper to provide a representation of the films. 
  • It would also be beneficial to include more information on the exam participants were given (e.g., how many questions, what types of questions, etc.) and how the administration of it worked. Were the questions the same within each exam, such that they came as a set? If so, was the order randomized? If not, were questions randomly assigned to each test? Did the tests draw randomly from a larger set of questions?
  • In the statistical analysis of the results, the phrasing used to describe within-subjects and between-subjects effects is confusing. Stating that the test score is the only within-subjects variable would help with understanding the statistical approach.  There is also a question of whether the study is underpowered due to the inclusion of “time” as a variable (before/after).  Would it not make more sense to compare the differences between scores and remove time as a factor altogether (i.e., calculate the difference in score as the dependent variable against which to regress)?  Time isn’t really interesting here; what is interesting is whether or not the treatment had an effect resulting in a performance change, and that can be seen through the difference between scores pre- and post-test.
  • In the discussion, it is interesting to include participants’ subjective comments, but it’s important to also state whether participants were prompted for comments and how many within the group actually provided them. This was not noted as a form of data collected in the methods.
  • A question instead of a critique here: because only the high anxiety group showed differences in scores, is there an argument to be made that longer duration exposure could have had a stronger impact on the lower anxiety group? Could be interesting to look into. I was also wondering about carry over with time between testing and exams.
  • In the limitations, it is important to note there may be differences for younger participants, warranting further investigation (i.e. not college age students who also may suffer from test anxiety)
  • The quality of the figures in general could use improvement.
  • In tables, standard deviations should have 1, or at most 2 significant figures, and the means should then be rounded to the same unit location (e.g. 2.7 +/- 0.4, not 2.756 +/- 0.437)

Specific comments are listed in the line-by-line review:

  • Line 13: should this be 10 high and 10 low exam anxiety students?
  • Line 16: including statistics with “significant reductions” would be good
  • Line 89: “by whereby” looks like a typo?
  • Line 153/4 it is not clear if the #/50 is the number of participants or the score out of 50. Both pieces of information are desired. Does this scale have a standard by which to classify high and low?
  • Line 158: It is more typical to say “a priori” with italics and this capitalization
  • Line 197: including an age range for “third-level students” would help translate this level to other countries’ school systems
  • 209 standardized is spelled incorrectly
  • Line 220: Calling motion sickness “unpleasant sensations” is misleading
  • Line 223: typo – “being immersion” → “being immersed”?
  • Line 229ish: The discussion of scene exposure duration starts here. It would be clearer if it were stated that the optimal duration was determined to be 4 minutes first and after that the decision process can be explained thoroughly with better context. 
  • Line 249: I'm just learning this myself, but asking demographic questions before testing can influence participants' performance, especially for participants who are members of minority groups. It is better to ask these questions at the end of the study session. No need to change anything here, just an fyi for future studies.
  • Line 312, these effects ARE significant. Is this a typo?
  • Line 316 and 323: “was” → “were”, and “between subject” → “between-subject”
  • Line 345 (Figure 1): The quality of this figure needs to be higher. It would also be better if the bounds of the x-axis were less than and greater than times 1 & 2 so the data can be seen more clearly.  There should also be a label of units on the y-axis.  Perhaps also use different colors for the error bars so you can see the difference between them.  Also, explain what the error bars represent.
  • Line 358: Can “less sporadic” be defined statistically?
  • Line 361 (Figure 2): This figure also needs to be produced in higher quality.
  • Line 368: Why is “[46]” in this line?
  • Line 379: Mentioned this before, but saying “time” is the main effect is somewhat misleading. It would be more valuable to talk about differences between scores directly.
  • Line 428 – Incorporating motion sickness comes up in a jarring way. They used a metric to assess motion sickness, but this is not discussed here. Was motion sickness reported? If so, then this should be brought up much earlier
  • Line 430: et al should be italicized and punctuated as follows: et al.
  • Line 437: “surmounting” doesn’t appear to be the correct word here. Maybe “mounting”?
  • Line 475: There are many devices that measure physiological signals for biofeedback, why call out the pip explicitly?
  • Line 485: “Participants” is possessive (should be participants’)
  • Line 490: Is an HTC Vive really considered cost-effective? Particularly if many students would need this intervention? Could not a short film shown to many participants be used?


Author Response

The introduction is thorough and sets up the paper comprehensively. Two hypotheses are stated at the end of the introduction: one concerning negative affect and the other concerning exam performance scores.  Later in the paper, in the results section, changes in positive affect are discussed.  These results are interesting and would be better framed and understood if a third hypothesis regarding positive affect were stated in the introduction. 

We thank the Reviewer for this suggestion. As we do test positive affect in our results, we introduced the following hypothesis: ‘Second, we aimed to test whether positive affect was increased by virtual exploration of a natural scene.’ The paragraph has been modified accordingly.

The way in which subjects are recruited is very confusing and also brings into question the way in which individuals were assigned to groups. It is stated that the Test Anxiety Questionnaire was used to determine whether a participant belonged in the “high” or “low” anxiety group. This designation would be better understood by the reader if the questionnaire were explained in more detail, including the scoring system and thresholds for going into the two groups.  Also, is it possible with this questionnaire for someone to score as “no anxiety”?  Wouldn’t this have been a better control than “low anxiety”?  It would also be helpful to add the timescale associated with the questionnaire and whether it is a single measure of average behavior or a discrete measurement that can be repeated after a specific event.  If it is the latter, why was it not used to assess test anxiety during the experiment instead of using negative affect as a proxy?  Additionally, if negative affect is a proxy for test anxiety, is positive affect known to be a proxy for the lack of anxiety?


The method section has been reworded to clarify:

  • That the recruitment happened with an initial screening, in two steps, and a targeted invitation for the experimental study. We appreciate that the description of the screening was confusing, and we feel that the re-phrasing has clarified the point.
  • The cut-offs are indicated in the paragraph.
  • We utilise the approach of dividing the sample into quartiles, as we estimated that recruiting only participants who score 0 or 50 on the scale (no anxiety vs maximal anxiety) would limit our sample and tap only into extreme cases.
  • Positive and negative affect are distinct constructs that are related to one another [Headey, B., Kelley, J., & Wearing, A. (1993). Dimensions of mental health: Life satisfaction, positive affect, anxiety and depression. Social Indicators Research, 29, 63–82]. We utilised the PANAS instead of assessing test anxiety directly because the test anxiety questionnaire is not suitable for short-term test-retest, as it does not refer to the present moment, unlike the PANAS.

The paragraph now reads as follows;

‘Firstly, two separate emails (see Supplementary File 1) were sent out via two university mailing lists and social media platforms, one aimed at recruiting students who felt they had high test anxiety and one aimed at recruiting students who felt they did not. The emails also contained a link to a brief demographics and Nature Connection Index (NCI) [36] survey, highlighted the study’s aim, and detailed the inclusion criteria for participation. Inclusion criteria (for the email targeted at high-anxiety students) were that students were aged 18 years or older and felt that they experienced a high level of test anxiety but were not seeking any clinical treatment for it (i.e. not a clinical population). The second email was aimed at recruiting the control population. Its content was the same as the aforementioned email, with the exception that it sought to recruit individuals who have healthy or low levels of exam anxiety (i.e. individuals who do not feel that they experience particularly high levels of test anxiety). This first step of recruitment had the aim of estimating whether there was an interest in the study from students who recognised themselves as having low or high anxiety, before proceeding with the administration of the test anxiety questionnaire.

In the second step of recruitment, an email was sent out to students who had responded to the first email and completed the demographics survey. This email instructed the participants to fill out a test anxiety questionnaire [37]. Based on the results of the test anxiety questionnaire, quartiles were calculated and only those who scored within the upper quartile (35/50 and above) or lower quartile (24/50 and below) of the test anxiety questionnaire were contacted to arrange a time to complete the experimental session.’
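As a minimal sketch of the quartile-based screening described above (all scores are hypothetical, and `assign_groups` is our own illustrative helper, not part of the study materials):

```python
# Sketch of the screening step: quartiles are computed on the screening sample,
# and only respondents at or beyond the stated cut-offs (35/50 and above,
# 24/50 and below) are invited to the experimental session.
from statistics import quantiles

def assign_groups(scores, low_cut=24, high_cut=35):
    """Split respondents into (high, low) anxiety groups by cut-off score."""
    high = [(pid, s) for pid, s in scores.items() if s >= high_cut]
    low = [(pid, s) for pid, s in scores.items() if s <= low_cut]
    return high, low

# Hypothetical Test Anxiety Questionnaire scores out of 50
scores = {"p01": 41, "p02": 18, "p03": 30, "p04": 37, "p05": 22,
          "p06": 28, "p07": 45, "p08": 24, "p09": 33, "p10": 12}

q1, q2, q3 = quantiles(scores.values(), n=4)  # sample quartile boundaries
high, low = assign_groups(scores)             # respondents who would be invited
```

Respondents in the middle two quartiles are simply not contacted, which is what produces the clearly separated high- and low-anxiety groups.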


In Line 59 it is stated that only half of the desired number of the participants for this study were tested. Adding an explanation of the impact of this reduction of sample size on the power of the statistical results is important.

We fully recognise that not achieving the full sample size is a limitation of the study, which is acknowledged in the Discussion; in order to state this point clearly from the outset, we added the following to the Participants section:

‘Although we acknowledge that this is a limitation of this study, we refer to the effect sizes (medium to large) and observed power to support the validity of our results in this pilot study, which will require replication. ‘


In the experimental design, it is stated that “participants were reminded that it was only a pseudo test and that it would not impact their academic grades or determine their abilities in any way”. Additionally, there was no time limit put in place during the experiment.  Was this a restriction to the simulation of real test anxiety put in place by the institution you work with?  A concern here is that this limited the level of test anxiety the participants were able to experience, making the changes between test conditions smaller than they might have been with a more realistic test.  This limitation is stated in the “limitations” section, but it would be helpful to add a brief explanation of why these decisions were made.

We agree with the Reviewer that the situation is very likely not as anxiety-inducing as an actual exam. The reason for this choice was ethical: as this is a pilot study, we felt that it would be too much to ask participants to be put in a very stressful situation when our short intervention had never been tested before. In future studies, which we are currently planning, we aim to utilise more realistic scenarios.

We added the following statement in section 2.1 to clarify this point:

‘Given that this was a pilot study, we felt a more cautious approach to inducing anxiety than simulating an actual exam would be suitable as a first step in determining the efficacy of our intervention.’


The selection of VR scenes could use further explanation. The green environment chosen is a pleasant natural scene with trees and a stream, while the urban scene is an alley.  Is this difference not more of a positive natural scene and a negative urban scene?  Why not include a negative natural scene and a positive urban scene as well?  It is recommended to include a screenshot of each of the scenes in the paper to provide a representation of the films. 

We thank the Reviewer for the opportunity to clarify this point; the videos were selected based on the classic approach of Cognitive Restoration Theory, whereby exposure to a green environment is contrasted with exposure to an urban environment. While there are variations within each type of environment, green sceneries are generally considered more restorative than urban ones. In future studies it would be interesting to test a variety of videos of each of these two types to allow a more nuanced approach to determining the restorative potential of natural and urban settings.

We added this suggestion to the Discussion:

‘In addition, the authors selected the green and urban sceneries; in future studies it would be useful to pilot test different kinds of urban and natural environments for their restorative potential with a separate group of participants in order to obtain a more nuanced classification of restorative and non-restorative videos.’


It would also be beneficial to include more information on the exam participants were given (e.g., how many questions, what types of questions, etc.) and how the administration of it worked. Were the questions the same within each exam, such that they came as a set? If so, was the order randomized? If not, were questions randomly assigned to each test? Did the tests draw randomly from a larger set of questions?

The instructions used for the test are presented in the Supplementary Files (no. 2) and are reported below. We added to this description that the order of administration of the questions was fixed, and we have highlighted more clearly in the text that the reader can refer to the supplementary material for further information.


Supplementary file 2

‘The relevant instructions regarding how to complete the test were listed on the sheet before initiating this task. Each section contained a series of questions that became progressively more difficult; however, the total level of difficulty of each section was equal. Session 1 of the test comprised ten questions whereas session 2 comprised eight questions. The questions were administered in a fixed order across participants. To compare scores pre- and post-intervention, two questions were semi-randomly omitted from participants’ total test score for session 1 (i.e. items 9 and 10 were omitted provided that the participant did not get either of these correct; if they did, then another item that was answered incorrectly was removed instead). This was done to be conservative in terms of benefits to the test when taken the second time.’


In the statistical analysis of the results, the phrasing used to describe within subjects and between subjects effects are confusing. Stating that the test score is the only within subject variable would help with understanding the statistical approach.  There is also a question of whether the study is underpowered due to the inclusion of “time” as a variable (before/after).  Would it not make more sense to compare the differences between scores and remove time as a factor altogether (i.e., calculate the difference in score as the dependent variable as the measure against which to regress)?  Time isn’t really interesting here, what is interesting is whether or not the treatment had an effect resulting in a performance change, and that can be seen through the difference between scores pre- and post-test.

We thank the Reviewer for the opportunity to clarify this point, and we have reworded the text to make sure that the variables are clearly defined. The ANOVA we conducted directly matches the design we utilised, and it allows us to test whether there are baseline differences between groups. We agree that the difference score is interesting, and we conducted the analysis on difference scores for negative affect, which is the dependent variable with significant pre-post effects but with differences at baseline.

The Design paragraph now reads as follows:

‘2.2. Design 

The present research employed a quantitative mixed between- and within-participants design (2 (high vs low anxiety) x 2 (urban vs nature VR) x 2 (pre vs post VR intervention)). There was an experimental (high anxiety) and a control (low anxiety) group. Participants were assigned to one of the two groups based on their scores on the Test Anxiety Questionnaire [37]. Within each group, participants were randomly assigned to either an urban or a nature virtual reality (VR) intervention condition. The dependent variables were the Positive and Negative Affect Schedule, PANAS [38], and the test performance scores obtained pre- and post-urban or nature VR intervention.’


We also clarified this point in the Data Analysis section


‘In order to match the design of the study, each of our dependent variables (PA, NA and test scores) was analysed by performing a 2 (high vs low anxiety) x 2 (nature vs urban) x 2 (pre vs post) mixed ANOVA with nature connectedness, living environment, prior VR experience, years of education and motion sickness as covariates.’


The difference score for the PANAS (negative affect) is reported on pg. 9 l 365-376, pg. 10 l 381-392 and Figure 2.
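The difference-score approach raised by the Reviewer can be illustrated with a small sketch (all numbers below are invented and purely for illustration): collapsing pre/post into one change score per participant removes time as a factor, and the anxiety × condition interaction can then be read directly off the cell means of the change scores.

```python
# Hypothetical per-participant (pre, post) negative-affect values,
# keyed by (anxiety group, VR condition). Invented data, illustration only.
from statistics import mean

data = {
    ("high", "nature"): [(28, 18), (30, 21), (26, 17)],
    ("high", "urban"):  [(29, 27), (27, 26), (31, 30)],
    ("low",  "nature"): [(15, 14), (14, 13), (16, 15)],
    ("low",  "urban"):  [(15, 15), (13, 14), (16, 15)],
}

# One change score per participant: "time" disappears as a factor.
change = {cell: [post - pre for pre, post in pairs] for cell, pairs in data.items()}
cell_means = {cell: mean(diffs) for cell, diffs in change.items()}

# Anxiety x condition interaction contrast on the change scores: does nature
# (vs urban) reduce negative affect more for the high- than the low-anxiety group?
interaction = ((cell_means[("high", "nature")] - cell_means[("high", "urban")])
               - (cell_means[("low", "nature")] - cell_means[("low", "urban")]))
```

In a full analysis these change scores would feed a 2 × 2 between-subjects ANOVA, which tests the same interactions involving time as the mixed model, while the mixed model additionally exposes baseline differences between groups.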


In the discussion, it is interesting to include participants’ subjective comments, but it’s important to also state whether participants were prompted for comments and how many within the group actually provided them. This was not noted as a form of data collected in the methods.


The Reviewer is right in noting that this qualitative element was not mentioned in the methods; this is because it was an anecdotal collection of comments spontaneously produced by participants and noted by the experimenter in an informal way. We further clarified in the Discussion that this is not a qualitative component of the study. However, these spontaneous comments suggest that in future studies a properly designed qualitative element could be informative on the perceived experience of the intervention by participants.

We added the following statement to the description of such commentaries (pg. 11 l 431)

‘These were unsolicited comments which the experimenter noted and they were not collected, nor analysed as a qualitative component of the study, which would be useful to introduce in the future.’


A question instead of a critique here: because only the high anxiety group showed differences in scores, is there an argument to be made that longer duration exposure could have had a stronger impact on the lower anxiety group? Could be interesting to look into. I was also wondering about carry over with time between testing and exams.

Thank you for this interesting suggestion. It is possible that the effect in the high anxiety group can be generated by an acute bout of nature exposure, while restorative effects could be obtained in the low anxiety group with longer exposure, as they have a less urgent need for restoration. It is an interesting question for future studies, which we have mentioned in the Discussion (pg. 10, l 421-423):

‘It remains a question for future studies whether variations in the intervention, e.g. a longer duration, may allow a restorative effect to emerge also in the low anxiety group.’


In the limitations, it is important to note there may be differences for younger participants, warranting further investigation (i.e. not college age students who also may suffer from test anxiety)


We noted this possibility in our conclusions (pg. 13, l 513-514): ‘This indicates a potentially useful tool to apply to exam situations in an academic context. It is thought that further research seeking to replicate this study would help to validate this finding and extend it by utilising different tests and other more portable VR technology. Its applicability to other populations such as secondary school children could also be tested.’


The quality of the figures in general could use improvement.

We have edited both Figure 1 and Figure 2.

 

In tables, standard deviations should have 1, or at most 2 significant figures, and the means should then be rounded to the same unit location (e.g. 2.7 +/- 0.4, not 2.756 +/- 0.437)

Thank you for noting this. We have now amended all values to show 2 decimal places unless smaller than 0.01.

 

Line 13: should this be 10 high and 10 low exam anxiety students?

It is 20 in each anxiety group: 10 participants were exposed to urban sceneries and 10 to green sceneries.

 

Line 16: including statistics with “significant reductions” would be good

Added

 

Line 89: “by whereby” looks like a typo?

Corrected to ‘by which’.

 

Line 153/4 it is not clear if the #/50 is the number of participants or the score out of 50. Both pieces of information are desired. Does this scale have a standard by which to classify high and low?

We utilised quartiles, the sentence has been reworded as follows ‘Based on the results of the test anxiety questionnaire, quartiles were calculated and only those who scored within the upper quartile (score 35/50 and above) or lower quartile (score 24/50 and below) of the test anxiety questionnaire were contacted to arrange a time to complete the experimental session.’

 

Line 158: It is more typical to say “a priori” with italics and this capitalization

Amended as suggested (line 169)

 

Line 197: including an age range for “third-level students” would help translate this level to other countries’ school systems

The mean age and SD have been added.

 

209 standardized is spelled incorrectly

As we have employed British English for this paper, we have left “standardised” to reflect that.

 

Line 220: Calling motion sickness “unpleasant sensations” is misleading

A more precise definition of motion sickness has been introduced (p.g.5, l 235-237)

‘The MSAQ [43] acted as a means of identifying the group of symptoms, including e.g. nausea and dizziness, associated with the discrepancy between sensory inputs potentially experienced by participants during the virtual reality intervention.’

 

Line 223: typo – “being immersion” à “being immersed”?

Corrected

 

Line 229ish: The discussion of scene exposure duration starts here. It would be clearer if it were stated that the optimal duration was determined to be 4 minutes first and after that the decision process can be explained thoroughly with better context. 

We clarified the duration from the outset (pg. 6, l. 244-245): ‘The duration of the videos was set to 4 minutes based on pilot work described below.’

 

Line 249: I'm just learning this myself, but asking demographic questions before testing can influence participants' performance, especially for participants who are members of minority groups. It is better to ask these questions at the end of the study session. No need to change anything here, just an fyi for future studies.

This is a very important point. We do ask demographic questions as the last step in our studies with older adults, but the Reviewer raises a good point on minorities, which we will keep in mind for future studies.

 

Line 312, these effects ARE significant. Is this a typo?

The Reviewer refers here to the results of the ANOVA for positive affect, none of which were statistically significant. We report the text here, with p-values in blue font:

“Results from the mixed ANOVA conducted without covariates revealed that, for the within-subject effects, there was no significant main effect of time (T1T2) on PA scores, F(1, 36) = 0.01, p = .98, ηp2 < 0.01. There was no significant interaction effect between time and anxiety, F(1, 36) = 2.16, p = .15, ηp2 = 0.06, or between time and VR condition, F(1, 36) = 2.93, p = .09, ηp2 = 0.07, on PA scores. Moreover, there was no significant interaction effect between time, anxiety and condition on PA scores, F(1, 36) = 0.37, p = .55, ηp2 = 0.01. There were no significant between-subject effects.”

 

Line 316 and 323: “was” à “were”, and “between subject” à “between-subject”

Amended as indicated.

 

Line 345 (Figure 1): The quality of this figure needs to be higher. It would also be better if the bounds of the x-axis were less than and greater than times 1 & 2 so the data can be seen more clearly.  There should also be a label of units on the y-axis.  Perhaps also use different colors for the error bars so you can see the difference between them.  Also, explain what the error bars represent.

We have amended Figure 1 to increase the resolution and made edits to the figure and figure caption in accordance with the Reviewer’s comments. We have also ensured that the two figures have similar formatting.

 

Line 358: Can “less sporadic” be defined statistically?

We clarified the wording, the sentence now reads as follows ‘However, mean difference scores within the high anxiety nature group were consistently higher than in the high anxiety urban group (Figure 3b) and showed greater overall declines across participants compared to this group in line with the significant effect on NA scores.’

 

Line 361 (Figure 2): This figure also needs to be produced in higher quality.

We have amended Figure 2 to increase the resolution and made edits to the figure and figure caption in accordance with the Reviewer’s comments. We have also ensured that the two figures have similar formatting.

 

Line 368: Why is “[46]” in this line?
We clarified this point as follows ‘(see [46] for classification of effect size)’

Line 379: Mentioned this before, but saying “time” is the main effect is somewhat misleading. It would be more valuable to talk about differences between scores directly.
Time in our analysis indicates such differences, as clarified in the Design paragraph. As we note in our response to a previous comment, the ANOVA model directly matches the design we utilised, and it allows us to test whether there are baseline differences between groups. We agree that the difference score is interesting, and we conducted the analysis on difference scores for negative affect, which is the dependent variable with significant pre-post effects but with differences at baseline.

 

Line 428 – Incorporating motion sickness comes up in a jarring way. They used a metric to assess motion sickness, but this is not discussed here. Was motion sickness reported? If so, then this should be brought up much earlier.

We clarified why motion sickness is mentioned in this context; the sentence now reads as follows: ‘It is possible that a longer exposure time would have increased the chance of obtaining an effect on positive affect; however, the short intervention time was linked to avoiding major feelings of motion sickness. Further research is needed on the association between IVR green exposure and positive affect. Motion or “cyber sickness” is a commonly reported issue amongst virtual reality research [12].’

 

Line 430: et al should be italicized and punctuated as follows: et al.
Corrected

 

Line 437: “surmounting” doesn’t appear to be the correct word here. Maybe “mounting”?
Corrected

 

Line 475: There are many devices that measure physiological signals for biofeedback, why call out the pip explicitly?

We agree with the Reviewer, the specific reference has been removed

 

Line 485: “Participants” is possessive (should be participants’)
Corrected

 

Line 490: Is an HTC Vive really considered cost-effective? Particularly if many students would need this intervention? Could not a short film shown to many participants be used?

This is a good point, and this is why we refer to ‘more portable VR technology’, with the idea of looking at technologies such as Google Cardboard or similar in future studies. We have mitigated the statement, which now reads as follows: ‘If this intervention were proved to be continually effective under more realistic testing circumstances, such as a mock exam, it may have the potential to serve as a first step towards a cost-effective and accessible therapeutic tool that could help students to cope with surges in exam anxiety before a testing scenario.’

Round 2

Reviewer 2 Report

The authors have addressed all of my concerns. Thanks!