Open Access Perspective

Peer-Review Record

Beyond the Testing Room: Virtual Reality as a Paradigmatic Solution to Ecological Validity Deficits in Neuropsychological Memory Assessment

Virtual Worlds 2026, 5(1), 7; https://doi.org/10.3390/virtualworlds5010007

by Ninette Simonian and Nicco Reggente *

Reviewer 1: Anonymous

Reviewer 2: Seong-Yoon Shin

Reviewer 3: Anonymous

Submission received: 20 October 2025 / Revised: 20 January 2026 / Accepted: 26 January 2026 / Published: 2 February 2026

(This article belongs to the Special Issue Contemporary Developments in Mixed, Augmented, and Virtual Reality: Implications for Teaching and Learning)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors argue that traditional memory assessments lack ecological validity and fail to detect early cognitive decline. They propose VR as a superior alternative, demonstrating enhanced diagnostic sensitivity, stronger alignment with real-world functioning, and reduced susceptibility to compensatory masking. While the manuscript is well organized and the survey addresses an important topic, it requires further improvement.

While the manuscript presents a compelling argument that VR enhances ecological validity in memory assessment, the discussion of immersion remains too general. The authors frequently invoke “presence” as a mechanism for improved diagnostic sensitivity, but they do not sufficiently unpack the specific features of immersive environments that may differentially influence cognitive performance. For example, prior work has shown that features like stereoscopic rendering can directly impact motor behavior and perceptual encoding in age-sensitive contexts [1], suggesting that a more granular analysis of immersion would strengthen the paper’s theoretical and translational contributions. Given that the manuscript cites empirical studies showing superior sensitivity of VR tasks in distinguishing MCI and AD, a deeper examination of which VR design elements drive those gains is warranted. I would recommend referring to that example, as it illustrates the level of depth and specificity the current discussion of immersion lacks.

[1] https://doi.org/10.3389/frvir.2024.1475482

While the manuscript highlights VR’s diagnostic sensitivity across several domains, the treatment of these findings remains too broad. Many claims are repeated across different types of memory without explaining which domains are most impacted or under what conditions VR outperforms traditional assessments. For example, episodic memory, spatial navigation, and prospective memory are presented as uniformly improved in VR, yet these are cognitively and neurologically distinct processes. A more differentiated synthesis of how VR affects specific memory functions, particularly in relation to disease stage or task type, would make the conclusions more informative and precise.

Another issue is that the paper presents an overwhelmingly positive picture of VR-based memory assessment but overlooks studies reporting mixed, null, or even unfavorable outcomes. This selective reporting gives an incomplete view of the current evidence base. I think a more balanced discussion that includes contradictory findings would add credibility and help clarify under what conditions VR is or is not beneficial.

Although the paper describes VR as a paradigm shift, it lacks a cohesive theoretical explanation for why VR assessments yield better diagnostic outcomes. While the authors mention engagement of regions such as the hippocampus or entorhinal cortex, these references are largely surface-level and are not supported by a consistent neurocognitive model. I feel it is difficult to evaluate the mechanisms through which VR enhances memory assessment.

The manuscript cites a wide range of VR platforms and tools, but it does not critically compare their methodological characteristics. The reviewed studies vary in interface, interaction methods, task complexity, and outcome measures, yet these differences are not systematically analyzed. As a result, the synthesis lacks structure, and the reader is left with the impression that all VR tasks are equally effective. I believe some level of organization is needed, like structuring the review by cognitive domain or task design.

The section on implementation challenges is brief and underdeveloped. The authors mention issues like standardization and hardware accessibility, but do not explore how these challenges affect clinical adoption or patient usability. There is little discussion of practical concerns, such as cost, training requirements, or integration with existing clinical workflows. I recommend expanding this section to directly examine barriers to clinical adoption, which are currently underdiscussed.

Author Response

Comment 1: While the manuscript presents a compelling argument that VR enhances ecological validity in memory assessment, the discussion of immersion remains too general. The authors frequently invoke “presence” as a mechanism for improved diagnostic sensitivity, but they do not sufficiently unpack the specific features of immersive environments that may differentially influence cognitive performance. For example, prior work has shown that features like stereoscopic rendering can directly impact motor behavior and perceptual encoding in age-sensitive contexts [1], suggesting that a more granular analysis of immersion would strengthen the paper’s theoretical and translational contributions. Given that the manuscript cites empirical studies showing superior sensitivity of VR tasks in distinguishing MCI and AD, a deeper examination of which VR design elements drive those gains is warranted. I would recommend referring to that example, as it illustrates the level of depth and specificity the current discussion of immersion lacks.

[1] https://doi.org/10.3389/frvir.2024.1475482

Response 1: We appreciate this feedback on the need to explain more fully the mechanisms by which VR contributes to diagnostic sensitivity. We have revised the “Empirical Evidence” section within Section A: Addressing Ecological Validity Limitations to include a subsection titled “Immersive Features and Experimental Controls.” This subsection examines VR design elements (visual fidelity, stereoscopic rendering, field of view, and field of regard) that influence memory performance and discusses how they bear on studies of MCI and AD. We have incorporated the recommended reference on stereoscopic rendering and included additional findings from Smith et al. (2019) and Ragan et al. (2010). We have also substantially enriched the Superior Sensitivity and Diagnostic Capabilities section with earlier references that help substantiate our points.

Comment 2: While the manuscript highlights VR’s diagnostic sensitivity across several domains, the treatment of these findings remains too broad. Many claims are repeated across different types of memory without explaining which domains are most impacted or under what conditions VR outperforms traditional assessments. For example, episodic memory, spatial navigation, and prospective memory are presented as uniformly improved in VR, yet these are cognitively and neurologically distinct processes. A more differentiated synthesis of how VR affects specific memory functions, particularly in relation to disease stage or task type, would make the conclusions more informative and precise.

Response 2: Thank you for this thoughtful comment! We recognize that claims throughout the manuscript were general and repetitive, and we have addressed this by creating a new subsection, “Domain-Specific Efficacy: Why VR Benefits Vary by Memory Type,” within Section A: Addressing Ecological Validity Limitations. The subsection is organized around three domains: 1) Spatial Memory, explaining VR’s advantages for allocentric navigation and path integration, with differential effects across MCI and AD stages; 2) Episodic Memory, distinguishing item memory from feature binding and noting that active navigation can paradoxically impair recall in older adults due to dual-task interference; and 3) Prospective Memory, clarifying VR’s superior sensitivity for time-based versus event-based tasks.

Comment 3: Another issue is that the paper presents an overwhelmingly positive picture of VR-based memory assessment but overlooks studies reporting mixed, null, or even unfavorable outcomes. This selective reporting gives an incomplete view of the current evidence base. I think a more balanced discussion that includes contradictory findings would add credibility and help clarify under what conditions VR is or is not beneficial.

Response 3: We thank the reviewer for the call to present a more balanced perspective, which is particularly appropriate given some of our own recent shortcomings in using VR with older adult populations. We have expanded the Limitations section, which now acknowledges: “VR’s potential to improve ecological validity is well-documented, but evidence is not uniformly positive. VR does not always enhance memory performance. In some cases, added cognitive load and interference complexity may introduce confounds.” We also draw increased attention to pragmatic issues with VR in specific populations (e.g., motion sickness in older adults unaccustomed to digital technologies).

Comment 4: Although the paper describes VR as a paradigm shift, it lacks a cohesive theoretical explanation for why VR assessments yield better diagnostic outcomes. While the authors mention engagement of regions such as the hippocampus or entorhinal cortex, these references are largely surface-level and are not supported by a consistent neurocognitive model. I feel it is difficult to evaluate the mechanisms through which VR enhances memory assessment.

Response 4: We have added a new section titled “Neurocognitive Mechanisms: Why VR Engages Vulnerable Neural Circuits” within Section B, “Expanding Scope and Diagnostic Differentiation.” It provides a coherent theoretical framework organized around three neural systems: 1) the entorhinal cortex and grid-cell network, citing Kunz et al.’s (2015) demonstration of reduced grid-cell representations in at-risk individuals; 2) the retrosplenial cortex and mental frame syncing, explaining the demands of translating between egocentric and allocentric reference frames; and 3) hippocampal feature binding, describing how VR encourages multimodal binding mechanisms. We also cite Park et al.’s (2022) empirical validation, which showed 94.4% sensitivity and 96.4% specificity for distinguishing MCI from healthy aging, with “diagnostic accuracy attributed specifically to the task's engagement of allocentric spatial representation, a hippocampal-dependent function compromised early in MCI”.

Comment 5: The manuscript cites a wide range of VR platforms and tools, but it does not critically compare their methodological characteristics. The reviewed studies vary in interface, interaction methods, task complexity, and outcome measures, yet these differences are not systematically analyzed. As a result, the synthesis lacks structure, and the reader is left with the impression that all VR tasks are equally effective. I believe some level of organization is needed, like structuring the review by cognitive domain or task design.

Response 5: We welcome this feedback. Although we hold that all VR platforms share an umbrella of characteristics that promote enriched memory testing, we have reorganized our synthesis around cognitive domains as suggested. The new “Domain-Specific Efficacy” subsection structures the analysis by spatial memory, episodic memory, and prospective memory rather than by VR platform, because a platform-by-platform comparison may fall beyond the scope of the current manuscript (although it does motivate follow-up work). We have clarified in the Limitations section that cataloging every platform’s technical specifications exceeds this paper’s scope; our contribution lies in synthesizing the theoretical rationale for VR’s advantages across cognitive domains.

Comment 6: The section on implementation challenges is brief and underdeveloped. The authors mention issues like standardization and hardware accessibility, but do not explore how these challenges affect clinical adoption or patient usability. There is little discussion of practical concerns, such as cost, training requirements, or integration with existing clinical workflows. I recommend expanding this section to directly examine barriers to clinical adoption, which are currently underdiscussed.

Response 6: We appreciate the call to increase granularity and feel it strengthens the manuscript. The Limitations section has been substantially expanded with three new subsections: (1) Patient Usability and Safety, addressing cybersickness, the digital divide, and physical comorbidities affecting controller use; (2) Integration with Clinical Workflows, discussing setup time, clinician training, and physical space requirements compared with brief screening tools; and (3) Standardization and Normative Data, addressing the absence of standardized protocols, reliable cut-off scores, and dismantling studies. We note that the rapid improvement of consumer VR suggests that many of these limitations may be temporary.

Reviewer 2 Report

Comments and Suggestions for Authors

1. What is the argument of this paper? No matter how much I read the abstract, I still can't figure out what it's claiming.
2. There's no mention of the patient-caregiver relationship in the main text.
3. These limitations seriously impede the early detection and treatment trajectory of mild cognitive impairment and prodromal Alzheimer's disease, but the nature and details of these limitations are missing.
4. The paper claims to utilize an immersive and natural environment to induce genuine cognitive processing while maintaining experimental control, thereby revealing subtle deficits often obscured by compensatory mechanisms in conventional assessments. However, I don't understand the description of the environment used or the experimental controls employed.
5. Studies have shown that tasks that capture the interdependent workings of executive function, spatial navigation, and memory can more effectively distinguish between healthy aging, mild cognitive impairment (MCI), and Alzheimer's disease (AD). What are the differences in criteria and classifications?
6. This paper lacks experimental or case study results. Isn't this research based on a set of findings? As you can see, this study is a survey-based paper, but I believe it requires some experimental data collection and analysis. Please provide this information.

Author Response

Comment 1: What is the argument of this paper? No matter how much I read the abstract, I still can't figure out what it's claiming.

Response 1: We apologize for the lack of clarity. We have completely revised the abstract to state our thesis directly: traditional assessments suffer from ecological validity limitations that compromise clinical utility, and VR offers a paradigmatic solution. The new abstract opens with the core problem, states our thesis explicitly ("We argue that virtual reality addresses these limitations"), summarizes supporting evidence in shorter sentences, and concludes with our main claim.

Comment 2: There's no mention of the patient-caregiver relationship in the main text.

Response 2: We have expanded our treatment of the patient-caregiver relationship in Section C, adding detailed subsections on: caregiver burden and psychological state; caregiver cognitive status effects; compensatory behavioral dynamics (the "do for" dynamic); and disease stage effects on caregiver-clinician agreement. We also added a new "Gaps in Current Research" subsection identifying needed studies on whether VR assessments improve patient-caregiver concordance, facilitate shared decision-making, and affect caregiver burden.

Comment 3: These limitations seriously impede the early detection and treatment trajectory of mild cognitive impairment and prodromal Alzheimer's disease, but the nature and details of these limitations are missing.

Response 3: We have elaborated on how traditional assessment limitations specifically impede early detection. In the Introduction, we now explain that traditional MCI criteria define memory deficits with preserved ADLs, but VR studies indicate that functional impairment in complex IADLs may characterize predementia. In Section B, we cite specific examples: Tarnanas et al.’s VR-DOT predicted MCI-to-AD conversion more accurately than standard measures; Zygouris et al.’s Virtual Supermarket achieved 87.3% classification accuracy; and Howett et al.’s path integration task differentiated biomarker-positive from biomarker-negative MCI patients. In fairness, and perhaps more in line with the reviewer’s critique, we have also expanded the Limitations section to highlight that VR is far from a perfect solution given its current pragmatic limitations (e.g., cybersickness, clinical training and facilities).

Comment 4: The paper claims to utilize an immersive and natural environment to induce genuine cognitive processing while maintaining experimental control, thereby revealing subtle deficits often obscured by compensatory mechanisms in conventional assessments. However, I don't understand the description of the environment used or the experimental controls employed.

Response 4: We originally intended to discuss immersive and natural environments in general, but the reviewer’s comment prompted us to make more granular claims about the specific environments used in leading VR memory studies. We have therefore elaborated on VR environments and experimental controls in Section A. We now describe specific environments (Virtual Grocery Store, virtual kitchens, city-driving simulations), along with their cognitive demands and ecological relevance. We explain how VR enables experimental control: standardizing stimulus timing and location, enforcing invisible boundaries, teleporting between contexts, creating distraction zones, and introducing controlled interruptions. We also describe matched-pairs designs that isolate volitional control from perceptual input.

Comment 5: Studies have shown that tasks that capture the interdependent workings of executive function, spatial navigation, and memory can more effectively distinguish between healthy aging, mild cognitive impairment (MCI), and Alzheimer's disease (AD). What are the differences in criteria and classifications?

Response 5: We have clarified diagnostic distinctions in the Introduction and Section B, explaining that traditional MCI criteria require memory deficits with preserved activities of daily living (ADLs), while VR studies reveal that complex instrumental ADL impairments may characterize predementia under ecologically valid conditions. We illustrate these distinctions through empirical examples demonstrating VR's capacity to differentiate along the healthy aging → MCI → AD continuum with greater precision than traditional classifications.

Comment 6: This paper lacks experimental or case study results. Isn't this research based on a set of findings? As you can see, this study is a survey-based paper, but I believe it requires some experimental data collection and analysis. Please provide this information.

Response 6: This manuscript is a narrative review and theoretical synthesis, not an empirical study. Our contribution lies in: 1) systematically categorizing ecological validity limitations across four dimensions; 2) synthesizing empirical findings from the VR assessment literature; 3) integrating evidence regarding neural and cognitive mechanisms underlying VR's advantages; and 4) identifying implementation challenges to guide future research. This integrative framework provides value precisely because the field has produced many empirical studies but lacks a coherent theoretical synthesis. Throughout the manuscript, we have presented empirical evidence stemming from experimental data collection and analysis, and draw attention to the gaps in current research so as to motivate future empirical research.

Reviewer 3 Report

Comments and Suggestions for Authors

Virtual Reality is increasingly used in the rehabilitation of patients with special needs, which is why this topic is current and justified.

1. Abstract: the sentences are very long, which makes the text difficult to read and makes it hard for the reader to stay focused.

2. The introduction is too long and should be divided into two independent sections, e.g.: “1. Introduction” and “2. Research Subject”.
 
2.1. Instead of naming the first section “1. Shortcomings of Traditional Tests”, I suggest that the authors change the title of this section to “1. Introduction”.

2.2. I suggest that the authors reorganize “1. Introduction” into a total of 5 key paragraphs: (i) introduce the reader to the field, (ii) describe the subject of the paper, (iii) present the motivation, (iv) briefly describe the goal and potential contribution of the research, and (v) provide the structure of the paper by section.

2.3. I suggest that the authors structure “2. Research Subject” as follows: (i) describe the research subject in more detail, (ii) define the research questions (RQs) in the form “RQ1: ...”, “RQ2: ...”, (iii) state the search syntax (and the strings within it) used to query the databases, (iv) identify which databases were used for the literature search, as well as the search period, (v) state the inclusion/exclusion criteria, and (vi) describe how data extraction was done.

3. Form an independent “Discussion” section and, within it: (i) review the most important findings in relation to the stated RQs, (ii) present the limitations, and (iii) based on the findings, present perspectives and future trends.

NOTE: The paper is difficult to read due to numerous long sentences, which greatly distract the reader's attention; in addition, some sentences are written in a journalistic rather than a scientific register, which also makes the manuscript harder to read and understand, so I ask the authors to address all of this.

Author Response

Comment 1: Abstract: the sentences are very long, which makes the text difficult to read and makes it hard for the reader to stay focused.

Response 1: We have revised the abstract, shortening sentences throughout. For example, the original 51-word opening sentence has been replaced with: "Traditional neuropsychological memory assessments lack ecological validity and fail to capture how memory functions in everyday life" (16 words).

Comment 2: The introduction is too long and should be divided into two independent sections, e.g.: “1. Introduction” and “2. Research Subject”.

2.1. Instead of naming the first section “1. Shortcomings of Traditional Tests”, I suggest that the authors change the title of this section to “1. Introduction”.

Response 2: Although this exact sectioning did not fit our paper, we adopted its spirit in reorganizing the manuscript. We have restructured the Introduction following the suggested five-paragraph framework: (i) field introduction, (ii) subject description, (iii) motivation, (iv) goals and contributions, and (v) paper structure with a clear roadmap. Regarding formal research questions (RQ1, RQ2) and a systematic review methodology, we note that this is a narrative review and theoretical synthesis; however, we have ensured that the Introduction clearly articulates the conceptual questions guiding our analysis.

Comment 3: Form an independent “Discussion” section and within it: (i) review the most important findings in accordance with the set RQs, (ii) present the limitations, and (iii) in accordance with the findings, present perspectives and future trends.

Response 3: We have created a new Discussion section that: (i) synthesizes key findings, emphasizing VR's capacity to target vulnerable neural circuits where compensatory mechanisms cannot mask pathology; (ii) precedes a substantially expanded Limitations section; and (iii) presents future directions including standardized protocols, ecological validation, and clinical workflow integration. The Discussion includes a summary table (Table 1) mapping conventional constraints to VR innovations.

Comment 4: NOTE: The paper is difficult to read due to numerous long sentences, which greatly distract the reader's attention; in addition, some sentences are written in a journalistic rather than a scientific register, which also makes the manuscript harder to read and understand, so I ask the authors to address all of this.

Response 4: Readability is paramount, so we thank the reviewer for this feedback. We have made extensive edits throughout: (1) systematically shortened sentences, breaking complex constructions into simpler statements; (2) replaced rhetorical phrasings with direct scientific language; (3) reduced emphatic language in favor of precise, empirically grounded claims; and (4) eliminated redundant phrases. These changes are marked in red throughout the revised manuscript.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I am satisfied with authors' response and the revision.

Author Response

Comment 1: I am satisfied with authors' response and the revision.

Response 1:

Dear Reviewer,

Thank you for your feedback on the initial submission and for taking the time to review our revised manuscript. We appreciate your confirmation that the revisions have adequately addressed your concerns, as well as your patience and expertise throughout the review process.

Reviewer 2 Report

Comments and Suggestions for Authors

This paper shows that empirical evidence supports the superior diagnostic sensitivity of VR, and that tasks capturing the interdependent interactions of memory, executive function, and spatial navigation can more effectively distinguish between healthy aging, mild cognitive impairment, and Alzheimer's disease. This indicates that, rather than isolating artificial cognitive constructs, the paper reflects how cognition actually functions.

This paper adheres to the format and procedures of a peer-reviewed paper and is of relatively high quality. However, it would be helpful if the Abstract stated the paper's argument explicitly in the format "This paper proposes..." This paper is accepted.

Author Response

Comment 1: This paper shows that empirical evidence supports the superior diagnostic sensitivity of VR, and that tasks capturing the interdependent interactions of memory, executive function, and spatial navigation can more effectively distinguish between healthy aging, mild cognitive impairment, and Alzheimer's disease. This indicates that, rather than isolating artificial cognitive constructs, the paper reflects how cognition actually functions.

This paper adheres to the format and procedures of a peer-reviewed paper and is of relatively high quality. However, it would be helpful if the Abstract stated the paper's argument explicitly in the format "This paper proposes..." This paper is accepted.

Response 1:

Dear Reviewer,

We have updated our abstract to use "we propose" rather than "we conclude."
