Peer-Review Record

Virtual World Platforms: A Comparative Analysis of Quality According to ISO 25010 Standards and Maturity Models

Virtual Worlds 2026, 5(1), 2; https://doi.org/10.3390/virtualworlds5010002

by Fabiola Sáez-Delgado¹, Javier Mella-Norambuena², Paulo Coronado³, Yaranay López-Angulo⁴, Guillermo Ramírez⁵, María Badilla-Quintana⁶ and Andrés Chiappe⁷,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 1 December 2025 / Revised: 22 December 2025 / Accepted: 24 December 2025 / Published: 5 January 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors,

Congratulations on the choice of topic; it is highly valuable and novel. I have several remarks that could enhance the understanding of the research procedure and improve the quality of presentation.

The abstract could start more directly. The opening sentences feel somewhat expansive and delay the moment when the reader learns what the study actually contributes. A more concise entry point would help foreground the research problem.
Parts of the introduction might be streamlined. The background is rich, but a few passages could be tightened to keep the focus on the problem statement and avoid dispersing attention across too many contextual elements. Maybe titles of the sections in the Introduction could be removed? E.g., you present a topic sentence and then you start a new paragraph with a new thought.
Readers would benefit from a short overview of the maturity levels (NM1–NM5) within the main text. The full definitions in the supplementary files are detailed, but a compact summary in the article would make the evaluation logic easier to grasp without constantly referring to external materials.
It might be useful to show one concrete example of how a platform was scored. Even a brief walkthrough for a single case would make the assessment process more tangible and would help readers understand how the criteria were applied.
The rationale for selecting the 23 platforms could be stated a little more explicitly. Although the link to Schultz’s taxonomy is mentioned, explaining the inclusion/exclusion logic would strengthen the methodological transparency.
Given how quickly metaverse technologies change, a short remark on the model’s adaptability would be valuable. A sentence or two on how the framework might be updated as platforms evolve would situate the study in a more dynamic technological context.
The manuscript could more clearly articulate what the proposed model adds beyond existing approaches. A slightly sharper contrast with previous frameworks would help underscore where the novelty lies—whether in the hybridization, the operational depth, or the evaluation pipeline.
The profile of the experts could be described in a bit more detail. It would help to know more about the range of their experience or perspectives, so that the reader can better judge the robustness of the expert-based ratings.
A brief reflection on the robustness of the results would enhance the discussion. For instance, whether the ranking of platforms remains stable if weights or aggregation assumptions change.
Figure 3 is a bit illegible. Could you enhance its quality?

I suggest minor revision.

Author Response

We present a detailed description of the responses to each of the comments made by the reviewers of the paper entitled: “Virtual world platforms: A comparative analysis of quality according to ISO 25010 standards and maturity models”. All revisions to the manuscript are highlighted in light blue.

Response to Reviewer 1’s Comments

Comment: The abstract could start more directly. The opening sentences feel somewhat expansive and delay the moment when the reader learns what the study actually contributes. A more concise entry point would help foreground the research problem.

Response to reviewer's comment:

The paragraph was originally as follows:

Metaverses integrate technologies that push the boundaries of human experience. Their potential to transform areas such as education, mental health, and social-emotional support has sparked growing academic interest. However, despite their expansion, one of the main challenges for their implementation lies in the proliferation of metaverse platforms with diverse characteristics, architectures, and purposes, which complicates the task of informed technology selection. Given this diversity, a systematic approach is required to compare platforms based on functional and non-functional attributes relevant to specific application contexts. The objective of this study was to propose a model for evaluating the quality of metaverse-type platforms based on a hybridization of the aspects defined in the ISO/IEC 25000 family of standards, a maturity model extracted from recent literature, and the Metagon metaverse characterization typology. Using this model, 23 metaverse platforms were evaluated using a hierarchical ranking strategy with tolerance. The results show that platforms such as Decentraland and Roblox achieve the highest levels of maturity (ML5), although open-architecture platforms demonstrated superior structural robustness in comparative tie-breakers. The results provide a taxonomy of characteristics refined and validated by experts that were used in the evaluation of a set of platforms, offering a rigorous and reproducible classification useful for guiding technology adoption decisions in emerging contexts. The discussion presents the basis for future studies focused on the evaluation of specific categories, such as educational, therapeutic, or social interaction platforms.

The paragraph has been revised as follows:

The rapid proliferation of metaverse platforms with heterogeneous architectures, functionalities, and purposes poses a significant challenge for informed technology selection. Consequently, there is a need for structured evaluation approaches that enable comparison based on functional and non-functional attributes relevant to specific application contexts. The objective of this study was to propose a model for evaluating the quality of metaverse-type platforms based on a hybridization of the aspects defined in the ISO/IEC 25000 family of standards, a maturity model extracted from recent literature, and the Metagon metaverse characterization typology. Using this model, 23 metaverse platforms were evaluated using a hierarchical ranking strategy with tolerance. The results show that platforms such as Decentraland and Roblox achieve the highest levels of maturity (ML5), although open-architecture platforms demonstrated superior structural robustness in comparative tie-breakers. The results provide a taxonomy of characteristics refined and validated by experts that were used in the evaluation of a set of platforms, offering a rigorous and reproducible classification useful for guiding technology adoption decisions in emerging contexts. The discussion presents the basis for future studies focused on the evaluation of specific categories, such as educational, therapeutic, or social interaction platforms.

Comment: Parts of the introduction might be streamlined. The background is rich, but a few passages could be tightened to keep the focus on the problem statement and avoid dispersing attention across too many contextual elements. Maybe titles of the sections in the Introduction could be removed? E.g., you present a topic sentence and then you start a new paragraph with a new thought.

Response to reviewer's comment: We sincerely thank the reviewer for this observation. The Introduction has been revised according to your suggestions. The section headings within the Introduction have been removed, as they were found to fragment the argumentative flow. Repetitive sentences have been eliminated, and some passages have been shortened to maintain focus on the problem statement. Finally, the paragraph discussing the benefits of the metaverse in education has been significantly condensed, as it was previously too long.

The paragraph was originally as follows:

The metaverse has positioned itself as an immersive digital space that combines augmented reality (AR), virtual reality (VR), artificial intelligence, and collaborative 3D environments. Its application in education is transforming the ways of teaching, learning, and connecting with knowledge. Among its main benefits and importance in education are the followings : (a) providing immersive and experiential learning, allowing students to have more meaningful learning experiences by interacting with realistic virtual environments, facilitating the understanding of abstract or complex content by being able to experience it firsthand, as well as promoting discovery learning and the simulation of real scenarios without the risks or costs involved in the physical world [6]; (b) providing a personalized and adaptable learning path, i.e., the metaverse offers the possibility of adjusting the difficulty, pace, and resources according to the needs of each student, promoting inclusive education, as experiences can be designed that are accessible to people with different abilities and learning styles [7, 8]; promoting collaboration and networking, this is possible because virtual spaces allow synchronous and asynchronous interactions between students and teachers from different parts of the world, as well as enhancing real-time collaborative work, with the possibility of building, solving problems, or designing projects together in the same digital environment, helping to develop networking, digital communication, and global citizenship [9]; (d) facilitating didactic-pedagogical innovation, given that teachers can explore new active methodologies (gamification, project-based learning, role-playing, situated learning), opening up the possibility of creating virtual laboratories, digital campuses, immersive museums, and interactive classrooms [10]; (e) fostering the development of digital and 21st-century skills, since, by interacting in metaverse environments, students develop key skills such as critical thinking, creativity, problem solving, digital collaboration, and technological literacy, thus preparing future professionals to perform in contexts of digital transformation, typical of the knowledge society and the 4.0 economy [11]; (f) promoting the democratization of access to knowledge and educational equity, as this technology makes it easier for people in remote geographical locations or with infrastructure limitations to access high-level educational experiences, reducing gaps in access to physical laboratories, cultural experiences, or professional practices by replicating them in a virtual environment [12].

The paragraph has been revised as follows:

The metaverse has positioned itself as an immersive digital space that combines augmented reality (AR), virtual reality (VR), artificial intelligence, and collaborative 3D environments: (a) provides immersive and experiential learning through interaction with realistic virtual environments, facilitating the understanding of abstract or complex content and the simulation of real scenarios without the risks or costs of the physical world [6]; (b) enables personalized and adaptable learning paths by adjusting difficulty, pace, and resources to students’ needs, thereby supporting inclusive education for diverse abilities and learning styles [7, 8]; (c) promotes collaboration and networking through synchronous and asynchronous interactions in shared virtual spaces, supporting collaborative work and the development of digital communication and global citizenship skills [9]; (d) facilitates didactic and pedagogical innovation by enabling the use of active methodologies and the creation of virtual laboratories, digital campuses, immersive museums, and interactive classrooms [10]; (e) fosters the development of digital and 21st-century skills, including critical thinking, creativity, problem solving, digital collaboration, and technological literacy [11]; (f) promotes the democratization of access to knowledge and educational equity by enabling high-level educational experiences for learners in remote or resource-limited contexts [12].

Comment: Readers would benefit from a short overview of the maturity levels (NM1–NM5) within the main text. The full definitions in the supplementary files are detailed, but a compact summary in the article would make the evaluation logic easier to grasp without constantly referring to external materials.

Response to reviewer's comment: Thank you for this valuable suggestion. We added a table that summarizes the interpretation of the maturity levels (ML1–ML5) across the seven evaluation categories of the proposed model. Detailed criteria, indicators, and decision rules used to assign maturity levels to individual attributes are reported in the Supplementary Materials.

Comment: It might be useful to show one concrete example of how a platform was scored. Even a brief walkthrough for a single case would make the assessment process more tangible and would help readers understand how the criteria were applied.

Response to reviewer's comment: We appreciate the reviewer’s observation regarding this point. A brief description illustrating the process of evaluating a platform and a given attribute has been added to the results section. The added text is included below.

As an illustrative example, the evaluation process of the OpenSimulator platform is presented. To simplify the explanation, only the assessment of the Dynamic Interactivity attribute is reported. This attribute is one of the nine attributes belonging to the Technical Aspects category. Overall, a total of 35 attributes were evaluated for each platform.

During the researchers’ training session, a conceptual alignment was carried out to establish a shared understanding of what Dynamic Interactivity entails. This step was necessary because one evaluator initially regarded the term as pleonastic, whereas the analytical framework of the study distinguishes interactivity as the mere capability for bidirectional communication, without necessarily characterizing the quality, richness, or complexity of such interactions. Once consensus on the concept was reached, the criteria associated with each maturity level were explained and clarified.

For this specific attribute, a set of elements was defined to operationalize concepts with an inherent degree of ambiguity, such as simple navigation, noticeable latency, dynamic simulation, complex interaction, and real-time transformation. Subsequently, each evaluator interacted independently with the platform and assessed the attribute by progressing sequentially through the maturity levels in ascending order. A maturity level was assigned only when all criteria corresponding to that level were fully satisfied, and the evaluation process concluded when a level could no longer be assigned.

For the selected attribute, Evaluators 1 and 2 assigned a maturity level of ML4, while Evaluator 3 assigned ML3. As full consensus was not achieved, the final maturity level was determined by simple majority, given that the maximum divergence between ratings was limited to one level. Had the divergence exceeded this threshold, a dedicated consensus session would have been required.
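The level-assignment and aggregation rules described in this walkthrough can be sketched in Python. This is a minimal illustration and not the authors' tooling: the function names `assign_level` and `aggregate`, the per-level boolean inputs, the return value 0 when ML1 is not reached, and the handling of ties are all assumptions introduced here.

```python
from collections import Counter
from typing import Optional, Sequence


def assign_level(criteria_met: Sequence[bool]) -> int:
    """Walk ML1..ML5 in ascending order and stop at the first level
    whose criteria are not all satisfied.

    criteria_met[i] is True when every criterion of level i + 1 holds.
    Returning 0 when ML1 is not reached is an assumption; the source
    does not say how that case is handled.
    """
    level = 0
    for met in criteria_met:
        if not met:
            break
        level += 1
    return level


def aggregate(ratings: Sequence[int], max_divergence: int = 1) -> Optional[int]:
    """Return the simple-majority level when the ratings differ by at
    most max_divergence levels; return None to signal that a dedicated
    consensus session is required."""
    if max(ratings) - min(ratings) > max_divergence:
        return None
    level, votes = Counter(ratings).most_common(1)[0]
    # A tie (possible with an even number of raters) also returns None;
    # this is an assumption, since the source describes three raters only.
    return level if votes > len(ratings) / 2 else None
```

Under these assumptions, `aggregate([4, 4, 3])` reproduces the OpenSimulator case above and yields ML4, whereas ratings such as `[5, 3, 4]` exceed the one-level divergence threshold and return `None`.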

Comment: The rationale for selecting the 23 platforms could be stated a little more explicitly. Although the link to Schultz’s taxonomy is mentioned, explaining the inclusion/exclusion logic would strengthen the methodological transparency.

Response to reviewer's comment: Thank you for this helpful suggestion. In response, we expanded the Case Selection section, adding the following text:

Additional inclusion criteria included search engine visibility (Google and DuckDuckGo), the size and activity of the user base, evidenced by active users or engagement in discussion channels such as Discord or Reddit, user interaction dynamics, and the availability of platform updates released within the year preceding this comparative analysis.

Comment: Given how quickly metaverse technologies change, a short remark on the model’s adaptability would be valuable. A sentence or two on how the framework might be updated as platforms evolve would situate the study in a more dynamic technological context.

Response to reviewer's comment: We appreciate the reviewer’s remark on this issue. In response, we added text to the Hybrid Artifact Design section to explain how we recommend approaching this aspect.

Although the evaluation criteria were defined to be as technology-agnostic as practicable, a moderate risk of obsolescence affecting 14 attributes was identified, attributable to their dependency on hardware components or software standards with short life cycles. Consequently, the continued validity of these evaluation criteria should be periodically reviewed at intervals of 6 to 12 months.

Comment: The manuscript could more clearly articulate what the proposed model adds beyond existing approaches. A slightly sharper contrast with previous frameworks would help underscore where the novelty lies—whether in the hybridization, the operational depth, or the evaluation pipeline.

Response to reviewer's comment: In response, we added a paragraph to articulate what the proposed model adds beyond existing approaches.

While existing evaluation frameworks for virtual worlds and metaverse platforms typically rely on single aggregation strategies, fixed maturity models, or descriptive taxonomies, the proposed model advances the state of the art through a hybrid, operational, and evaluable evaluation pipeline. Unlike prior approaches that focus either on conceptual maturity levels or isolated quality dimensions, this work integrates geometric aggregation (radar-area analysis), statistical latent modeling (PCA-based synthesis), and structural typology (k-means clustering) within a unified and reproducible framework.

The novelty of the proposed model lies not merely in combining methods, but in their complementary operational roles: radar geometry captures global magnitude of maturity, PCA reveals latent multivariate structure and trade-offs, and clustering introduces a discrete, comparative typology of platform profiles. This hybridization enables both continuous scoring and categorical interpretation, which are rarely addressed simultaneously in existing frameworks.

Comment: The profile of the experts could be described in a bit more detail. It would help to know more about the range of their experience or perspectives, so that the reader can better judge the robustness of the expert-based ratings.

Response to reviewer's comment: In response, we added a table with a detailed description of the experts.

Table 2. Profile of domain experts involved in the evaluation process.

Expert ID	Primary Domain	Professional Background	Years of Experience	Relevant Expertise
E1	Software Engineering	Engineer, Master’s degree holder in computer sciences, and Software Engineering Specialist with experience in virtual world platforms.	>15 years	Gameplay programming and system development in user-generated platforms, including Roblox (Luau scripting) and Minecraft
E2	Software Engineering	Engineer, Master’s degree holder in Software Engineering, PhD candidate in Educational Technology, and specialized in LMS, and Virtual Worlds in education	>15 years	Development and management of immersive environments using OpenSimulator, including scenario configuration, and integration with learning management systems (LMS)
E3	Educational Technology	Researcher and academic in educational technology	>10 years	Research, peer-reviewed publications, and evaluation of digital learning technologies, with a focus on pedagogical effectiveness

Comment: A brief reflection on the robustness of the results would enhance the discussion. For instance, whether the ranking of platforms remains stable if weights or aggregation assumptions change.

Response to reviewer's comment: In response, the following text was added to the manuscript:

While this metric provides a valuable summary of performance, it is sensitive to the arbitrary ordering of axes. Therefore, to prevent classification artifacts derived from axial arrangement, the evaluation model triangulates this geometric indicator in the third stage and compares the platforms using three complementary approaches: k-means Cluster Analysis, Principal Component Analysis (PCA), and the geometric Radar Area method.
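The sensitivity of the radar area to axis ordering can be seen in a small worked example. The sketch below assumes the usual polygon-area formula for a radar chart with n equally spaced axes, A = 0.5 · sin(2π/n) · Σ r_i · r_(i+1), with the last axis wrapping around to the first. The function name `radar_area` and the sample scores are hypothetical and are not taken from the manuscript.

```python
import math
from typing import Sequence


def radar_area(scores: Sequence[float]) -> float:
    """Area of the polygon traced by scores on equally spaced radar axes."""
    n = len(scores)
    if n < 3:
        raise ValueError("a radar polygon needs at least three axes")
    # Each pair of adjacent axes spans a triangle with angle 2*pi/n.
    wedge = 0.5 * math.sin(2 * math.pi / n)
    return wedge * sum(scores[i] * scores[(i + 1) % n] for i in range(n))


# Hypothetical maturity scores for seven categories (illustration only).
alternating = [5, 1, 5, 1, 5, 1, 5]
grouped = [5, 5, 5, 5, 1, 1, 1]  # same scores, different axis order

print(radar_area(alternating), radar_area(grouped))
```

Both lists contain the same seven scores, yet the grouped arrangement encloses a larger area because high values sit next to each other. This is the kind of ordering artifact that the triangulation with k-means clustering and PCA is meant to guard against.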

Comment: Figure 3 is a bit illegible. Could you enhance its quality? I suggest minor revision.

Response to reviewer's comment: The quality of Figure 3 was enhanced.

We sincerely appreciate the reviewer’s careful reading, constructive comments, and the time and effort devoted to evaluating our manuscript. Their observations have significantly contributed to enhancing not only its clarity, accuracy, and conceptual coherence, but also its methodological rigor, overall consistency, and quality, thereby strengthening the manuscript’s potential contribution to advancing research in the field.

The Authors

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Thanks for the submission;

Abstract says “23 metaverse platforms were evaluated”, while Method says “a set of 22 metaverses” and Table 5 lists 23 (P1–P23). This must be made consistent throughout. The abstract frames the work as broadly about metaverse platforms, yet the Purpose Statement later narrows to socio-emotional support for teachers. That context is not mentioned in the abstract. Phrases like “push the boundaries of human experience” and “offering a rigorous and reproducible classification useful for guiding technology adoption decisions in emerging contexts” could be made more precise and less promotional. Also, you don’t mention that 35 attributes across 7 categories were evaluated – this is a big contribution.

The introduction spends many paragraphs on generic benefits of the metaverse in education, but the core contribution is a quality/maturity evaluation model for platforms. Keep the educational section, but shorten and refocus it to explicitly motivate why educational and socio-emotional use cases require robust quality assessment (e.g., security, accessibility, persistence, governance).

Later, the Purpose Statement narrows to “socio-emotional support experiences for teachers,” but your evaluation does not actually focus on teacher-specific or socio-emotional features.

Either broaden the purpose statement to “educational and socio-emotional” in general and reduce the very specific teacher emphasis; or

Include at least one concrete teacher-focused scenario in the discussion to show how the model answers that question.

ISO 25010, the metaverse maturity model, and Metagon typology only appear in detail in the Methods. A short conceptual overview in the introduction would help frame the contribution.

You describe the study as “mixed research design,” but data are almost entirely expert-based ratings and secondary data. There is no real primary quantitative dataset from end users.

Consider describing it as “Design Science Research with an expert-based quantitative evaluation” rather than “mixed methods,” unless you explicitly add qualitative components (e.g., interviews)

Under Phase 2, you say “a set of 22 metaverses” but Table 5 lists 23 platforms. Table 6 and 8 also reflect 23. Make the number consistently 23 and revise all related text.

You alternate between ML1–ML5 and NM1–NM5. Pick one notation and use it consistently throughout text, figures, and tables.

You propose using “Cohen’s Quadratically Weighted Kappa and ICC” but then only report κ = 0.7643 and mention a threshold issue (21% < 0.6) with consensus workshops.

You talk about radar areas, clusters, and PCA, and later mention equal weighting of metrics in the limitations. However, the exact formula for the final classification is not fully explicit.

Table 7 lists clusters labelled 6, 3, 2, 4, 1, 5 (non-sequential) with assigned relevance values 5–0.

Table 8 then shows a “Cluster” column with values like 5, 3, 4, 1, 2, 0. This is confusing and potentially misleading: the cluster numbers in Table 7 and Table 8, and in the narrative, must be reconciled.

In Table 6, “Radar Area (%)” and “Score (1–5)” are presented, but the derivation of the Score is not clearly explained (though it seems quintile-based).

In Table 8, you list a PCA score (e.g., 1, 0.81, −0.74) without explaining what the direction and magnitude mean (e.g., PC1 loading orientation).

Add 2–3 sentences explaining how positive vs negative values are interpreted and why they matter for quality classification.

Consider trimming slightly to avoid repetition with the Discussion and to end on a forward-looking note about how future context-specific weighting (e.g., for education, therapy, industry) can be implemented using your model.

There is some conceptual drift between “metaverse in education / socio-emotional support” and “general platform quality”. There are also inconsistencies and ambiguities in numbers (22 vs 23 platforms), cluster labels, maturity level notation, and radar/score aggregation.

The educational/socio-emotional support angle is not fully integrated into results and discussion; it mostly appears in the aim.

Author Response

Response to Reviewer 2’s Comments

Comment: Abstract says “23 metaverse platforms were evaluated”, while Method says “a set of 22 metaverses” and Table 5 lists 23 (P1–P23). This must be made consistent throughout. The abstract frames the work as broadly about metaverse platforms, yet the Purpose Statement later narrows to socio-emotional support for teachers. That context is not mentioned in the abstract. Phrases like “push the boundaries of human experience” and “offering a rigorous and reproducible classification useful for guiding technology adoption decisions in emerging contexts” could be made more precise and less promotional. Also, you don’t mention that 35 attributes across 7 categories were evaluated – this is a big contribution.

Response to reviewer's comment: We thank the reviewer for this observation. The Methods section, which previously mentioned 22 metaverses, has been corrected to 23 metaverses. Additionally, the phrases suggested by the reviewer have been removed. For example, the following sentence was removed: “These forms expand the boundaries of physical experience and, in their most sophisticated manifestations, allow experiences previously restricted to the extrasensory realm.” In the case of the next phrase, it has been reformulated to communicate the idea more precisely.

The paragraph was originally as follows:

“The results provide a taxonomy of characteristics refined and validated by experts that were used in the evaluation of a set of platforms, offering a rigorous and reproducible classification useful for guiding technology adoption decisions in emerging contexts”.

The paragraph has been revised as follows:

“The results provide a taxonomy of characteristics refined and validated by experts and used in the evaluation of the analyzed platforms, resulting in a reproducible classification that enables systematic comparison across different application contexts”.

Finally, the reviewer’s suggestion has been incorporated into the abstract. Specifically, the following sentence was added:

“The proposed model operationalizes 35 evaluation attributes grouped into seven categories, enabling a comprehensive assessment of metaverse platforms”

Comment: The introduction spends many paragraphs on generic benefits of the metaverse in education, but the core contribution is a quality/maturity evaluation model for platforms. Keep the educational section, but shorten and refocus it to explicitly motivate why educational and socio-emotional use cases require robust quality assessment (e.g., security, accessibility, persistence, governance).

Response to reviewer's comment: We thank the reviewer for this valuable suggestion. In the revised manuscript, the Introduction has been streamlined to reduce the emphasis on general benefits of metaverse technologies in education. The educational background has been refocused to explicitly motivate the need for robust quality and maturity assessment of metaverse platforms, particularly in relation to aspects such as security, accessibility, governance, persistence, and interoperability.

These revisions strengthen the conceptual link between educational use cases and the core contribution of the paper, namely the proposed quality and maturity evaluation model.

The paragraph on the benefits of metaverses in education was significantly condensed, as it was previously too long.

The paragraph was originally as follows:

The paragraph has been revised as follows:

The metaverse has positioned itself as an immersive digital space that combines augmented reality (AR), virtual reality (VR), artificial intelligence, and collaborative 3D environments: (a) provides immersive and experiential learning through interaction with realistic virtual environments, facilitating the understanding of abstract or complex content and the simulation of real scenarios without the risks or costs of the physical world [6]; (b) enables personalized and adaptable learning paths by adjusting difficulty, pace, and resources to students’ needs, thereby supporting inclusive education for diverse abilities and learning styles [7, 8]; (c) promotes collaboration and networking through synchronous and asynchronous interactions in shared virtual spaces, supporting collaborative work and the development of digital communication and global citizenship skills [9]; (d) facilitates didactic and pedagogical innovation by enabling the use of active methodologies and the creation of virtual laboratories, digital campuses, immersive museums, and interactive classrooms [10]; (e) fosters the development of digital and 21st-century skills, including critical thinking, creativity, problem solving, digital collaboration, and technological literacy [11]; (f) promotes the democratization of access to knowledge and educational equity by enabling high-level educational experiences for learners in remote or resource-limited contexts [12].

In addition, all content referring to teachers or socio-emotional aspects was removed. The focus on education in general was maintained. Two concrete examples of these improvements in the article are provided below.

Example 1

The paragraph was originally as follows:

Metaverses integrate technologies such as virtual reality (VR), augmented reality (AR), and mixed reality (MR), creating persistent, interactive, multisensory virtual environments that expand the limits of human experience. Their potential to transform areas such as education, mental health, and social-emotional support has sparked growing academic and commercial interest [1].

The paragraph has been revised as follows:

Metaverses integrate technologies such as virtual reality, augmented reality, and mixed reality to create interactive and persistent digital environments, whose use and potential application in educational contexts have generated growing academic and commercial interest [1].

Example 2

The paragraph was originally as follows:

“…supporting socio-emotional learning processes for teachers in immersive educational environments…”

The paragraph has been revised as follows:

“…supporting learning processes and interaction dynamics in immersive educational environments…”

Comment: Later, the Purpose Statement narrows to “socio-emotional support experiences for teachers,” but your evaluation does not actually focus on teacher-specific or socio-emotional features.

Response to reviewer's comment: We thank the reviewer for pointing out this inconsistency. In the revised version of the manuscript, references to teacher-specific experiences and socio-emotional support have been removed from the Purpose Statement and related sections. The objective is now consistently formulated in terms of evaluating metaverse platforms, accurately reflecting the scope and application of the proposed evaluation model.

Comment: Either broaden the purpose statement to “educational and socio-emotional” in general and reduce the very specific teacher emphasis, or include at least one concrete teacher-focused scenario in the discussion to show how the model answers that question.

Response to reviewer's comment:

Following the reviewer’s recommendation, we opted to broaden the purpose statement rather than introduce a narrowly focused evaluation that was not explicitly operationalized in the proposed model. In the revised manuscript, the study’s objective is consistently framed as the evaluation of metaverse platforms for educational applications in general, in full alignment with the scope of the model and the conducted analysis. To acknowledge the relevance of specific educational use cases without overstating the model’s claims, we added a brief paragraph in the Discussion highlighting how selected evaluation dimensions (e.g., governance, identity management, accessibility, and persistence) may support future, more targeted analyses. This revision preserves conceptual coherence while clearly positioning such applications as directions for future research rather than as part of the current evaluation.

The following paragraph was added to the Discussion section:

Although the proposed model was applied at a general platform level, several of its dimensions, such as governance, identity management, accessibility, and persistence, are relevant to future analyses of specific educational use cases. For example, platforms with mature governance and identity mechanisms may better support sustained learning communities, while accessibility and interoperability are essential for inclusive educational initiatives. These considerations suggest that the model can serve as a foundation for more targeted evaluations in subsequent studies.

Comment: ISO 25010, the metaverse maturity model, and Metagon typology only appear in detail in the Methods. A short conceptual overview in the introduction would help frame the contribution.

Response to reviewer's comment: In response, the following conceptual overview was added to the Introduction:

To frame the contribution of this study, the proposed quality assessment model integrates three complementary conceptual foundations. First, the ISO/IEC 25010 standard provides the reference framework for defining and organizing functional and non-functional quality characteristics relevant to virtual world platforms. Second, a metaverse maturity model is incorporated to support the assessment of platforms across progressive levels of development (ML1–ML5), enabling the estimation and comparison of maturity status rather than isolated quality attributes. Third, the Metagon typology is used to contextualize the evaluation by characterizing platforms according to their purpose and technological orientation, ensuring that quality attributes are interpreted in relation to distinct application contexts. The integration of these perspectives enables a systematic and multidimensional comparative analysis of platforms.

Comment: You describe the study as “mixed research design,” but data are almost entirely expert-based ratings and secondary data. There is no real primary quantitative dataset from end users. Consider describing it as “Design Science Research with an expert-based quantitative evaluation” rather than “mixed methods,” unless you explicitly add qualitative components (e.g., interviews).

Response to reviewer's comment: We thank the reviewer for this clarification. We agree that the term “mixed methods” may be misleading in the context of the present study, as the empirical component is based on expert-driven quantitative ratings and secondary platform data rather than on primary qualitative data collected from end users. In the revised manuscript, we have therefore reframed the methodological approach as Design Science Research with an expert-based quantitative evaluation. This revised description more accurately reflects the nature of the research process, the role of expert judgment, and the type of data analyzed in this study.

The paragraph was originally as follows:

This study was developed using a mixed research design, organized into two sequential phases that respond to the two main objectives. The first phase focused on building a quality assessment model for metaverses, using the Design Science Research (DSR) approach. The second phase consisted of the empirical application of this model to a set of 22 metaverse platforms to validate its usefulness and generate a comparative analysis.

The paragraph has been revised as follows:

This study was developed using a research design organized into two sequential phases that respond to the two main objectives. The first phase focused on building a quality assessment model for metaverses, using the Design Science Research (DSR) approach. The second phase consisted of the empirical application of this model to a set of 23 metaverse platforms using an expert-based quantitative evaluation to validate its usefulness and produce a comparative analysis.

Comment: Under Phase 2, you say “a set of 22 metaverses” but Table 5 lists 23 platforms. Table 6 and 8 also reflect 23. Make the number consistently 23 and revise all related text.

Response to reviewer's comment: We thank the reviewer for this observation. All instances where “22 metaverses” appeared have been corrected to “23 metaverses.” The entire manuscript has been reviewed in relation to this issue to ensure consistency throughout.

Comment: You alternate between ML1–ML5 and NM1–NM5. Pick one notation and use it consistently throughout text, figures, and tables.

Response to reviewer's comment: We adopted the ML1–ML5 notation, and the document was reviewed to ensure consistency.

Comment: You propose using “Cohen’s Quadratically Weighted Kappa and ICC” but then only report κ = 0.7643 and mention a threshold issue (21% < 0.6) with consensus workshops.

Response to reviewer's comment: We appreciate the reviewer’s comment on this aspect of the manuscript. Owing to the translation process, only the Kappa index was reported in the manuscript, although it had in fact been used in the document validation activity. The ICC was used to validate the assessment, and its explanation has now been included in the body of the article.

For the Quality Analysis Methods component, a staged inferential approach was adopted. First, three evaluators independently assigned maturity levels (ML1–ML5) using the indicator-based scoring rules and source matrices. Inter-rater reliability was assessed on the independent, pre-consensus ratings using complementary statistics. Global reliability across the three evaluators was estimated using the intraclass correlation coefficient (ICC), adopting a two-way random-effects model with absolute agreement and average measures (ICC(2,k)). Individual cases exhibiting a maximum divergence greater than one maturity level were examined in structured consensus workshops. The resulting adjudicated matrix was subsequently used for quantitative aggregation and multivariate analysis.
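The consensus-selection rule described above (maximum divergence greater than one maturity level) can be illustrated with a minimal sketch. The function name and the ratings below are hypothetical and are not taken from the study's dataset.

```python
def needs_consensus(ratings, max_gap=1):
    """Return the indices of items whose rater spread exceeds max_gap levels.

    ratings: one tuple per evaluated item, one maturity level (1-5) per evaluator.
    """
    return [i for i, row in enumerate(ratings) if max(row) - min(row) > max_gap]


# Hypothetical pre-consensus ratings from three evaluators.
ratings = [
    (3, 3, 4),  # spread of 1: within tolerance
    (2, 4, 3),  # spread of 2: sent to a consensus workshop
    (5, 5, 5),  # full agreement
    (1, 3, 1),  # spread of 2: sent to a consensus workshop
]
flagged = needs_consensus(ratings)
print(flagged)
```

Under this reading, only the flagged items would be adjudicated, while reliability statistics would still be computed on the untouched independent ratings.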

…The inter-rater reliability analysis yielded an ICC(2,k) for absolute agreement of 0.763, with a 95% confidence interval of [0.733, 0.790], indicating good reliability among the evaluators. The narrow confidence interval reflects the stability of the estimate given the large number of evaluated targets (n = 805). Inter-rater reliability was assessed using the original evaluation matrices produced independently by the experts, prior to any consensus process. Although approximately 21% of the attributes exhibited discrepancies greater than one maturity level between evaluators, these discrepancies were intentionally preserved for the ICC computation in order to assess the intrinsic reliability of the evaluation instrument. Consequently, consensus workshops were conducted to review these cases and generate an aggregate matrix for subsequent quantitative analysis (see Appendix F of the dataset [47]).
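For readers who wish to reproduce this kind of estimate, the following is a minimal sketch of ICC(2,k) computed from the standard two-way ANOVA mean squares. It assumes a complete ratings matrix (every evaluator rates every target); the function name and the sample data are illustrative, and the confidence interval is not computed here.

```python
def icc2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, average measures.

    ratings: list of rows, one row per target, one column per rater.
    """
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)


# Hypothetical maturity levels from three evaluators for five targets.
sample = [(3, 3, 4), (2, 4, 3), (5, 5, 5), (1, 2, 1), (4, 4, 3)]
print(round(icc2k(sample), 3))
```

Because the absolute-agreement form includes the between-rater mean square in the denominator, systematic leniency or severity of one evaluator lowers the coefficient, which is consistent with computing it on pre-consensus ratings.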

Comment: You talk about radar areas, clusters, and PCA, and later mention equal weighting of metrics in the limitations. However, the exact formula for the final classification is not fully explicit.

Response to reviewer's comment: We appreciate the reviewer’s feedback on this matter. We adjusted the classification model to use a lexicographical hierarchical strategy. Accordingly, the limitation was removed and a section was added to the document to explain the classification method in detail.

To avoid the multicollinearity issues inherent in summing latent variables with observed metrics, a lexicographical hierarchical strategy with tolerance was adopted. This method prioritizes the latent structural maturity of the platform (Cluster Value) while using functional performance (Radar Area) as the primary differentiator. However, to account for measurement variability, a tolerance threshold was introduced.

Let $P$ be the set of platforms. The ranking order $\succ$ is defined such that for any two platforms $p_i, p_j \in P$:

$$p_i \succ p_j \iff (C_i > C_j) \;\lor\; (C_i = C_j \,\land\, A_i - A_j > \tau) \;\lor\; (C_i = C_j \,\land\, |A_i - A_j| \le \tau \,\land\, S_i > S_j)$$

Where:

$C$: Structural Tier (0–5), derived from K-Means clustering on principal components.

$A$: Functional Performance (%), derived from the radar chart area (Table 6).

$S$: Latent Robustness Score (PCA Score), representing structural consistency.

$\tau$: Tolerance Threshold, set at 5.0%.

A threshold of 5% was selected to account for feature measurement variability. Within this margin, functional differences are considered negligible, and priority is given to the platform with superior latent structural robustness ($S$). This ensures that architecturally solid platforms are favored over those with superficial feature bloat in cases of functional parity.
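The ranking rule can be sketched as a pairwise comparison. This is an illustration only: the class, field names, and example values are hypothetical, and the 5% tolerance is assumed to mean an absolute difference of five percentage points in radar area.

```python
from dataclasses import dataclass

TOLERANCE = 5.0  # assumed to be percentage points of radar area


@dataclass(frozen=True)
class Platform:
    name: str
    tier: int          # structural tier (0-5), from the cluster relevance value
    radar_area: float  # functional performance, in %
    pca_score: float   # latent robustness score


def outranks(a, b, tol=TOLERANCE):
    """Return True if platform a is ranked above platform b."""
    if a.tier != b.tier:
        return a.tier > b.tier              # 1) structural tier decides first
    gap = a.radar_area - b.radar_area
    if abs(gap) > tol:
        return gap > 0                      # 2) clear functional difference
    return a.pca_score > b.pca_score        # 3) within tolerance: PCA tie-breaker


# Hypothetical platforms, not values from the article's tables.
x = Platform("X", tier=5, radar_area=80.0, pca_score=0.2)
y = Platform("Y", tier=5, radar_area=78.0, pca_score=0.5)
print(outranks(y, x))  # y has the lower radar area but wins on the tie-breaker
```

One design consequence worth noting is that a tolerance-based comparison is not guaranteed to be transitive (three platforms spaced 4 points apart can form a cycle), so a full ranking built from it may need an explicit rule for such chains.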

Comment: Table 7 lists clusters labelled 6, 3, 2, 4, 1, 5 (non-sequential) with assigned relevance values 5–0. Table 8 then shows a “Cluster” column with values like 5, 3, 4, 1, 2, 0. This is confusing and potentially misleading: the cluster numbers in Table 7 and Table 8, and in the narrative, must be reconciled.

Response to reviewer's comment: We thank the reviewer for bringing this issue to our attention. The column in Table 8 (now Table 10) has been correctly named Cluster Relevance Value. Likewise, the relationship between Cluster and assigned value has been clarified in the text.

Comment: In Table 6, “Radar Area (%)” and “Score (1–5)” are presented, but the derivation of the Score is not clearly explained (though it seems quintile-based).

Response to reviewer's comment: We acknowledge the reviewer’s comment. The score is indeed derived using quintiles. This clarification has been added to the text.
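A quintile-based mapping from Radar Area (%) to a 1–5 score could look like the sketch below. The interpolation method for the cut points and the handling of values that fall exactly on a cut are assumptions, and the input values are hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles


def quintile_scores(areas):
    """Map each radar area to a 1-5 score according to its quintile."""
    # Four cut points at the 20th, 40th, 60th and 80th percentiles.
    cuts = quantiles(areas, n=5, method="inclusive")
    # A value equal to a cut point is placed in the lower quintile (assumption).
    return [bisect_left(cuts, area) + 1 for area in areas]


# Hypothetical radar areas for ten platforms.
areas = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
scores = quintile_scores(areas)
print(scores)
```

Because the cut points depend on the sample, adding or removing a platform can shift the score of others, which is worth keeping in mind when comparing scores across studies.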

Comment: In Table 8, you list a PCA score (e.g., 1, 0.81, −0.74) without explaining what the direction and magnitude mean (e.g., PC1 loading orientation). Add 2–3 sentences explaining how positive vs negative values are interpreted and why they matter for quality classification.

Response to reviewer's comment: We thank the reviewer for raising this point. In response, the text clarifies how the PCA Score is calculated, and the note in the table explains how PCA is used in the classification.

PCA-Based Synthetic Indicator

Principal Component Analysis (PCA) was employed to extract a latent representation of the multidimensional maturity space. PCA was applied to the covariance matrix of the ordinal maturity levels (ML1–ML5 mapped to 1–5) after mean centering.

The PCA score for platform $i$ was computed as:

$$\mathrm{PCA}_i = \sum_{j} w_j \, z_{ij}$$

Where:

$z_{ij}$ is the score of platform $i$ on the j-th principal component,
$w_j$ is the proportion of variance explained by component $j$.

The first five components were retained, jointly explaining approximately 82% of the total variance, which provides a balance between parsimony and information preservation.
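A variance-weighted PCA indicator of this kind can be sketched with NumPy as follows. This is not the authors' implementation: the function name, the random input matrix, and the choice to express weights relative to total variance (rather than renormalizing over the retained components) are all assumptions.

```python
import numpy as np


def pca_synthetic_score(levels, n_components=5):
    """Weighted sum of principal-component scores, weights = explained variance."""
    X = np.asarray(levels, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean centering
    cov = np.cov(Xc, rowvar=False)               # covariance matrix of attributes
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    weights = eigvals[order] / eigvals.sum()     # proportion of variance explained
    component_scores = Xc @ eigvecs[:, order]    # platform scores per component
    return component_scores @ weights, weights


# Hypothetical data: 23 platforms rated 1-5 on 10 attributes.
rng = np.random.default_rng(0)
levels = rng.integers(1, 6, size=(23, 10))
scores, weights = pca_synthetic_score(levels)
print(scores.shape, round(float(weights.sum()), 3))
```

Eigenvector signs are arbitrary, so the sign of such a composite is only interpretable after an orientation convention is fixed (for example, flipping each component so that higher maturity levels load positively); that convention is not part of this sketch.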

Note: Platforms are ranked primarily by a structural tier value based on Cluster Relevance Value (see Table 9). Within tiers, Radar Area determines the rank, except where the difference is within the 5% tolerance threshold. In these instances (e.g., Decentraland vs. Roblox; Vircadia vs. Bigscreen), the PCA Score was used as the decisive tie-breaker, favoring the platform with greater latent architectural robustness.

Comment: Consider trimming slightly to avoid repetition with the Discussion and to end on a forward-looking note about how future context-specific weighting (e.g., for education, therapy, industry) can be implemented using your model.

Response to reviewer's comment: We thank the reviewer for this suggestion. In response, we have added a forward-looking paragraph at the end of the Future lines of research section explaining how the proposed model can be adapted to specific contexts through domain-specific weighting schemes. This addition clarifies how the model can support tailored evaluations for education, therapy, and industrial applications, while preserving its core structure and methodological rigor.

Building on this potential, the proposed model is structured to support the derivation of domain-specific weighting schemes without modifying its core architecture. Given its compatibility with the SQuaRE framework and its formulation as a metamodel, domain adaptation can be systematically addressed by defining a set of domain-specific use cases, which act as a reference framework to assess the suitability and relevance of the existing evaluation categories, attributes, and maturity indicators. Within this process, use cases are used to assess whether each element of the general model is applicable, meaningful, or non-essential for the domain under consideration. Based on this analysis, domain-specific weighting profiles can be established by assigning relative importance to the selected attributes and indicators, while the overall structure of the model remains unchanged.

For instance, educational scenarios typically emphasize accessibility, collaborative interaction, and pedagogical affordances; therapeutic contexts prioritize usability, privacy, emotional safety, and system stability; and industrial applications focus on scalability, interoperability, performance efficiency, and security. This approach enables the model to function as a stable and reusable evaluation framework, supporting multiple domain-specific assessment instances in a consistent and comparable manner. Accordingly, future research can focus on formally defining and empirically validating such domain-specific weighting profiles, extending the applicability of the model while preserving its methodological coherence and alignment with international quality evaluation standards.
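As a simple illustration of how a domain-specific weighting profile might be applied without changing the model's structure, the sketch below computes a weighted mean of maturity levels. The attribute names are drawn from the text, but the weights, maturity levels, and function name are hypothetical.

```python
def weighted_maturity(levels, profile):
    """Weighted mean of maturity levels (1-5) under a domain weighting profile.

    levels:  attribute name -> maturity level for one platform
    profile: attribute name -> relative importance in the target domain
    """
    total_weight = sum(profile.values())
    return sum(levels[attr] * w for attr, w in profile.items()) / total_weight


# Hypothetical maturity levels for one platform.
platform_levels = {"accessibility": 4, "interoperability": 2, "security": 3}

# Hypothetical weighting profiles for two domains.
education = {"accessibility": 3, "interoperability": 2, "security": 1}
industry = {"accessibility": 1, "interoperability": 3, "security": 2}

print(round(weighted_maturity(platform_levels, education), 2))
print(round(weighted_maturity(platform_levels, industry), 2))
```

The same platform data yields different aggregate values under each profile, which is the sense in which the general model stays fixed while the domain instance changes.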

In addition, repetitions in the Discussion have been removed. An example is provided below.

The paragraph was originally as follows:
The first objective of this study was to propose a model for evaluating the quality of metaverses. As a result, unlike previous studies that focused on lists of functionalities, this study validated a model for evaluating metaverse platforms by integrating three approaches: the ISO/IEC 25010 software quality standard, the Weinberger and Gross [32] maturity model, and the Metagon typology by Schöbel et al. [33]. This hybridization provides a multidimensional assessment that ranges from technical aspects (e.g., system scalability) to governance attributes (policy development) and user experience (accessibility and inclusion). Thus, this proposal overcomes traditional approaches focused solely on lists of functionalities by offering a multidimensional perspective that comprehensively considers technical, governance, and user experience aspects.

Therefore, it contributes to multidimensional integration and methodological robustness. On the one hand, the combination of the ISO/IEC 25010 standard with a maturity model provides the proposed model with a formal and gradual basis for technical evaluation, while, on the other hand, the Metagon typology adds specificity by characterizing types of metaverse from perspectives such as scalability, accessibility, inclusion, and development policies. This hybrid strategy is aligned with emerging approaches that promote more holistic and adaptive assessments in immersive technological environments [48].

The paragraph has been revised as follows:

The first objective of this study was to propose a model for evaluating the quality of metaverses. As a result, unlike previous studies that focused on lists of functionalities, this study validated a model for evaluating metaverse platforms by integrating three approaches: the ISO/IEC 25010 software quality standard, the Weinberger and Gross [32] maturity model, and the Metagon typology by Schöbel et al. [33]. This hybridization provides a multidimensional assessment that ranges from technical aspects (e.g., system scalability) to governance attributes (policy development) and user experience (accessibility and inclusion). Thus, this proposal overcomes traditional approaches focused solely on lists of functionalities by offering a multidimensional perspective that comprehensively considers technical, governance, and user experience aspects.

Therefore, it contributes to multidimensional integration and methodological robustness. This hybrid strategy is aligned with emerging approaches that promote more holistic and adaptive assessments in immersive technological environments [48].

Comment: There is some conceptual drift between “metaverse in education / socio-emotional support” and “general platform quality”. There are also inconsistencies and ambiguities in numbers (22 vs 23 platforms), cluster labels, maturity level notation, and radar/score aggregation.

Response to reviewer's comment: We sincerely thank the reviewer for this valuable suggestion. The manuscript has been thoroughly revised to address the conceptual alignment and numerical inconsistencies highlighted. Regarding the conceptual drift between “metaverse in education / socio-emotional support” and “general platform quality,” we have aligned the focus throughout the document. Specifically, references to teachers and socio-emotional support have been removed, and the discussion now emphasizes education in general. This adjustment ensures consistency between the paper’s background, objectives, and scope. Detailed explanations of these improvements can be found in the responses to the reviewer’s specific comments.

Concerning the inconsistencies and ambiguities related to the number of platforms (22 vs. 23), cluster labels, maturity level notation, and radar/score aggregation, all issues have been carefully addressed. The error regarding the number of platforms has been corrected to 23 and has been consistently applied throughout the manuscript. Other numerical and labeling discrepancies have also been resolved, ensuring clarity and coherence. Detailed responses to these points are provided in the specific comments addressed to the reviewer.

Comment: The educational/socio-emotional support angle is not fully integrated into results and discussion; it mostly appears in the aim.

Response to reviewer's comment: We thank the reviewer for this observation. Indeed, the educational/socio-emotional support angle was not fully integrated into the results or discussion. As previously noted in our responses to earlier, more detailed comments, this issue has been addressed throughout the manuscript. Specifically, all references to socio-emotional support have been removed to ensure consistency. The discussion now retains only the general educational context, which is included in the Future Lines of Research section, aligning the focus with the study’s objectives and the evidence presented.

Authors.-

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you
