1. Introduction
The integration of virtual reality (VR) technology in education has gained significant attention in recent years as it offers a unique opportunity for learners to engage in immersive and interactive learning experiences [
1,
2]. However, for educators, researchers, and instructional designers who are not experts in information technology, the process of selecting and implementing VR software can be daunting. This article aims to provide a non-technical guide for individuals who are responsible for designing and implementing educational programmes, such as teachers, professors, instructional designers, and curriculum developers. These individuals often face the challenge of selecting the most suitable VR software solutions for their specific instructional needs without having the ability to delve into the technical details of the software.
The increasing popularity of VR in education is driven by its potential to enhance student engagement and motivation, as well as its ability to provide a more immersive and interactive learning experience [
3]. Recent research further emphasizes the potential of digital technologies, such as 360-degree interactive videos and VR environments, in enhancing cognitive engagement and supporting sustainable teaching and learning. For instance, Shadiev et al. (2024) [
4] demonstrate that interactive 360-degree videos foster deeper cognitive processing and increased learner attention. Similarly, further research highlights how digital learning environments contribute to long-term educational outcomes and sustainability by promoting active, student-centred learning [
5].
Another key benefit of VR in education is its ability to provide a safe and controlled environment for learners to practice and develop social skills, such as communication, collaboration, and conflict resolution [
6,
7]. In the real world, it can be difficult for learners to find opportunities to practice such social skills, especially in situations that may be uncomfortable or intimidating. VR, on the other hand, allows learners to practice these skills in a virtual environment that is tailored to their needs and comfort level [
8].
To address the requirements of our target audience, we have conducted an extensive review of the literature on VR in education, as well as an analysis of the current market offerings of VR software solutions. Our research has identified several key players in the VR education market, including companies that offer a range of VR software solutions, from basic educational platforms to more advanced artificial intelligence (AI)-powered tools. We will provide a brief overview of these companies, highlighting selected key features, benefits, and limitations, as well as their potential applications in education.
Our goal is not to provide an exhaustive review of all VR software solutions available, but rather to offer a starting point for educators and researchers who are interested in exploring the potential of VR in education. By providing a concise and accessible overview of the current state of VR software solutions, we hope to facilitate the process of selecting and implementing VR software and to encourage interested parties to explore the many benefits of VR in education.
Taken together, this study aims to close the gap between technical innovation and educational usability by offering an accessible, action-oriented guide for educators and researchers interested in integrating VR into educational scenarios, more precisely into social skills training. Through a systematic, four-phase search strategy, we identified and analyzed ten commercial VR platforms using eight key evaluation criteria. By highlighting differences in areas such as communication modes, avatar customization, language support, and authoring tools, the article provides a structured overview to support informed decision-making. Unlike technical reviews, this study is tailored to non-technical users seeking practical guidance for implementing VR in diverse educational settings.
2. Theoretical Background
VR refers to a computer-generated, three-dimensional environment that enables users to engage with digital content in a highly interactive manner. The cognitive affective model of immersive learning (CAMIL) by Makransky and Petersen [
9] identifies two key elements that are essential for learning in VR: presence and agency. Presence refers to the subjective experience of being there in the virtual environment, which can enhance engagement and emotional investment in the learning process. Agency, on the other hand, describes the degree to which users perceive that they can influence the virtual world through their actions. When learners feel a strong sense of agency, they are more likely to engage in active learning processes, leading to improved comprehension and retention of knowledge. VR’s potential for learning is rooted in its ability to create immersive experiences that combine these elements. By fostering presence, learners experience a heightened sense of realism that can enhance their cognitive and emotional responses to educational content. Simultaneously, high levels of agency allow users to experiment, make decisions, and observe the consequences of their actions, which is particularly valuable in training scenarios that involve complex decision-making or procedural knowledge acquisition [
6,
10,
11]. The CAMIL [
9] has become a foundational framework for understanding how VR technologies can influence learning outcomes through the dual processes of presence and agency. Empirical studies have supported the model’s claim that higher levels of presence and agency can enhance motivation, cognitive engagement, and the transfer of learning [
12,
13]. However, while the CAMIL has been successfully applied in multiple experimental studies, most of these focus on short-term interventions or specific learning domains such as science education. There remains a need for further research that examines the model’s applicability across diverse educational settings, learner populations, and long-term instructional scenarios. Overall, the CAMIL provides a valuable theoretical foundation, but ongoing research is needed to refine its constructs and evaluate its predictive power in complex, real-world learning environments.
VR technology enables learning experiences that would otherwise be inaccessible due to logistical, financial, or safety constraints, such as practicing social skills in realistic settings. This is especially advantageous for the development of social skills, where repeated practice in authentic environments is crucial [14]. For example, research has shown that VR-based training in communication and leadership skills leads to measurable improvements in real-world interactions [
15,
16]. Similarly, VR has been used to train healthcare professionals in patient communication, enabling them to practice sensitive conversations in a risk-free environment [
17,
18]. Furthermore, a study by Seufert et al. [
19] found that VR-based training in classroom management skills, such as classroom organization and student behaviour management, significantly improved student teachers’ competency. Moreover, a study by Hassan et al. [
20] explored the use of AI-driven talking avatars in VR for investigative interviews of children, demonstrating the potential of VR to create realistic and interactive environments for sensitive conversations. These characteristics make VR a powerful tool for education, providing unique learning experiences that are otherwise difficult, too costly, or impractical to achieve through traditional instructional methods.
Overall, the technological landscape of VR systems is diverse and varies in complexity, ranging from high-end head-mounted displays (HMDs) to more accessible mobile VR solutions. High-end devices, such as Meta Quest or HTC Vive, provide full spatial tracking and realistic rendering, enabling users to move freely and interact naturally within the virtual space. These systems often require powerful computing hardware to run complex simulations, making them suitable for highly immersive experiences. In contrast, mobile VR solutions, such as Google Cardboard, use smartphones as display and processing units, making them more affordable but significantly less immersive [
21,
22]. Thus, VR hardware is often categorized based on its level of immersion. Less-immersive VR systems, such as 360-degree videos and desktop VR applications, provide a limited sense of presence as users can observe the virtual environment but have restricted interaction capabilities [
23]. In contrast, more-immersive VR systems utilize advanced HMDs with real-time motion tracking and haptic feedback, allowing for a deeper sense of embodiment and interactivity [
24]. Research has demonstrated that high-immersion VR leads to greater learning gains by enhancing engagement and cognitive absorption [
9]. However, it also presents challenges, such as increased cognitive load [
25], high costs, and the need for specialized hardware [
26], which may limit its widespread adoption in educational settings.
Despite these challenges, VR’s technological evolution continues to drive new educational applications. Studies have shown that immersive VR environments can significantly enhance the learning of procedural skills, such as surgical training [
27] or the training of automotive painting techniques [
28], as well as social skills, where authentic interaction scenarios are crucial for effective learning [
29]. Hence, the application of VR in education spans various sectors, including vocational training, school education, and higher education. In vocational training, VR has been successfully implemented to enhance practical skill acquisition in fields such as medical training and industrial maintenance. Studies indicate that medical trainees who practice procedures in VR environments demonstrate improved accuracy and confidence in real-world applications [
30,
31]. Similarly, VR-based training programmes in industrial settings allow learners to familiarize themselves with machinery and safety protocols without the risks associated with hands-on training [
32,
33]. In school education, VR is particularly valuable for teaching complex scientific learning content by enabling students to visualize and interact with abstract phenomena that would otherwise be difficult to grasp [
34,
35]. For example, research has shown that students who use VR to explore biological processes or physical simulations develop a deeper conceptual understanding compared to those using traditional instructional methods [
36]. In STEM education, in particular, VR supports the visualization of abstract scientific concepts, such as molecular structures, physics simulations, or anatomical models, enabling learners to interact with complex phenomena in an intuitive and engaging manner. For example, Pellas and colleagues [
35] highlight how immersive VR environments facilitate a deeper understanding of STEM topics by allowing students to explore virtual laboratories or conduct simulated experiments. Furthermore, VR technology can promote sustainable behaviour and climate change awareness among young learners [
37]. In higher education, universities integrate VR into curricula to provide experiential learning opportunities. Virtual field trips, laboratory simulations, and interactive case studies allow students to engage with course material in ways that traditional classrooms cannot offer. The flexibility of VR-based learning environments also supports distance education by enabling remote students to participate in shared virtual spaces [
38]. Additionally, VR has been applied in social sciences and humanities to simulate historical events, conduct virtual ethnographies, and support intercultural learning [
39,
40]. These diverse use cases underscore VR’s potential as a cross-disciplinary educational tool that supports both cognitive and affective learning objectives across educational contexts. While this literature highlights the breadth of VR’s educational potential, our article focuses specifically on the use of VR for social skills training, with an emphasis on commercially available tools that can be implemented by non-technical educators.
Beyond these diverse examples of VR’s educational potential, a more critical look at the current research landscape reveals several methodological and conceptual challenges, and VR has not yet achieved widespread adoption in education [26]. One major limitation is the lack of readily available and pedagogically sound VR software solutions. Many educators and institutions struggle to identify which VR applications effectively support their learning objectives. In addition, the rapid technological advancements in VR hardware have not always been matched by equally sophisticated educational software, leading to a gap between technological capability and practical usability in educational settings. Moreover, a majority of existing research relies on small sample sizes, short intervention durations, and context-specific implementations, limiting generalizability as well as knowledge gain for educational practice. Finally, most research emphasizes cognitive outcomes, while affective and social dimensions of learning, particularly in social VR environments, are underexplored. This lack of comprehensive evaluation hampers the formulation of evidence-based guidelines for educational practice.
Therefore, this article seeks to address this issue by mapping existing commercial VR providers and their solutions for training social skills. By providing a structured overview of available software solutions, this article aims to support researchers and practitioners in selecting VR applications that align with their educational goals and instructional needs.
In this regard, the emergence of social VR platforms and AI-driven avatars marks a significant shift in how VR is used in education. In social VR environments, multiple users can simultaneously interact within a virtual 3D space using an HMD. In most social VR platforms, users can create and modify their own avatars before engaging with others in the virtual space [
41]. Recent research has begun to explore the pedagogical potential of such multi-user VR spaces, particularly in fostering collaboration, empathy, and intercultural competence [
7,
20,
41,
42]. However, empirical studies on the effectiveness of social VR are still in their infancy, and questions remain regarding integration, learner support, and ethical considerations such as privacy and psychological safety. Similarly, while AI-powered dialogue systems offer personalized learning opportunities, their actual pedagogical impact and reliability require further scrutiny. Current AI models struggle with nuanced feedback, contextual understanding, and language-specific performance, particularly in non-English settings. Future studies should address these limitations by combining usability research with pedagogical effectiveness evaluations.
In summary, while the benefits and drawbacks of VR technology in educational settings have been well-documented in the research literature, researchers and practitioners in educational contexts who have determined that VR is the most appropriate tool for their specific learning environment still face a significant challenge. Without technical expertise in designing and developing their own VR environments using complex tools like Unity or Blender, they need suitable VR software that can be tailored to their specific instructional needs, learning objectives, and the needs of their learners. However, the market for VR technology is vast and complex, making it difficult to navigate. In this article, we therefore aim to address the following research question and provide researchers and practitioners with a first orientation guide.
This research question will be addressed in the following section, where we will present the results of our search on the current state of VR software solutions in social skills education and provide recommendations for researchers and practitioners.
4. Results
The following section presents two tables with providers and corresponding analysis criteria, covering technical requirements, avatar customization, training content, material adaptability, and communication features within the application. Based on our market analysis (outlined in the Methodology section), we identified ten relevant providers: (1) Bodyswaps v2.8, (2) EngageVR v4.0, (3) Mursion, (4) PIXOVR v.1.0, (5) Polycular, (6) Talespin v4.0.3, (7) 3spin v.2023.3, (8) Virbela v3.2.6, (9) VRChat v2024.4, and (10) WondaVR v.1.17.3.
Table 1 focuses on technical prerequisites and communication aspects, while Table 2 addresses content-related factors such as avatars and training materials. Each criterion is analyzed in detail, highlighting clear distinctions and providing selected examples from the software solutions. Comparisons between the providers are then presented based on these criteria, supplemented with images sourced from the respective providers’ websites or demo versions.
4.1. Software
The criterion software (see Table 1) specifies whether the application is a VR or a social VR environment. With reference to social scenarios, the two differ as follows: while social VR environments allow multiple users to interact and communicate with each other in real time, VR environments only enable social communication with pre-programmed or AI-driven non-player characters (NPCs). Bodyswaps, PIXOVR, Polycular, Talespin, 3spin, and WondaVR are focused on VR applications, whereas EngageVR, Mursion, VRChat, and Virbela are categorized as social VR applications. However, Mursion (see Figure 1) represents a distinct software model as it involves actors who control the avatars through haptic gloves during training sessions. These actors, briefed on the specific use case, can operate multiple avatars simultaneously, providing an authentic simulation experience.
4.2. Technology
The criterion technology refers to the types of technical devices supported. In Table 1, a binary distinction is made between HMD and desktop VR. The classification as desktop indicates that the software can be accessed via a laptop or computer screen, as well as on mobile devices such as tablets. Most applications are compatible with both an HMD and a desktop. Only two providers restrict their software to a single platform: Mursion to a desktop (see Figure 2) and Polycular to an HMD. However, in the case of Polycular, this limitation applies exclusively to its Virtual Skills Lab and not to other applications, such as virtual escape games (see also Section 4.7 Content Library).
4.3. Communication
During our analysis, the following question arose: how does communication with an interlocutor function within the VR environment? The criterion communication seeks to answer that question by differentiating between three types of interaction: (1) scripted communication with an NPC, (2) AI-driven communication with an NPC, and (3) communication with a real person. Scripted communication means that the dialogue between the learner and the NPC within the software is pre-programmed so that the NPC replies according to a predefined pattern. The learner must either follow scripted options or provide free answers based on speech recognition. In contrast, free and open-ended dialogues between learners and interlocutors are possible if the conversation partner is either an AI-driven NPC or a real person.
For example, in the Talespin application, the learner’s communication with an NPC takes place within the framework of a pre-established script. The dialogue follows a path determined by researchers and/or practitioners in advance. Within this dialogue path, various responses to the NPC’s statements can be provided, which the learner must follow closely. Depending on the response, a predefined reaction is triggered (see also Section 4.8 Authoring Tool).
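For readers who wish to see the logic of a scripted dialogue path in concrete terms, the following minimal Python sketch may be helpful. It is an illustration under stated assumptions only and does not represent the implementation of Talespin or any other provider: the node names, the NPC lines, the response options, and the function `run_dialogue` are all hypothetical.

```python
# Hypothetical scripted dialogue path: each node holds one NPC line and the
# learner responses allowed at that point, each pointing to the next node.
SCRIPT = {
    "start": {
        "npc": "I am unhappy with the feedback I received.",
        "options": {
            "Ask what happened": "listen",
            "Dismiss the concern": "escalate",
        },
    },
    "listen": {"npc": "Thank you for asking. Let me explain.", "options": {}},
    "escalate": {"npc": "I feel you are not taking me seriously.", "options": {}},
}


def run_dialogue(script, choices, start="start"):
    """Follow the learner's choices through the script and return the NPC lines."""
    node = script[start]
    transcript = [node["npc"]]
    for choice in choices:
        next_id = node["options"].get(choice)
        if next_id is None:
            # A scripted NPC cannot react to responses outside the predefined path.
            raise ValueError(f"Response not in script: {choice!r}")
        node = script[next_id]
        transcript.append(node["npc"])
    return transcript
```

Calling `run_dialogue(SCRIPT, ["Ask what happened"])` returns the opening NPC line followed by the predefined reaction to that response. A response that is not part of the script raises an error, which mirrors the restriction that learners must stay within the predefined options.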
Applications such as Bodyswaps (see Figure 3) extend beyond the selection of predefined response patterns, as seen in Talespin. In these systems, learners engage with a pre-programmed NPC but can communicate more freely through AI-powered speech recognition. A further enhancement in NPC interaction occurs when the NPC is fully controlled by AI, as exemplified by EngageVR (see Figure 4).
Although EngageVR is primarily a social VR environment, its clients can design custom learning scenarios. In contrast to NPC dialogues, social VR applications enable interaction with one or more real individuals. VRChat, another social VR platform, facilitates interaction and communication between users across various digital environments, offering a more flexible and open form of communication compared to the other aforementioned approaches.
Table 1 reveals that many providers offer a combination of different communication styles. A total of 60% of the software providers incorporate AI-powered communication options. Only two out of ten providers focus on scripted dialogues with an NPC. It is worth noting that, due to the rapid development of AI, this may change quickly depending on the provider.
4.4. Language of Communication
An important factor for VR social skills training is the language in which the training scenarios are available (see Table 1). This criterion describes whether English, German, or other languages are supported for conducting various VR scenarios. VR platforms offered by British or American companies without social VR functionality tend to have a more limited language availability. For instance, Bodyswaps, PIXOVR, and Talespin provide numerous training scenarios in English. According to the respective providers during sales discussions, language customization is possible for an additional fee. However, it remains unclear to what extent speech recognition and NPC speech output can be adjusted to the user’s language. In contrast, if a provider, such as WondaVR, utilizes AI in its VR application instead of pre-programmed NPCs, the range of available languages is significantly expanded because the application relies on a large language model in the backend. It should be noted that social VR applications, like Virbela, have the advantage that no predefined language is required as communication occurs between two or more real individuals.
4.5. Avatar
The section on avatar (see Table 2) describes, on the one hand, to what extent the avatars depicted in the VR application appear photorealistic or cartoonish. Photorealistic avatars are characterized by proportional body parts and a realistic depiction of the face and clothing. In contrast, cartoonish avatars may appear disproportionate in terms of physical stature and may look inauthentic due to features such as oversized eyes. On the other hand, this criterion also addresses whether the available avatars have a human or animal-like appearance. A total of 60% of the providers, including Polycular and PIXOVR (see Figure 5), offer photorealistic characters, while the remaining 40% use cartoonish figures, which can vary significantly in terms of their level of detail.
While Virbela features highly cartoonish characters, the figures from 3spin or WondaVR (see Figure 5) seem to fall between the boundaries of cartoonish and photorealistic. Regardless of the avatar style offered by the provider, all software solutions provide the option to use human-like characters. VRChat additionally allows users to utilize animal-like characters.
4.6. Avatar Creation
The criterion of avatar creation examines in greater detail whether an application allows users to create or modify avatars and, if so, to what extent (see Table 2). Three levels are distinguished: (1) not available, (2) selection of predefined avatars, and (3) modification of individual features. It is important to note that avatar creation is considered available only if it is integrated within the application itself and requires no modifications or additional implementation effort.
While Bodyswaps and Mursion do not offer avatar creation, five providers, including Polycular and Talespin, provide a limited selection of predefined characters. In Polycular, users can apply basic adjustments such as selecting the gender of their counterpart. In contrast, Talespin offers a more diverse range of options allowing users to choose not only between male and female characters but also to differentiate them by age, skin tone, and clothing style.
EngageVR, Virbela, and WondaVR offer avatar creation, allowing users to modify more individual features. In Virbela, users can only personalize their own character, whereas EngageVR and WondaVR also enable the creation of virtual counterparts. Consequently, the customization options vary significantly depending on the provider. In Virbela, users have extensive customization options for their avatars. The avatar creation process is structured into five adjustable categories: (1) body, (2) hair, (3) face, (4) clothing, and (5) accessories. Within the body category, users can modify both height and weight, as well as specific attributes such as hip width, chest size, abdominal shape, and muscle definition. Additionally, skin tone can be finely adjusted using a coordinate system, where the Y-axis ranges from cooler to warmer tones and the X-axis from darker to lighter shades. For hair, users can choose from various hairstyles and colours. Facial hair can be added or removed, and eyebrows as well as eyelashes can be customized. The face section allows for the selection of different head shapes, eye shapes and colours, nose structures, lip and mouth styles, ear shapes, and makeup options. Further refinements can be made using coordinate-based adjustment systems. Additionally, users are able to define the avatar’s apparent age. Clothing customization follows a three-step process. Users can select from a variety of tops, pants or skirts, and shoes, with further options to modify individual garment colours. The available styles range from casual and sporty to formal and elegant. Finally, in the accessories section users can equip their avatar with various items, including hats, glasses, earrings, scarves, and belts, providing additional personalization options.
4.7. Content Library
The criterion content library assesses whether the respective providers offer existing so-called off-the-shelf content from which clients can select their training materials. Initially, a dichotomous distinction is made between availability and non-availability. In a subsequent step, exemplary content from selected providers is presented to give researchers and practitioners an initial overview of potential training scenarios (see Table 2).
A total of 70% of providers offer a content library. Only Virbela, VRChat, and WondaVR do not. Particularly in the cases of Virbela and WondaVR, the primary focus is on the creation of customized VR content (see Section 4.8). At first glance, the homepages of the individual providers give the impression that they primarily offer immersive soft skills training, such as leadership skills. The following paragraphs therefore take a closer look at the content libraries provided.
Mursion, for instance, provides VR training programmes in leadership development, sales excellence, customer service excellence, healthcare, and education. The leadership development programme aims to teach individuals how to provide effective feedback and confidently adapt to various situations. Training participants are expected to develop the ability to connect with their teams to achieve better results. The sales excellence training, in turn, offers users practical recommendations on sales techniques, coaching, and customer orientation. The customer service excellence module supports users in acquiring conflict resolution strategies and practicing empathetic communication. The healthcare module offers scalable, resource-efficient training to improve clinical reasoning and patient interaction skills. Likewise, the education module provides aspiring teachers with immersive, multimodal practice in classroom management, student engagement, and communication.
Similarly, 3spin offers courses designed to enable learners to conduct goal-oriented customer interactions. For example, 3spin provides ready-made courses on sales training, customer success training, and leadership training. Each of these three courses is further divided into three modules, which focus on key elements of the respective competencies in more detail. All training modules offer around 80 min of learning time, divided into three learning units and an exam.
In the sales training, users are taught how to improve their own sales performance. This is achieved through the modules handling objections, sustainable negotiation, and closing deals. After completing the training, participants should have acquired the skills necessary to conduct successful negotiations aimed at strengthening customer relationships. The customer success training focuses on the long-term nurturing of customer relationships. In the modules dealing with complaints, mastering stressful situations, and understanding customer needs, participants develop skills that enable them to address the concerns of various customers with precision and patience, thus intensifying business relationships. In the leadership training, participants are meant to learn how to collaborate more successfully. This is taught through the modules fundamentals of communication, resolving conflicts, and giving feedback. The goal of the training is to develop a pleasant and positive work atmosphere as well as a company culture through purposeful and appreciative communication. In addition to social skills training, providers like PIXOVR also offer other training scenarios, such as workplace safety training. Here, users can choose from up to 57 different courses on occupational safety. These include courses on first aid or fire protection, as well as more specialized fields such as gas inspection or electrician training.
4.8. Authoring Tool
The final criterion we have identified is the availability of an authoring tool. In this context, an authoring tool refers to a feature integrated within the software that enables clients to independently modify existing VR content or create entirely new VR scenarios. A central aspect of our definition of an authoring tool is that users should not require programming or design expertise to develop and customize content. In VRChat, for example, it is possible to create content using Unity and subsequently upload it to the software, but this presupposes expertise with an external development tool. Providers such as Polycular and Mursion, in turn, develop the desired VR scenarios themselves without granting users the ability to modify the content independently. According to our definition, these three providers therefore do not have their own authoring tool. Talespin serves as an example of how an integrated authoring tool can be designed. With this tool, clients can customize both the virtual environment and the avatars within it by selecting from various templates. Simple modifications, such as adjusting the seating position of avatars, are also possible. Beyond visual customization, clients can fully adapt their training scenarios to match their specific topics and instructional goals. Clients can also create their own dialogue paths. Within these paths, the dialogue between the NPC and the learner can be individually adjusted using response and reaction fields. It is important to note that the dialogue itself is not AI-driven but follows the predefined dialogue path. In addition to the design of training conversations, users can modify and adjust the facial expressions and gestures of virtual avatars without requiring programming skills.
Similarly, WondaVR enables clients to personalize environments and avatars using templates and avatar creation tools. However, unlike Talespin, WondaVR allows for significantly more flexible and individualized training scenarios due to its AI-driven dialogue system.
In summary, seven of the ten providers offer an authoring tool. The degree of customization varies, ranging from simple modifications of the environment and dialogue to more complex adaptations and AI-driven scenarios (see Table 2).
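Readers who are comfortable with a few lines of code could also use the criteria as a simple shortlisting aid. The following Python sketch is one possible way to do so and is not part of the original analysis. It encodes only three attributes that are stated explicitly in the text above (software type, content library, authoring tool), and the class `Provider` and the function `shortlist` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    social_vr: bool        # True = social VR, False = VR (Section 4.1)
    content_library: bool  # off-the-shelf content available (Section 4.7)
    authoring_tool: bool   # integrated authoring tool available (Section 4.8)


# Values transcribed from the text of Sections 4.1, 4.7, and 4.8 only.
PROVIDERS = [
    Provider("Bodyswaps", False, True, True),
    Provider("EngageVR", True, True, True),
    Provider("Mursion", True, True, False),
    Provider("PIXOVR", False, True, True),
    Provider("Polycular", False, True, False),
    Provider("Talespin", False, True, True),
    Provider("3spin", False, True, True),
    Provider("Virbela", True, False, True),
    Provider("VRChat", True, False, False),
    Provider("WondaVR", False, False, True),
]


def shortlist(providers, **required):
    """Return the names of providers whose attributes match all requirements."""
    return [
        p.name
        for p in providers
        if all(getattr(p, key) == value for key, value in required.items())
    ]
```

For example, `shortlist(PROVIDERS, social_vr=True, authoring_tool=True)` narrows the ten providers down to EngageVR and Virbela. Since provider offerings change quickly, any such encoding would need to be checked against the providers' current information before use.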