1. Introduction
Recent research findings published in recent years indicate that soft skills—such as communication, collaboration, emotional management, and adaptability to change—have become critical not only in social life but also in the labor market [
1,
2,
3]. The outbreak of the pandemic in 2019 revealed new challenges and uncertainties faced by both employers and employees. These emerging global challenges and uncertainties have reshaped the perception of human skills and soft competencies, necessitating adaptation to dynamic changes in work execution, team management, and career development.
According to the Technavio report [
4] published in March 2025, the global soft skills training market is projected to grow by USD 207.8 billion between 2025 and 2029, representing a compound annual growth rate (CAGR) of approximately 34.5%. Businesses have recognized that in the era of automation and advancing artificial intelligence, soft skills are the key differentiating factor among employees and teams, enhancing their work efficiency and adaptability to changing business conditions. The growing importance of these skills underscores the need for soft skills development, driven by the profound transformation of the labor market [
5]. The evolving expectations of employers have also influenced the evolution of soft skills training, which varies in scope and format depending on the industry.
In the IT (Information Technology) sector, hybrid online training formats—which combine self-paced online modules with live virtual or in-person sessions—are popular, due to their adaptability to individual learner needs. Given the prevalence of remote work, crucial skills include communication, collaboration, and managing cultural diversity. IT companies invest heavily in soft skill development, especially in communication and project management to support teams. Trends driving this include cost-effective e-learning and immersive technologies like VR and AR (Technavio). Due to low adoption barriers, IT and tech sectors lead in integrating new edutech for upskilling, emphasizing interpersonal communication and team management in simulated environments.
In the financial sector, skills related to risk management, decision making, work ethics, and client communication are essential, driven by regulatory demands. Training is focused on empathy and active listening to foster trust and personalize client interactions [
6]. With automation and AI, adaptability and stress management are becoming core competencies. The rise of remote work and globalized services increase the importance of relationship management and cross-team collaboration [
7]. Environmental, Social, and Governance (ESG) considerations further highlight the need for responsible decision making and transparent communication, as financial professionals are increasingly expected to align their actions with sustainability and ethical standards.
In healthcare, empathy, stress management, and interpersonal communication are vital for patient care. Soft skills training improves patient satisfaction and reduces medical errors (with up to 70% of such errors linked to communication failures) [
5,
8]. Simulation-based modules help practice conflict resolution and teamwork, enhancing patient care quality and staff well-being [
9,
10,
11]. Immersive technologies are increasingly used not only for medical procedures but also for soft skills development [
12,
13].
In the retail sector, training emphasizes empathy, adaptability, and communication to improve customer service and retention. Personalized interactions and active listening strengthen customer relationships [
14,
15]. Stress management is crucial during high-pressure situations [
16,
17]. Creativity and effective teamwork support dynamic and customer-focused environments.
In the industrial sector, teamwork, communication, and conflict resolution are key soft skills, developed through simulation games and workshops [
14]. With industry 4.0 and automation, soft skills complement technical expertise. Manufacturers value collaboration and clear communication for smooth operations and efficiency.
Table 1 below provides a more in-depth overview of soft skills training trends across various economic sectors. It highlights the distinct requirements, training priorities, and technological adaptations shaping employee development in each industry.
 As evidenced above, the scope and format of soft skills training vary across industries. In IT and finance, hybrid online training formats dominate. In healthcare, intensive workshops combined with mentoring programs facilitate practical skill application. In retail and manufacturing, short modular training programs are prevalent, allowing easy implementation without disrupting daily operations [
14]. Soft skills training is becoming increasingly integrated into employee development processes across various industries, with innovative technologies supporting the advancement of these skills. For this reason, development process designers seek new solutions to enhance the effectiveness of training and maximize their integration into the functioning of companies and organizations. Currently, many training methods focus primarily on theoretical instruction, which raises questions about their effectiveness. Traditional approaches, mainly based on workshops and lectures, limit opportunities to practice essential soft skills in simulated operational environments. For example, negotiation training often involves discussing techniques and strategies along with brief group exercises, without the chance to test acquired skills in realistic scenarios with individuals outside the training group. Employees who acquire new knowledge may struggle to apply it in real negotiations due to a lack of practical experience. Additionally, emotions, stress, and time pressure in new situations increase the likelihood of failure during the initial attempts to use new skills, leading to employee discouragement. Another disadvantage of traditional training formats is their declining effectiveness. Previously used training formats are losing efficiency and impact on new generations. Younger generations of employees, particularly Millennials and Generation Z, expect modern learning methods that are engaging and tailored to their individual needs. For instance, customer service training often relies on presentations and theoretical worksheets, whereas Millennials and Generation Z expect simulations and exercises in realistic situations and environments. Traditional and still widely used training formats tend to have a uniform structure that does not consider different learning preferences. For example, in team communication training, some participants may prefer practical exercises, such as conflict simulations, while others might favor case study analysis. The lack of flexibility in adjusting teaching methods to different learning styles reduces training effectiveness. It is also important to note that soft skills, such as conflict resolution, require extensive practice, particularly with individuals outside the training group. However, single-session workshops do not provide the opportunity for repeated and broad practice in simulated operational conditions. For example, in the financial sector, where conflict situations are frequent, employees could benefit from an e-learning platform offering cyclical, varied, and recurring conflict management exercises. Unfortunately, traditional training methods rarely consider skill reinforcement and retention, leading to rapid knowledge loss. Moreover, participants often learn in abstract conditions that do not reflect their daily work realities. For instance, team management training conducted in a classroom setting does not account for the actual challenges and pressures that managers face. In many cases, training participants receive insufficient feedback on their performance. For example, after completing a team communication course, employees may not know which aspects of their communication were effective and which need improvement. Implementing modern training platforms with automatic feedback and progress analysis could significantly enhance learning effectiveness. Lastly, training programs are often mandatory, negatively impacting participant motivation and causing them to view training as a mere formality. In the case of mandatory empathy training in healthcare, a lack of participant engagement may result in medical staff failing to develop essential competencies.
The entire analysis above raises a key research question: To what extent can immersive technologies, such as VR, combined with AI and personalized feedback, enhance the effectiveness of soft skills training compared to traditional methods? This main question can be further explored through the following subquestions:
- How does immersive technology influence the effectiveness of soft skills development? 
- What role does AI particularly in delivering personalized feedback play in shaping learning outcomes? 
- Is the integration of AI and VR more effective than conventional classroom-based training in developing soft skills? 
These questions underscore the need for a comprehensive review of current literature to build a strong academic foundation for the use of AI and VR in soft skills training.
  2. Literature Review
Historically, vocational education emphasized manual and technical skills aligned with industrial production, particularly during the first and second industrial revolutions [
18]. With the shift toward a knowledge economy and the onset of the third industrial revolution, there was increasing demand for cognitive, interpersonal, and problem-solving competencies [
19,
20]. This evolution paralleled the rapid development of science and technology each influencing the other and shaping educational practices accordingly.
Traditional instructional models often fail to provide the realism, scalability, and personalization required in today’s fast-paced and globally connected work environments, prompting the need for more immersive and adaptive approaches.
One early response to the need for more experiential learning was the emergence of immersive technologies. The concept of VR dates back to the 1950s when Morton Heilig developed the Sensorama, a prototype of a multisensory immersive device [
21], followed by the Telesphere Mask, the first stereoscopic head-mounted display [
22]. In 1968, Ivan Sutherland and Bob Sproull introduced the Sword of Damocles, the first computer-based HMD system capable of basic user interaction [
23]. These early innovations laid the foundation for immersive education tools. In the 1970s and 1980s institutions like MIT and NASA advanced VR research through systems such as LEEP and VIEWlab [
24,
25]. Around the same time, Jaron Lanier’s VPL Research contributed tools like the DataGlove, EyePhone, and AudioSphere, helping to establish the modern vision of VR [
26,
27,
28].
The broader adoption of digital technologies in the 1990s and early 2000s catalyzed educational transformation, including the rise of blended learning, which integrates face-to-face and distance learning modalities. Initially focused on content delivery, these systems evolved to support complex competencies such as communication and leadership [
29,
30,
31,
32,
33,
34].
Research by Garrison and Kanuka [
18] and Graham [
19] emphasized that blended learning not only improves engagement and reflection but also supports the development of soft skills, which are increasingly essential in the modern workforce.
Simultaneously, simulation-based learning gained traction in domains like medicine and aviation, offering safe, repeatable experiential environments. Virtual patients enabled diagnostic practice in healthcare [
35], while flight simulations became standard in pilot training [
36]. These examples demonstrate how immersive high-fidelity environments can train both technical procedures and real-time decision making.
In recent years, VR has emerged as particularly effective in soft skills training by providing an embodied, emotionally engaging experience. Studies by Slater and Sanchez-Vives [
37] and Parong and Mayer [
38] showed that VR enhances presence and emotional engagement—factors linked to behavioral change. VR-based scenarios in leadership, public speaking, and conflict resolution have been shown to improve decision making, empathy, and self-efficacy [
35,
39,
40]
Recent technological advances have increased the realism and effectiveness of VR-based training. Improvements in motion tracking, high resolution displays, haptic feedback, and spatial audio have significantly enhanced user immersion [
41,
42]. Foveated rendering, which optimizes graphical performance based on eye gaze, enables greater visual detail without overloading system resources [
43], allowing for emotionally engaging, high-fidelity learning environments [
44]. Recent work by Gehrke introduced a reinforcement-learning-driven framework. This system significantly improved user engagement and reduced cognitive overload by aligning multisensory feedback with learner states. Such reinforcement learning mechanisms allow VR systems to autonomously tailor the training experience in real time offering personalized and responsive feedback loops [
45].
Meanwhile AI has introduced new possibilities for personalization and adaptive learning. Intelligent Tutoring Systems dynamically adjust instruction based on learner profiles, enhancing educational outcomes [
46]. Voice-based conversational agents and chatbots support natural language interaction, improving realism and continuity in simulated scenarios [
47,
48].
Emotion recognition plays a critical role in making VR systems emotionally adaptive. Recent research including a comprehensive review by Pereira et al. [
49] has highlighted the growing effectiveness of deep learning methods in recognizing emotions based on facial expressions and body posture. The study systematically analyzed 77 papers, proposing a taxonomy that included expression type, dataset context, and DL models categories such as CNNs, region-based networks, and Vision Transformers. Results showed a clear trend toward hybrid and transformer-based architectures, which improve recognition accuracy in real-word settings. The field is rapidly evolving with ongoing challenges, like cultural differences, occlusions, and the need for better annotated datasets. Emotion recognition enabled by affective computing techniques allows avatars to adapt their responses based on users’ emotional states, further increasing engagement and perceived authenticity [
50,
51].
Recent studies have validated the effectiveness of AI-enhanced VR environments. For example, AI-supported leadership training in VR environments led to improved decision making and empathy compared to traditional video instruction [
52]. In healthcare simulations, emotion-aware avatars increased empathy and prosocial behavior among participants [
53]. Across domains, commonly used models include reinforcement learning for feedback optimization, convolutional neural networks for affect detection, and scripted or AI-driven dialogue management systems for interpersonal interaction [
54,
55,
56].
While previous studies have explored AI and VR applications independently, few have investigated the integration of both in fully immersive, emotionally adaptive training systems. This study addresses that not very well filled gap by focusing on real-time interaction, feedback, and affect-sensitive simulation in soft skills development.
In summary, the evolution from industrial training to AI- and VR-enhanced immersive learning reflects both technological progress and pedagogical shifts from content-centered to learner-centered approaches. Immersive tools offer promising solutions to the limitations of traditional soft skills training. The integration of VR and AI in training has received considerable attention, yet remains a rapidly evolving and fragmented field. This study contributes by bridging these domains.
  3. VR as an Innovative Method for Soft Skills Training
The authors of this article decided to create a training simulator in the field of soft skills using the form of a virtual avatar, AI, and a speech synthesizer. For this purpose, a solution for the VR Meta Quest 3 goggles was developed. The choice of this particular model was made due to their growing popularity, high quality of immersion, mobility, and price availability. Compared to some other VR sets, Meta Quest 3 works wirelessly, which eliminates the need to use a computer or cables—this is important in training environments where freedom of movement and ease of implementing technology counts. Thanks to its high resolution, hand tracking sensors, and extensive developer support, Meta Quest 3 was the optimal tool for use in the described project. It is also worth noting that according to data from November 2023, the Meta Quest 3 was used by 5.1% of SteamVR users—a sharp increase of 5.05 percentage points in just one month following its launch [
57]. Additionally, Meta Quest 3 users show a higher level of engagement and use the application more often compared to owners of other models, making it an attractive choice for companies investing in the development of soft skills through immersive VR training. Soft skills training using artificial intelligence and virtual reality is an innovative approach in the field of interpersonal skills development. Traditional methods, especially for younger generations, do not provide participants with the opportunity to practically apply acquired knowledge in realistic, non-linear scenarios. The use of AI and VR allows for the creation of immersive environments in which participants can safely practice and improve their skills in simulated situations similar to real professional challenges. As indicated by the market analysis carried out by e-learning.pl [
14], at the time of writing the article, there were no commercial solutions on the market offering exactly what the authors propose in this article (understood as a complete system ready for sale). The VR solutions known to the authors and used by companies for training in the field of soft skills are mainly based on linear scenarios. Among the solutions available for sale, one can find linear scenarios using recorded videos with actors, simulated non-linear environments with a text interface, or traditional training from web browsers transferred to VR. Test providers have provided solutions using AI to create non-linear adaptive scenarios, but they have not included a realistic avatar, speech synthesizer, and realistic voice interface. Companies are increasingly using VR to train employees, especially in areas requiring practical skills, such as operating equipment or developing soft skills. For example, companies such as UPS, Walmart, and Volvo have implemented VR training, which has proven to be more effective than traditional methods [
58]. The IDC report predicts that global deliveries of AR/VR headsets will increase by 41.4% in 2025, driven by lower prices and the integration of artificial intelligence functions [
59]. These global trends clearly indicate that the combination of VR and AI in training is a topic worth considering, which will develop intensively in the coming years. Nevertheless, there are numerous academic and open-source solutions available that provide some components of the system described by the authors and found in the literature review—such as motion recognition, voice interface, adaptive feedback, realistic avatars, and others.
VR technology refers to a system that enables the creation and exploration of computer-generated, interactive three-dimensional environments that either simulate reality or offer entirely fictional spaces. By integrating hardware and software, VR immerses users in a digital world through visual and auditory experiences.
Key components of VR systems include specialized VR goggles, which display stereoscopic images, and motion controllers, which track and replicate hand movements within the virtual environment. Advanced graphic engines, such as Unity 3D and Unreal Engine, are commonly used to develop these virtual spaces, ensuring high-quality rendering and interactive functionality.
VR simulations provide participants with the opportunity to test themselves in realistic scenarios, fostering skill development within a controlled environment. Additionally, well-structured VR training modules support adaptive learning, adjusting the pace and content of learning to the individual needs of each participant. By leveraging HR data analysis and dynamically adapting training content, employees can learn at their own pace while focusing on the most relevant skills. This approach enhances knowledge retention and ensures a higher level of engagement among participants. VR delivers an immersive experience that engages participants on both emotional and cognitive levels. Realistic simulations create an authentic sense of presence, increasing concentration and long-term memory retention. Furthermore, the VR environment isolates learners from distractions such as mobile phones, emails, or office noise, allowing them to fully focus on the training. Learning through experience in a VR setting could be significantly more effective than passive knowledge acquisition, as participants immediately practice real-world professional situations, accelerating the learning process. Moreover, VR engages multiple senses—sight, hearing, and sometimes even touch—which reinforces memory retention. This multisensory engagement enhances the overall effectiveness of training, making VR an invaluable tool in the development of soft skills. By offering interactive, scenario-based learning, VR not only improves competency acquisition but also fosters a deeper understanding of complex interpersonal dynamics, ultimately preparing employees for real-world challenges in their professional roles.
As Dubiel et al. (2025) [
60] emphasized, virtual reality has emerged as a promising technology for developing soft skills in vocational education, offering immersive and controlled environments that traditional methods often fail to replicate. Their systematic review of 33 studies reveals a growing diversity in the use of VR for interpersonal, cognitive, and emotional competencies. The authors identify key trends in targeted skills, technological approaches, teaching strategies, and assessment methods. The following summary synthesizes the main features and capabilities of existing VR solutions, illuminating both their potential and current limitations, as summarized in 
Table 2.
As Dubiel pointed out, VR uses two main technologies to present content: 3D space, which increases interactivity, freedom of movement, dynamic scenarios, and a high level of immersion, and 360-degree video, which allows for realistic visualizations. The latter is easier to produce, but its interactivity raises doubts. She also pointed out the limitations of the solutions studied:
- Lack of adaptation to specific professional groups—for example, teachers, doctors, and leaders have different needs; 
- Little research on effectiveness in business environments, onboarding, and leadership; 
- Overly general scenarios—applications often fail to take into account the specifics of professional roles and situations; 
- A small number of solutions based on long-term analysis and behavior transfer; 
- Very few solutions implement real-time feedback based on user behavior, and adaptive systems supported by AI remain largely absent from the existing tools 
The vast majority of solutions she analyzed utilized predefined choices or system responses, without real-time voice communication. Only one solution featured a unique moderator panel that allowed for scenario adjustments during the simulation. Most 3D VR solutions utilize avatars representing other participants or NPCs (non-player characters). In the case of 360° video, the trainer or interlocutor’s character is played by an actor but is not interactive.
As shown in the comparison presented in 
Table 3, our VR solution distinguishes itself in several key areas. Unlike the majority of existing systems, it enables real-time voice interaction, integrates AI to dynamically adapt the scenario based on user input, and provides real-time feedback. These capabilities support more personalized and context-specific training experiences. Additionally, while most systems reviewed by Dubiel et al. offer generic scenarios, our platform is designed to be tailored to specific professions and their communication contexts (e.g., teachers, medical staff, team leaders).
While this paper focuses on the functional and experiential aspects of AI-driven components in our VR system, future research will involve a deeper technical evaluation of emotion detection accuracy, feedback responsiveness, and speech interface performance.
  4. Meta Quest 3—Hardware Specifications and Capabilities for Training Environments
The Meta Quest 3 is a standalone virtual-reality (VR) and mixed-reality (MR) headset designed to deliver immersive reality experiences (see 
Figure 1). Its advanced hardware and enhanced performance make it a suitable platform for training environments, simulations, and educational applications [
61]. The most important hardware parameters include [
18,
19]:
- Processor: Qualcomm Snapdragon XR2 Gen 2, providing improved performance and efficiency for handling complex VR simulations; 
- Display: Dual LCD panels with a resolution of 2064 × 2208 pixels per eye, offering a 4K+ experience with a high pixel density for sharper visuals; 
- Refresh Rate: Up to 120 Hz; 
- Field of View (FOV): Approximately 110 degrees; 
- Optics: Pancake lenses, reducing device weight and increasing clarity while minimizing distortion; 
- Inside-Out Tracking: six-degrees-of-freedom (6DoF) tracking without the need for external sensors, allowing natural movement in virtual space; 
- Wireless and Wired PC VR Support: Compatible with Meta Link and Air Link, allowing for a tethered or wireless connection to a PC for high-end VR applications; 
- Memory and Storage: Available in 128 GB and 512 GB versions, with 8 GB RAM for efficient multitasking and performance in demanding VR scenarios; 
- Battery Life: Approximately 2–3 h of continuous use. 
Meta Quest 3 headset hardware features offer several advantages relevant to immersive learning. The implementation of pancake lenses (compared to older Fresnel systems) contributes to reduced visual distortion and eye strain, which is especially beneficial during extended training sessions. This supports sustained learner engagement and comfort factors critical for knowledge retention in VR.
The device’s support for six-degrees-of-freedom (6DoF) tracking allows users to move freely and interact manually within the virtual environment. This is particularly important in training scenarios involving soft skills development, such as role-playing, negotiation, or teamwork where spatial positioning and non-verbal cues play a key pedagogical role.
Additionally, the Quest 3’s improved processing power and high resolution passthrough enhance the realism of fully immersive training scenarios. These improvements support the design of more responsive, dynamic, and context-rich learning environments allowing for real-time feedback and adaptive instruction.
By aligning technical features with pedagogical design, the use of this headset helps optimize both immersion and learning outcomes which is especially valuable in applied training contexts.
  5. Simulated Immersive VR Environment
We present an innovative and original virtual training environment (VR) designed as a three-dimensional interactive space that replicates realistic conditions for meetings and negotiations. The developed VR system enables users to participate in simulated interactions under conditions that closely resemble real-world scenarios, enhancing the effectiveness of the training process and ensuring compliance with simulation-based operational standards. The designed training is mainly aimed at professionally active people aged 22–45 (but older people can also use it). There are no restrictions on the gender of people using this training, but there is a slight overrepresentation of people working in environments oriented towards teamwork or working with people (e.g., HR, sales, medicine). The method of developing interactions and scenarios is adapted to people with basic or moderate experience with VR technology, but not necessarily previously participating in training using it. The main assumptions and goal of the developed solution is the development of interpersonal skills in distributed teams, preparation for conflict management and training in empathy, active listening, and communication in multicultural environments.
To ensure a high level of immersion, the training environment has been designed as a virtual office room, with dimensions and spatial layout reflecting a real-world conference space. This environment includes standard equipment such as a desk, chair, and whiteboard, as well as additional details like paintings, decorative plants, and furnishings to create a natural and realistic setting. This approach allows users to intuitively navigate the simulated environment, increasing both comfort and engagement during the training session.
One of the key elements of the proposed VR solution is a realistic character model that serves as a virtual trainer, as shown in 
Figure 2. This character enables users to engage in interactive dialogues, simulating realistic negotiation scenarios. The virtual trainer is equipped with an integrated response recognition system, allowing for the dynamic adaptation of the training process based on the decisions made by the user. The avatar in soft skills training plays a key psychological, educational, and neurological role, as it promotes more authentic social interactions. The avatar influences emotional and behavioral reactions, which become more similar to real ones [
61]. Without an avatar, the brain is not activating the above mechanisms, and the training would become artificial. The lack of a face or gestures dehumanizes interactivity, limiting the effectiveness of training [
62]. Thanks to the use of avatars, the participant can experiment with different communication styles or behavioral strategies without the risk of social consequences. The avatar can also assume different identities (e.g., gender, age), which helps to better empathize with the role and conduct dialogue appropriately to the recipient. Seeing their interlocutor, the participant becomes more engaged, takes the task more seriously, and enters the role more quickly—which is conducive to learning through experience [
63].
In the first version of our environment, the virtual trainer is represented as a man with an athletic build and short light brown hair. His realistically rendered physique aims to enhance user immersion and improve the quality of interaction in the VR training. The character is dressed in a light blue, fitted shirt with rolled-up sleeves and dark jeans, giving him a professional yet casual appearance, suitable for a business environment. In the background, modern black-and-white architectural photographs are visible, complementing the realistic atmosphere of the office space.
Future development of the virtual training environment includes expanding the diversity of avatars to enhance inclusivity and user engagement. Planned improvements involve introducing a wider range of character models, incorporating different skin tones, facial features, hairstyles, and body types to better reflect real-world diversity. Additionally, customizable clothing options and adaptive facial expressions will be implemented to increase personalization and realism. These advancements aim to ensure that users can interact with a virtual trainer that feels more relatable and representative of different professional and cultural backgrounds, further enhancing immersion and user experience in the training simulations [
37].
The operation of the interaction system is detailed in a block diagram (see 
Figure 3), which illustrates how user input is processed, how the virtual trainer generates responses, and how the training scenario adapts in real time. This model allows for dynamic adjustments to the course of training based on participant decisions, enhancing realism and learning effectiveness.
The developed training scenario focuses on improving negotiation skills, providing users with the opportunity to practice communication strategies, argumentation, and decision making in a realistic virtual environment. The system enables the simulation of various negotiation situations, allowing users to explore different approaches to business conversations and analyze the effectiveness of their actions based on the received feedback. Communication between user and avatar takes place in natural language via the headset’s built-in microphone, which is activated by pressing the designated button on the controller. This activation mechanism ensures the microphone only records intentional speech, minimizing the capture of unwanted or accidental sounds.
To ensure access control and personalized user experiences, the designed VR application includes a login module, shown in 
Figure 4. This feature allows for user identification, enabling tracking of participant progress and adjusting training scenarios to their individual needs.
Additionally, the user management system enables the assignment of access to specific training modules only to selected individuals. This allows administrators to control which user groups have access to particular training sessions, which is especially crucial for corporate training, sectors requiring high data confidentiality, and specialized training programs.
This flexible approach to access management facilitates the personalization of training experiences, optimizing the learning process, and adapting the program to the specific requirements of an organization or training institution.
  6. Advanced AI-Driven Virtual Training System: Integration of OpenAI and Cloud-Based Technologies
The technologies implemented in our VR training solution are deeply integrated with the most modern LLM models and cloud-based services, primarily leveraging OpenAI’s language processing tools. This approach enables a highly interactive, adaptive, and immersive learning experience, where real-time voice interaction enhances user engagement and simulates human-like dialogue dynamics.
The training scenario is designed as a decision tree model, which simulates complex, multipath interactions. This tree is loaded into the OpenAI ChatGPT-4o language model through an API (Application Programming Interface), allowing for the dynamic adaptation of the training session based on the user’s responses.
      
- At the beginning of the session, the virtual trainer introduces itself and outlines the training plan. 
- As the user progresses, their responses determine the next steps, ensuring personalized learning paths tailored to their interactions. 
- The virtual trainer dynamically adjusts the tone, complexity level, and content of its responses, enabling realistic negotiation scenario simulations and providing a fully interactive learning experience. 
To manage the flow of each training scenario, we represent the entire session as a JSON-encoded decision tree, in which every node corresponds to a distinct conversation branch (e.g., negotiation tactic, feedback prompt, or follow-up question). At runtime, the VR client extracts the active node—comprising the scenario identifier, current difficulty level, and most recent trainee response—and incorporates this state into a structured prompt. This prompt is divided into three logical parts:
We first supply a “system” message embedding global metadata: the scenario version, intended learning objective. This ensures the language model understands the overall training framework before generating a response.
- 2.
- Decision-Tree State 
Next, an “assistant” message conveys the precise point within the decision tree. By providing the model with the serialized node data—such as the branch index, available response options, and expected next steps—we grant it full visibility into the conversation’s branching structure. The model can, therefore, produce replies that align exactly with the designed scenario logic.
- 3.
- User Utterance 
Finally, the “user” message contains the speech-to-text transcript of the trainee’s spoken input. This raw user text is integrated last so that the model’s response directly addresses the trainee’s lates utterance while remaining grounded in the predefined scenario context.
This three-stage prompt-engineering approach—often referred to as “API prompting”—allow us to maintain tight control over the conversation flow while still leveraging ChatGPT’s generative capabilities. By separating global context, decision tree state, and user input into distinct messages, we avoid overloading the model with unnecessary data at each turn and ensure that every response is both contextually coherent and pedagogically relevant. Continuous updates to the decision tree, based on user choices, keep the dialogue dynamic and fully aligned with our learning objectives.
One of the core features of our VR training environment is real-time speech interaction, enabling seamless and natural communication between the user and the virtual trainer. The pipeline for voice processing and AI-driven responses operates as follows:
- Voice Input Capture—The user’s voice is recorded via a high-fidelity microphone embedded in the VR headset; 
- Speech-to-Text Conversion—The captured audio is processed using OpenAI. This model provides highly accurate transcriptions; 
- Context-Aware Language Processing—The transcribed text is sent to ChatGPT, where:
		   - ○
- The model interprets the response within the broader context of the conversation and the predefined training scenario. 
- ○
- The decision tree logic is applied to determine the appropriate response, ensuring logical flow and adaptive guidance. 
 
- Text-to-Speech Synthesis—Once the response is generated, it is converted back into natural-sounding speech; 
- Delivery to the User—The synthesized response is played in real time through the VR headset’s audio output, ensuring a fluid, uninterrupted, and immersive conversational experience. 
The integration of all these tools takes place on the AWS server using PHP. Communication between the developed VR training environment and the server is handled through a dedicated API. Our Meta Quest 3 application is built with Unity 3D and C#.
  7. VR Training Environment Testing
Between October 2024 and March 2025, a preliminary study was conducted to explore the use of virtual reality (VR) technology in soft skills training in our advanced virtual training system. The study involved a group of approximately 120 participants representing 25 different companies from various industries, including healthcare, construction, retail, sales, and manufacturing. All tests and presentations took place in Warsaw, within modern training spaces specifically designed for immersive technology applications.
The primary goal of the study was to assess the effectiveness and potential benefits of immersive VR experiences in developing interpersonal skills, stress management, and effective communication. Another important aspect was the comparison of modern VR-based training methods with traditional approaches, aiming to determine the extent to which VR can serve as a viable alternative and complementary tool to existing training practices. The study included participants from various professional levels, providing a comprehensive perspective on the efficiency of VR-based training. Among the participants were entry-level employees, corporate training specialists, department managers, and senior executives. A significant criterion for participation in the study was voluntary inclusion and an interest in training and competence development.
Participants had the opportunity to test the developed VR training solution, designed to engage users and simulate realistic business and interpersonal situations. The Meta Quest 3 VR headset, known for its high-resolution display and precise motion controllers, was used for the presentations. The training scenarios included:
The study was an experiment with an experimental group (VR) and a control group (traditional training methods). The demographic structure of the respondents is presented in 
Table 4. A quasi-randomized experimental model was adopted with measurement before (pretest), after (posttest), and in a delayed time (follow-up). In the group involved in the tests, 100 participants were assigned to the experimental group (VR training) and 20 to the control group (traditional training). The qualification criteria included age from 22 to 45, at least one year of professional experience in teamwork or in contact with people, no health contraindications to using VR devices (e.g., balance disorders, epilepsy). A total of 65% of participants were women. In total, 10 people already had significant experience in using VR in training and 48 had no major experience with VR before. Participants were recruited among e-learning.pl customers. Due to trade secrets, detailed information about representatives of specific companies will not be disclosed. All participants were informed about the method of processing data entered while using the VR simulator. The division into the test and experimental groups was made pseudo-randomly, ensuring that each group included representatives who were, were not, and were very familiar with VR.
Participants in the experimental group underwent training in the form of two 45-min sessions (90 min in total). The training included interactive scenarios with a realistic avatar enabling verbal communication, practicing active listening skills, resolving conflicts and expressing emotions in difficult situations, and automatic feedback after the session. The participants of the control group took part in a classic workshop training in the form of two 45-min sessions. The training included a lecture, case analysis, group exercises, and role-playing. An experienced trainer from the e-learning.pl team acted as a moderator and observer.
Before the training, each participant completed a test of theoretical knowledge in the field of communication and conflict resolution (20 questions, point scale 0–100%), a self-assessment scale of soft skills (five-point Likert scale), and a situational scenario assessed by the trainer. All questions and the scenario were developed and constitute e-learning.pl know-how. Immediately after the training, each participant took the same knowledge test again, completed the self-assessment scale, and solved a new situational scenario (different from the pretest). After 4 weeks, participants were asked to participate in a study consisting of a knowledge test, a short practical task (behavioral assessment), and a survey regarding the transfer of skills to the work environment and general feedback.
The data collection method and measurement tools are summarized in 
Table 5 below.
The study revealed high levels of engagement and learning effectiveness associated with VR-based training:
- In total, 87% of participants reported a higher level of engagement in VR training compared to traditional training methods, such as multimedia presentations, lectures, or in-person workshops. 
- Participants emphasized that the immersive nature of VR training significantly increased their motivation to actively participate. 
- Younger employees particularly appreciated the gamification elements and interactivity, which positively impacted their engagement and willingness to take part in training sessions. 
In 
Table 6, included below, the equivalent concepts of a knowledge test conducted before (pretest) and after its issuance (posttest) are shown.
Looking at the results of the conducted test, it can be seen that the participants of the experimental group (VR) achieved a significantly higher improvement in results, which confirms the effectiveness of immersive training using AI and VR goggles in acquiring knowledge. For the proper interpretation of the results, three basic indicators in statistical analysis were also determined—useful in comparing results before and after the training. Δ average shows how much the results increased (or decreased) in a given group.
Figure 5 illustrates the distribution of knowledge test scores before and after training for both groups: VR and traditional. A clear increase in the median and a noticeable narrowing of the score distribution can be observed in the VR group after training, which may indicate more consistent knowledge acquisition among participants. In the traditional group, improvement is also visible but the score spread remains wider and the median increase is smaller. To verify whether the VR training group achieved significantly greater learning outcomes than the traditional training group, additional between-group statistical analyses were conducted. The difference in improvement (Δ score) between groups was compared using independent samples 
t-tests. The VR group showed an average increase of +25.3 percentage points (from 61.4% to 86.7%), while the control group improved by +11.7 points (from 60.8% to 72.5%). An independent samples 
t-test comparing the gain scores revealed a statistically significant difference, indicating that the VR group achieved higher learning gains than the control group: 
t(118) = 4.93, 
p < 0.001, Cohen’s d = 1.12 (large effect size). This confirms that the VR-based training was not only effective in itself but also significantly more effective than the traditional training in enhancing knowledge.
 Observers (soft skills trainers) assessed participants’ behavior during the simulation using a 10-point competence scale.
Based on the assessments of e-learning.pl trainers, participants of training using VR showed a significantly greater improvement in practical behaviors. The group working with the VR simulator achieved a result of 3.6 points higher after the training (an increase from 4.5 to 8.1). On the other hand, the group working with traditional methods also achieved better results after the training, but this increase was only 1.7 points (an increase from 4.6 to 6.3). The above data indicate a significant effectiveness of the solution proposed by the authors. An independent t-test showed that the improvement in behavioral competency scores was significantly greater in the VR group: t(118) = 4.12, p < 0.001, Cohen’s d = 0.95
During the study, participants also self-assessed their soft skills before and after the training. The changes presented by the participants are presented in 
Table 7 below. The greatest changes were observed in the areas of empathy, active listening, and conflict management.
Each of three assessed competencies showed a statistically significant difference in favor of the VR group. The results were as follows:
- Empathy: t(118) = 3.45, p < 0.001, Cohen’s d = 0.78; 
- Active listening: t(118) = 3.31, p < 0.001, Cohen’s d = 0.75; 
- Conflict handling: t(118) = 2.87, p < 0.001, Cohen’s d = 0.66. 
These are all medium to large effect sizes, indicating practical significance.
Four weeks after completing the training, participants were asked to retake the knowledge test and conduct a simplified simulation of a conversation with an avatar. The test results are presented in 
Table 8. The conducted research shows that the group working with VR retained 89% of knowledge compared to the result immediately after the training. In the case of the group working with traditional methods, this result was slightly weaker—72% of knowledge was retained. This difference was statistically significant: 
t(118) = 3.78, 
p < 0.001, Cohen’s d = 0.83. A key factor contributing to this was the real-time feedback mechanism, which enabled instant analysis and behavioral adjustments.
To confirm the validity of applying independent samples t-tests, additional analyses were conducted to test core statistical assumptions:
- Normality of score distributions was assessed using the Shapiro–Wilk test for each key variable (knowledge test scores, behavioral competence, self-assessment). No significant deviations from normality were observed (all p > 0.05). 
- Homogeneity of variances between the VR and traditional training groups was examined using Levene’s test. All comparisons showed non-significant results (all p > 0.05), validating the use of the standard t-test assuming equal variances. 
For robustness, Welch’s t-test was also computed in borderline cases and consistently confirmed the significance of between-group differences. Furthermore, the absence of a physically present trainer reduced stress levels among participants, allowing them to focus on the learning process itself, ultimately facilitating knowledge retention and effective skill development and an attractive and engaging new form of training.
Additionally, participants were asked to subjectively evaluate the training (scale 1–5). The results are presented in 
Table 9 below.
Despite the positive outcomes, several important limitations should be acknowledged when interpreting results. Firstly, group assignment was not fully randomized, which may affect the internal validity of the findings. Additionally, the control group was considerably smaller (n = 20) compared to the experimental VR group (n = 100), limiting statistical power and the ability to generalize certain comparisons.
The between-group results clearly demonstrate that the immersive VR-based training, supported by AI-driven interaction, leads to higher gains in knowledge, behavioral competence, and self-assessed skills. The findings support the use of VR as a superior method for developing interpersonal competencies in modern professional environments. Assumption checks for normality and homogeneity of variance were conducted. Both Shapiro–Wilk and Levene’s tests confirmed that the data met the assumptions required for valid use of independent t-tests. Additionally, Welch’s correction was applied where appropriate and did not alter the significance of the results, further supporting their robustness. Nevertheless, future research is encouraged to apply more advanced statistical models (e.g., ANCOVA or mixed-effects models) to control for potential confounders and baseline differences.
Although the between-group comparisons revealed statistically significant differences in effectiveness, the imbalance in group sizes may violate assumptions of equal variance and limit the reliability of some estimates. Further research should aim to balance group sizes and use full randomization to increase validity. Furthermore, while tests such as the independent t-test and Cohen’s d provide strong evidence of statistical and practical effects more advanced modeling (e.g., ANCOVA and mixed-effect modeling) could be applied in further studies to control for baseline differences.
In terms of measuring soft skills the tools used—including self-assessment scales and expert observation—involve a degree of subjectivity. While assessments were conducted by experienced trainers, potential observer bias or expectation effects cannot be entirely ruled out. Furthermore, the absence of standardized psychometric instruments (e.g., for measuring stress levels, empathy, or emotional resilience) restricts the depth and precision of the analysis. In order to formally confirm a subjectively reported greater sense of psychological comfort in the VR training environment, future research should consider using validated instruments such as the STAI or PSS to objectively evaluate stress levels and better support claims related to emotional safety and training comfort.
It Is also worth noting that knowledge retention was assessed only four weeks after training. While the VR training group showed high retention, longer-term follow-ups (e.g., after 3–6 months) would be necessary to evaluate the lasting impact of the training.
  8. Limitations
Despite the promising results presented, several significant limitations and risks need to be acknowledged. These limitations highlight potential challenges and areas for further research and improvement.
One of the most critical technical limitations of our solution arises from the integration of LLM, used for generation avatar responses. While these models significantly enhance realism and interactivity, they are inherently susceptible to generating incorrect, biased, or contextually inappropriate responses, often referred to as “hallucinations” [
64,
65]. Hallucinations in LLMs can lead avatars to provide trainees with misleading or factually incorrect information, potentially impairing the effectiveness of training scenarios and diminishing user trust.
Recent literature clearly identifies hallucination and bias as critical risks associated with the deployment of LLM-based interactive systems. Ji et al. (2023) [
64] highlighted that models trained on extensive datasets tend to perpetuate biases present in their training data, thereby reproducing stereotypical or discriminatory content, which can negatively impact training outcomes and reinforce inappropriate behaviors. Similarly, Sheng et al. (2022) [
65] demonstrated that without proper controls, generative models often produce outputs that inadvertently reinforce gender, racial, or cultural stereotypes, particularly problematic in training scenarios designed for fostering diversity, equity, and inclusion.
To mitigate these risks, we have already implemented a structured prompt-engineering approach, clearly separating conversation context, scenario parameters, and user inputs, significantly reducing the likelihood of generating inappropriate or out-of-context responses. However, it is crucial to acknowledge that, despite these implemented mechanisms, biases and stereotypes present in generated responses may still distort and limit the obtained research results. Future planned strategies include the introduction of real-time content filtering based on predefined ethical guidelines and scenario constraints. This method could involve analyzing generated responses using semantic and keyword filters to detect and eliminate potentially harmful or inappropriate content before reaching participants.
From an ethical and privacy protection perspective, another limitation concerns data security and participant privacy. Due to the sensitive nature of training interactions, particularly those involving personal or professional scenarios, ensuring the confidentiality of collected data, conversation transcripts, and behavioral analyses is crucial. To address this, we applied advanced encryption standards (AES-256 [
66]), end-to-end encryption (TLS 1.3), and strict anonymization protocol, promptly deleting raw audio recordings after processing. However, ensuring absolute data security remains challenging, especially in cloud-based environments.
Another important limitation of this study is the uneven sampling between the experimental and control groups, with 100 participants assigned to the VR training group and only 20 to the control group using traditional training methods. Although the allocation was pseudo-randomized and efforts were made to ensure diversity in VR familiarity, gender, and professional background in both groups, this imbalance introduces several methodological concerns. Most notably, the unequal group sizes may affect the statistical power of the analyses and increase the risk of Type I or Type II errors. Furthermore, it can violate the assumption of homogeneity of variances, which is critical for the validity of parametric tests such as the independent samples t-test. While statistical checks for these assumptions were conducted, this imbalance remains a potential source of bias and should be taken into account when interpreting the results. Future research should aim for more balanced group sizes or consider applying statistical techniques that compensate for such discrepancies.
  9. Discussion
This study demonstrates the potential of VR technology combined with AI in enhancing soft skills training. The developed training system, integrated with AI-driven virtual coaching, provides a highly immersive and adaptive learning experience, allowing users to engage in interactive dialogue simulations and practice negotiation strategies in realistic virtual environments.
The empirical study conducted with professionals from diverse industries confirms that VR-based training fosters higher engagement and improved knowledge retention compared to traditional training methods. The high immersion factor, real-time feedback mechanism, and adaptive learning pathways contribute to enhanced learning effectiveness, particularly in areas such as communication skills, stress management, and decision making.
Key findings from this research indicate that:
- In total, 87% of participants reported increased engagement and motivation in VR training compared to traditional classroom-based approaches; 
- A total of 72% of users demonstrated higher content retention, particularly in managing interpersonal communication and handling challenging business scenarios; 
- The removal of instructor presence helped reduce stress levels, leading to a more comfortable and focused learning experience; 
- Younger participants found gamification and interactivity essential in making training more engaging and effective. 
It can be definitely stated that VR training participants more often described the training as “modern”, “engaging”, and “the most practical so far”. Real interaction with the avatar (visibility of the face, verbal communication) was repeatedly indicated as a key advantage. The results clearly indicate that the use of VR technology (Meta Quest 3 + NLP for avatars) in soft skills training increases efficiency, supports knowledge and skill retention, engages participants more than traditional forms of training, and better meets the expectations of digital generations (Millennials, Gen Z). Thanks to immersion and interaction with realistic avatars, participants enter the role faster and train behaviors better, which they can then transfer to the work environment. The results confirm the need to implement modern technologies in training processes and redefine the approach to teaching soft skills. Thus, the authors obtained confirmation of their assumptions regarding the questions they asked and promising prospects for the developed solution. The results obtained by the authors, as well as by other entities, suggest significant potential for using VR in soft skills training. However, the authors continue to monitor the market and the development of available solutions, particularly in terms of long-term effectiveness—including analyzing the impact of VR training on employees’ real workplace behaviors, exploring the integration of VR with other training methods (e.g., coaching, mentoring, or traditional workshops), and assessing the return on investment (ROI).
We compared the results of our study with two similar articles: Sapkaroski et al. (2022) [
67] and Mayor Silva et al. (2023) [
68], both of which also focus on the use of immersive virtual reality for developing soft skills, particularly communication skills. Both studies confirm the positive impact of VR training on the development of these skills, indicating higher participant engagement and better knowledge retention compared to traditional methods. In the study by Sapkaroski et al., an 11% increase in communication confidence was observed, along with an 89% retention of this effect after four weeks, demonstrating the effectiveness of immersive VR compared to role-play simulations. Similarly, Mayor Silva et al. showed that nursing students participating in immersive VR simulations achieved significantly greater improvements in communication skills than those attending traditional case-based workshops. In our study, we observed even higher levels of engagement (87% of participants) and content retention (72%), which may be attributed to different avatar interactivity [
56,
57]. In the articles selected for comparison, the authors used avatars based on fixed, prerecorded responses according to a branching script, whereas our system employs a generative natural language processing model that enables dynamic, context-aware reactions. This increased interactivity likely translates into greater immersion, higher participant engagement, and more effective knowledge acquisition. These findings suggest that the integration of AI-driven avatars represents a significant advancement over traditional scripted VR training modalities.
Training using VR and AI technologies involves significant ethical challenges, especially in the area of protecting participants privacy and personal data. Recording biometric data (head movements, hand movements, physiological responses), as well as the user’s voice and image, requires the use of advanced anonymization and end-to-end encryption mechanisms and secure data storage in accordance with the requirements of a given region. An additional risk is the possibility of using the collected data to profile participants (in the sense of abuse), which can lead to unethical practices if appropriate transparency and consent policies are not implemented. It is, therefore, crucial to ensure voluntary participation, clearly define the purpose of data processing, and guarantee the right to view and delete it. The user should also know when they are talking to a human and when to they are talking to a bot and have the opportunity to report irregularities.
  10. Conclusions
The result clearly indicates that the use of VR technology in soft skills training increases efficiency, supports knowledge and skill retention, and engages participants more effectively than traditional forms of training. Furthermore, the integration of AI-driven avatars offering dynamic, context-aware responses leads to higher immersion and greater learning outcomes, especially for digital-native generations. The findings confirm the need to implement modern technologies in training processes and redefine the approach to teaching soft skills.
The promising results obtained both in this study and in comparable research suggest significant potential for using VR in soft skills training. However, it remains essential to continue monitoring the market and the development of available solutions, particularly regarding their long-term effectiveness—including the impact on real workplace behaviors, the integration of VR with other training methods (e.g., coaching, mentoring, or traditional workshops), and the assessment of return on investment (ROI).
Future work should focus on further anonymizing and securing stored data, as well as enabling multiuser VR collaboration for team training. Overall, the findings form a strong foundation for further research and the broader implementation of VR technology in training processes.