1. Introduction
The emergence of virtual reality (VR) technology has fundamentally transformed the way users engage with digital content and facilitated the creation of immersive environments that blur the boundaries between the physical and virtual realms. VR enables users to interact with three-dimensional, computer-generated spaces in real time, providing a level of engagement far beyond that of traditional two-dimensional interfaces. Although VR initially gained popularity in the gaming and entertainment sectors, its applications have since expanded to various fields, such as education, healthcare, and industrial training.
When VR technology first began to flourish, VR headsets had to be connected to a computer that rendered the frames of VR scenes, a configuration known as tethered headsets. With technological advancements such as powerful mobile processors, mature VR software platforms, and faster wireless connectivity, All-In-One VR (AIO VR) headsets have become the industry standard. These market-dominating VR devices run their own operating systems (OSs) within the headset. Recent examples include Meta Quest, PICO Neo, HTC Focus, and Apple Vision Pro. Compared to their wired counterparts, AIO VR headsets perform all computing on-device, freeing users from the constraints of cables and allowing for portable, computer-independent usage.
Despite these advantages, AIO VR headsets come with some limitations. Smoothly rendering immersive VR environments requires substantial computing power and high-end hardware, which are difficult and costly to incorporate into the compact form factor of an AIO VR headset. Consequently, the accessibility and performance of VR experiences on standalone AIO devices may be limited by this constraint. Another challenge for developers is the difference in operating systems compared to traditional, PC-connected VR applications, which are typically Windows-based. Migrating existing VR applications to AIO devices requires significant effort and resources to ensure compatibility and optimal performance, especially given the distinct OSs provided by different VR vendors. This challenge becomes even more pronounced when dealing with embedded devices, as their firmware is highly dependent on the underlying OS.
Recent advancements in multimedia streaming technology offer a solution to the hardware limitations of AIO VR headsets. Streaming enables users to experience high-quality VR content via the internet by shifting the heavy computational workload to remote servers. Instead of performing complex VR rendering locally, the user’s device only needs to capture input sensor signals and transmit them over the network. The VR content is then rendered in the cloud, encoded, and streamed back to the user’s device for real-time display. By offloading these intensive computations to the cloud, users can access high-quality VR content on lower-end devices [
1]. For consumers, this means that purchasing high-end hardware is unnecessary. For developers, VR streaming alleviates compatibility issues across various VR OS platforms, thus increasing development efficiency. This democratization of VR technology opens up new possibilities for both content creators and consumers, potentially leading to the widespread adoption of VR in everyday life. Additionally, rendering VR in the cloud enables users to interact with each other: cloud servers instantly render VR frames according to users’ poses and interactions and synchronize the results to each user’s headset.
However, VR streaming also faces several challenges. One of the most pressing issues is the requirement for extremely high bandwidth and low latency to ensure responsive and high-quality VR experiences. Unlike traditional one-way 2D video streaming, VR frames must be transmitted according to the users’ pose in real time with minimal delay to maintain their immersion. Any noticeable lag can lead to motion sickness and disrupt the overall VR experience. In addition, VR streaming demands much higher network bandwidth to transmit high-resolution, 360-degree video that covers the user’s entire visual field. According to [
2], high-quality VR requires data rates of more than 530 Mbps and a latency of less than 10 ms, requirements far more stringent than those of today’s typical network applications.
On the VR headset side, current commercial VR streaming solutions are primarily built on the OpenXR standard, registering as the active OpenXR runtime on the headset to abstract away hardware differences. This allows compatibility with existing applications through the standardized OpenXR API and eliminates the need to modify source code. While convenient, this approach reduces flexibility for developers. For instance, OpenXR lacks support for streaming custom sensory data, such as motion capture (MOCAP) or EEG data. This limitation poses a challenge for researchers aiming to analyze players’ behaviors or conditions in detail via peripheral sensors. Additionally, although OpenXR supports connecting multiple VR devices simultaneously, most video streaming systems do not inherently support action synchronization among multiple players, further constraining the potential for collaborative or multiuser experiences in VR.
In this paper, we propose Loka (
https://github.com/ncu-wmlab/LOKA.Core, accessed on 22 January 2024), a VR streaming software toolkit developed using the Unity Engine, a leading game engine widely used in VR application development. Loka integrates the benefits of AIO VR and streaming, allowing developers to avoid the complexities of cross-platform development and device-dependent SDKs. The toolkit provides a unified interface that handles the platform-specific details of the various VR OSs. We have tested Loka on various device models to ensure compatibility and performance. By eliminating the need for developers to “reinvent the wheel”, we aim to enhance VR application development efficiency and to extend its functionality for integrating sensory data.
Figure 1 illustrates the key components of Loka.
From the user’s perspective, Loka eliminates the need for high-end devices to experience high-quality graphics, conserving device storage space and energy: only a lightweight streaming app is required to gather and send input data and to process the frames received from the server. Furthermore, our solution inherently supports multiplayer (multicasting) functionality, enabling developers to easily add users to the same virtual environment. For researchers, Loka can track a variety of data sources, including connection performance metrics and data from IoT devices linked to the VR headset, with the flexibility to extend support for additional data as needed.
4. System Architecture
Loka’s architecture is designed to provide a robust VR streaming solution, allowing for real-time interaction, data collection, and multiplayer functionality. As illustrated in
Figure 2, the system comprises several key components, including clients, which are VR devices connected with peripheral sensors, a signaling server, and a host server. The architecture facilitates the seamless data flow between these components by leveraging Web Real-Time Communication (WebRTC) for low-latency communication. This setup enables real-time VR streaming, custom data integration, and multiplayer support, all in a highly flexible environment.
4.1. Client: VR Devices and Data Collection
The client typically denotes the VR devices worn by users. These devices are responsible for receiving streamed frames from the host server and transmitting real-time user inputs to the server. In addition to tracking basic interaction data (e.g., head movements and controller inputs), the system supports the collection of more advanced data streams in real time, such as tracked data and IoT sensor data. The former refers to motion-tracking data natively supported by the devices; the latter comprises custom data streams from peripheral embedded IoT devices, such as physiological sensors.
These data are continuously sent to the host for processing and rendering and can also be retrieved by the virtual scene in real time, enabling a responsive and interactive VR experience while supporting research applications that require detailed data tracking.
As illustrated in
Figure 3, Loka processes data on the client side by categorizing them into a standardized format before transmitting the formalized data to the host. When integrating new sensors or VR devices into Loka, developers can utilize the two groups of modules located in the top-left corner—sensor API and device SDK—to connect to the relevant functions within Loka. Depending on the specific sensors and devices, it may be necessary to extend these modules to transform the data into the required format. This modular design minimizes effort and eliminates the need for extensive modifications to the overall architecture, ensuring seamless and efficient integration.
4.2. Host: Rendering Server
The host server is the backbone of Loka’s architecture, managing the environment (game modules) and handling computationally intensive tasks, such as rendering. The server is capable of handling multiple clients simultaneously, each represented as LokaPlayers, interacting within the same virtual environment. It processes incoming data from the clients and generates the corresponding frames to be sent back to the devices. The host server is also responsible for the following:
Environment management: runs the virtual environment, ensuring that all users share a synchronized, immersive experience.
Custom data handling: processes specialized data streams, such as physiological signals and IoT data, for real-time integration into the VR experience.
Data and log creation: collects users’ inputs and generates reports and systematic logs for further analysis, such as player behavior or research-related metrics.
4.3. Signaling Server: Signal Communication
Loka employs a signaling server to establish and maintain communication between the clients and the host. This server is integral to setting up peer-to-peer (P2P) connections via WebRTC, which handle real-time signal exchange. The signaling server is responsible for the following:
Host–server matching: The signaling server matches available host servers for clients based on their connection requests. When a client device initiates a connection, the signaling server selects an appropriate host server to handle the session.
Signal exchange: The signaling server is responsible for the exchange of signaling information, which is required to establish a connection between the client and host. This includes exchanging available connection methods and the state of the client–host connection.
The detailed steps involved in the connection process will be further discussed in the next section.
5. Technical Implementation
Loka (the source code is publicly accessible from GitHub:
https://github.com/ncu-wmlab/LOKA.Core, accessed on 22 January 2024) is built atop Unity Render Streaming (URS), an experimental yet powerful framework that leverages real-time rendering in Unity and facilitates remote content delivery; its source code is publicly available on GitHub [
21]. At the core of URS’s data transmission is WebRTC, which enables low-latency, real-time data exchange between the client and server, ensuring smooth interactions and responsive VR experiences.
To support cross-platform VR functionality, Loka integrates the Unity XR Interaction Toolkit along with the OpenXR Plugin, ensuring that the system can be deployed across various VR devices without needing significant modifications to the application code. This approach abstracts device-specific dependencies and allows developers to seamlessly integrate different VR hardware into their applications. On the client side, Loka interfaces with device-specific SDKs to ensure full compatibility with the unique features of each VR headset, such as Quest, PICO, and other popular AIO VR devices. The framework’s modular design enables Loka to be efficiently extended to accommodate new devices and hardware environments. Additionally, automated compatibility testing pipelines can be incorporated to ensure seamless cross-platform performance as new devices and operating systems emerge.
5.1. Establishing Connection
The connection between the client and the host is established through the signaling server, following the standard WebRTC connection setup process. We extended URS’s example web app, written in Node.js, which acts as a WebSocket server responsible for establishing and maintaining WebRTC connections across multiple devices. As shown in
Figure 4, the client first initiates the connection by sending a connect request to the signaling server. Upon receiving the request, the signaling server creates a
connectionPair, initially linking the client’s WebSocket connection. Next, the client sends an offer to the signaling server, which is forwarded to the host. The host then responds with an answer, which the signaling server relays back to the client. Once the answer is received by the client, the signaling server updates the
connectionPair to pair the two endpoints. Once both parties have exchanged their Session Description Protocol (SDP) offer and answer messages, the peer connection is established, allowing real-time data exchange to begin between the client and the host.
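The pairing flow described above can be sketched in a few lines. The snippet below is an illustrative Python model of the signaling server’s role; class and field names are our own inventions (the actual Loka signaling server is a Node.js WebSocket app extended from the URS example):

```python
class FakeSocket:
    """Stand-in for a WebSocket endpoint (illustrative only)."""
    def __init__(self):
        self.inbox = []

    def receive(self, msg):
        self.inbox.append(msg)


class SignalingServer:
    """Tracks connectionPairs and relays offer/answer messages between peers."""

    def __init__(self):
        self.pairs = {}  # connection id -> {"client": socket, "host": socket}

    def on_connect(self, connection_id, client_socket):
        # Step 1: client connects; create a connectionPair holding only the client.
        self.pairs[connection_id] = {"client": client_socket, "host": None}

    def on_offer(self, connection_id, offer_sdp, host_socket):
        # Step 2: forward the client's SDP offer to the selected host.
        host_socket.receive({"type": "offer", "sdp": offer_sdp})

    def on_answer(self, connection_id, answer_sdp, host_socket):
        # Step 3: relay the host's SDP answer back to the client and complete
        # the pair so both endpoints are linked for the session.
        pair = self.pairs[connection_id]
        pair["host"] = host_socket
        pair["client"].receive({"type": "answer", "sdp": answer_sdp})
```

In the real system, ICE candidate exchange follows the same relay pattern before media and data channels open.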
5.2. Host View Adaptation and Rendering
Once the connection is established, the host dynamically adjusts the field of view (FOV) to match the client’s FOV settings. This ensures that the visual experience on the client’s VR device aligns with the intended perspective. The FOV can still be adjusted during runtime, allowing for seamless adaptation to user preferences or device constraints.
The host captures the in-game view from the perspective of the LokaPlayer, the virtual representation of the user in the scene. This view is rendered into an image with a 2:1 aspect ratio, which is then transmitted to the client and displayed in front of the user. If the client’s display size does not match this ratio, the system automatically fits the image to ensure a consistent, undistorted visual presentation. For instance, Meta Quest 3 has a resolution of 4128 × 2208; the rendered frame is therefore displayed at 4416 × 2208, matching the display height while preserving the 2:1 ratio.
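The Quest 3 numbers above follow from a simple height-matched computation. The sketch below reproduces that arithmetic (illustrative only; the actual per-device fitting policy may differ):

```python
def fit_frame(display_w, display_h, source_ratio=2.0):
    """Height-matched fit: scale a source_ratio (w:h) frame so its height
    equals the display height; the displayed width then becomes
    source_ratio * display_h, regardless of the display width."""
    return int(round(source_ratio * display_h)), display_h
```

For the Meta Quest 3 (4128 × 2208), this yields a displayed frame of 4416 × 2208, as stated above.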
5.3. Client Input Handling
Loka’s input system is built on Unity’s new Input System, which offers a flexible and modular way to handle user input. This system separates input into devices and controls: each device (e.g., a controller) consists of multiple controls (e.g., button presses, joystick movements, device position). Input is abstracted into input actions and input action maps, where actions listen for changes in the corresponding controls to determine whether an action is triggered or to retrieve specific input values. On top of these are input action assets, which serve as containers for action maps and can be serialized as files.
As illustrated in
Figure 5, an action map can contain multiple actions. For example, the action Position in the action map XRI LeftHand is configured to detect position data from either the Left Controller or the Left Hand (captured by hand tracking) by looking for the appropriate controls within those devices. We also provide ControllerPos and HandPos actions so developers can retrieve data from a specific source. This system allows flexible mapping of device inputs to in-game actions, which is particularly useful in VR environments with multiple input devices.
Loka extends this system by synchronizing the input system from the client to the host machine via a WebRTC data channel, ensuring that the host can receive all relevant input data from connected VR devices. This approach allows the host to mirror the actions and input values of each player, ensuring that the virtual environment behaves consistently based on the real-time actions of each user.
WebRTC data channels support buffering of outbound data, enabling real-time monitoring of the buffer state. Notifications can be triggered when the buffer begins to run low, allowing the system to ensure a steady stream of data without memory overuse or channel congestion. This mechanism is critical in a dynamic multiplayer environment, as it minimizes input delays, prevents data loss, and ensures synchronization accuracy for players.
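The buffering behavior described above amounts to a low/high-watermark scheme. The sketch below is a simplified Python analogue of WebRTC’s `bufferedAmount`/`bufferedAmountLowThreshold` mechanism; the watermark values and method names are our assumptions, not Loka’s actual code:

```python
class PacedSender:
    """Low-watermark pacing for an outbound data channel (illustrative)."""

    def __init__(self, low_watermark=16 * 1024, high_watermark=256 * 1024):
        self.low = low_watermark     # flush deferred payloads below this level
        self.high = high_watermark   # defer new payloads above this level
        self.buffered = 0            # bytes currently queued in the channel
        self.pending = []            # payloads deferred due to congestion

    def enqueue(self, payload: bytes):
        # Defer the payload if queuing it would overfill the channel buffer.
        if self.buffered + len(payload) > self.high:
            self.pending.append(payload)
        else:
            self.buffered += len(payload)

    def on_sent(self, nbytes):
        # Called as the transport drains the buffer; once the buffer runs
        # low, flush deferred payloads to keep the stream steady.
        self.buffered = max(0, self.buffered - nbytes)
        while self.pending and self.buffered <= self.low:
            self.buffered += len(self.pending.pop(0))
```

Real WebRTC implementations surface the same signal as a `bufferedamountlow` event rather than an explicit `on_sent` callback.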
Loka uses a “first-touch” mechanism to handle simultaneous input on the same virtual object, relying on the queued sequence of the WebRTC data buffer to determine input priority. While network latency can influence input timing, for research conducted on a local network, as in our experimental setup described in Section 6, differences in input timing are negligible.
In conventional VR streaming solutions that rely on OpenXR runtime, the supported input signals are typically limited to fundamental data such as the poses of heads or controllers (
Figure 6). These signals are essential for the basic VR interaction supported by most devices. However, this restricts the ability to integrate device-specific or advanced input features, such as eye tracking or EEG sensors. Loka addresses this limitation by extending input systems to support a wider range of data types. For instance, device-dependent data like eye tracking and hand tracking are fully integrated into the framework. In addition, since the device-dependent data are not standardized, the same type of data collected from each device may use varying nomenclatures.
To address this problem, Loka interprets the data on the client side, categorizes them into a standardized format, and then transmits the formalized data to the host. Beyond device-specific capabilities, Loka supports the real-time integration of custom IoT sensor data. For devices with unique input formats, Loka employs adapters to preprocess and map raw signals to the framework’s standardized format, maintaining compatibility across various VR headsets and IoT sensors. Also, since some sensors require a startup signal to work, Loka reserves a direct channel to allow server–sensor communication (as
Figure 3 depicts). These capabilities provide researchers and developers with a versatile and customizable environment, facilitating advanced applications such as biofeedback systems and personalized user interactions.
Loka standardizes input signals across devices by translating device-specific formats into a unified internal representation. This is achieved through Unity’s Input System and XR Interaction Toolkit, which manage inputs dynamically and ensure consistent interaction experiences.
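The adapter idea can be illustrated with a minimal sketch. The payload field names below are invented for illustration and do not correspond to any real SDK; the point is that two device-specific payloads map to one standardized record:

```python
# Standardized record types accepted on the host side (illustrative set).
STANDARD_TYPES = {"eye_gaze", "hand_pose", "breath_pressure"}

def pico_eye_adapter(raw):
    # Hypothetical PICO-style payload; field names are invented.
    return {"type": "eye_gaze", "x": raw["gazeX"], "y": raw["gazeY"]}

def quest_eye_adapter(raw):
    # Hypothetical Quest-style payload; field names are invented.
    return {"type": "eye_gaze", "x": raw["eye_dir"][0], "y": raw["eye_dir"][1]}

def standardize(source, raw, adapters):
    """Run the source-specific adapter and verify the result is a known type."""
    record = adapters[source](raw)
    if record["type"] not in STANDARD_TYPES:
        raise ValueError(f"unknown record type: {record['type']}")
    return record
```

Adding support for a new headset or sensor then reduces to registering one more adapter function, without touching the host-side consumers.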
5.4. Multicasting Capability
Unlike typical streaming or PCVR solutions, Loka natively supports multiuser functionality within a single host server, allowing multiple users to interact in the same virtual scene simultaneously. This capability creates opportunities for VR research in areas such as social interaction, a key component of the Metaverse. This functionality is achieved by decoupling the OpenXR runtime from our solution. Since the OpenXR runtime does not support multiple devices connected to the same computer simultaneously, we use it solely to translate input signals. The actual input data are transmitted via the WebRTC channel to the host, as described earlier.
Since Loka supports multiple users in the same scene simultaneously, we developed a custom action map in the host, which stores various types of VR inputs, including controller poses and eye-tracking data. Each time a player connects to the system, a cloned version of this action map is assigned to the player’s corresponding LokaPlayer instance. By binding the player’s inputs to their respective action map, Loka ensures that each player’s inputs are correctly tracked and processed in real time. Whenever a player’s input is updated, the system reads the bound action map to retrieve the correct values, providing precise input handling for multiplayer VR environments. This architecture is illustrated in
Figure 7, where the Input Action Asset defines actions and action maps for each player. The LokaPlayer on the host is assigned a cloned action map, which ensures that input data (e.g., device position, rotation, button presses) are accurately reflected in the host environment.
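The cloned-action-map scheme can be sketched as follows. This is a simplified Python analogue of the Unity-side design; the class name `LokaPlayerSketch` and the map contents are ours:

```python
import copy

# Template action map; the real asset defines Unity input actions per player.
TEMPLATE_ACTION_MAP = {
    "XRI LeftHand": {"Position": None, "Rotation": None},
    "XRI RightHand": {"Position": None, "Rotation": None},
    "Eye": {"Gaze": None},
}

class LokaPlayerSketch:
    """Each connected player receives an independent clone of the template."""

    def __init__(self, player_id):
        self.player_id = player_id
        self.action_map = copy.deepcopy(TEMPLATE_ACTION_MAP)

    def update_input(self, map_name, action, value):
        # Inputs arriving over the data channel update only this player's map,
        # so concurrent players never overwrite each other's state.
        self.action_map[map_name][action] = value
```

The deep copy is the essential step: sharing one map across players would let one player’s controller pose clobber another’s.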
6. Results
In this section, we evaluate the performance and effectiveness of Loka through a series of tests focused on key areas such as network load and responsiveness under different scenarios.
6.1. Testbed Setup
Our experiments were conducted on a controlled testbed within our lab. To simplify the network topology, we ran the host and signaling server on the same machine, as the signaling server only functions as a state exchanger. The machine used was a Windows 11 desktop, equipped with an Intel i7-13700K processor and an NVIDIA RTX 4060 Ti graphics card. The streamer program is built on Unity 2020.3.33f1.
To facilitate performance analysis, we implemented a logging system in the Loka framework, which records connection metrics during runtime. These metrics include WebRTC-specific performance indicators such as latency, packet loss, and bitrate. The metrics can be monitored on the host machine in real time and are also saved to the file system for later analysis. The recorded data will be used to evaluate system performance under different conditions. For instance, the logging system can be used to study Loka’s performance under high-frequency data input scenarios by monitoring real-time CPU usage and tracking latency variations in relation to the volume and frequency of input data.
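A minimal sketch of such a logging component is shown below. The CSV layout and field names are our assumptions; the actual Loka logger records WebRTC statistics in its own format:

```python
import csv
import time

class MetricsLogger:
    """Writes one CSV row per sample of WebRTC connection metrics."""

    FIELDS = ["timestamp", "latency_ms", "packet_loss_pct", "bitrate_kbps"]

    def __init__(self, path):
        self.path = path
        # Create the file with a header row so later analysis tools can
        # load it directly.
        with open(path, "w", newline="") as f:
            csv.DictWriter(f, fieldnames=self.FIELDS).writeheader()

    def record(self, latency_ms, packet_loss_pct, bitrate_kbps):
        # Append one sample; called periodically during a streaming session.
        with open(self.path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.FIELDS).writerow({
                "timestamp": time.time(),
                "latency_ms": latency_ms,
                "packet_loss_pct": packet_loss_pct,
                "bitrate_kbps": bitrate_kbps,
            })
```

Appending per sample trades a little I/O overhead for crash safety: metrics collected up to a disconnect survive on disk.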
On the client side, we used the PICO Neo 3 Pro Eye (PICO Interactive, Beijing, China) throughout the initial experiment, which natively supports eye tracking. Additionally, several IoT sensors were integrated into the setup, including a breath sensor and an EEG sensor. To evaluate cross-platform compatibility, we replicated the experiment on other VR devices, including Meta Quest Pro (Meta Platforms, Menlo Park, CA, USA) and Meta Quest 3 (Meta Platforms, Menlo Park, CA, USA), as well as on PC platforms. These additional experiments aimed to assess Loka’s performance across various platforms and devices, which are detailed in
Section 6.3.
The breath sensor is an Arduino board (Arduino, Turin, Italy) equipped with a barometer, mounted on a belt. During the experiment, the user wore the belt around the abdomen. When the user inhales, the abdomen will expand, causing the barometer reading to increase; conversely, when the user exhales, the reading decreases. The sensor readings are transmitted to the Arduino board and then to the headset via Bluetooth (BT). In our setup, we use an Arduino Uno or Arduino Leonardo for the breath sensor. Since these boards do not have built-in BT functionality, we include an HC-06 module (PiePie, Taipei, Taiwan) for BT data transmission. The HC-06 module is connected to the Arduino via its serial interface (RX-TX); once the barometric sensor data are read by the Arduino, they are wirelessly relayed through the HC-06, which sends the data to the headset.
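The inhale/exhale interpretation of the barometer stream can be sketched as a simple threshold rule. The deadband value is an assumption; the real sensor pipeline may filter the signal differently:

```python
def classify_breath(readings, deadband=2):
    """Label each successive barometer sample as 'inhale' (pressure rising
    as the abdomen expands), 'exhale' (pressure falling), or 'hold'
    (change within a small deadband that absorbs sensor noise)."""
    labels = []
    for prev, cur in zip(readings, readings[1:]):
        delta = cur - prev
        if delta > deadband:
            labels.append("inhale")
        elif delta < -deadband:
            labels.append("exhale")
        else:
            labels.append("hold")
    return labels
```

A rising run of readings thus maps to an inhale and a falling run to an exhale, matching the behavior described above.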
For the EEG sensor, we utilized Ganglion, a commercial Arduino-based bio-sensing device compatible with OpenBCI (
Figure 8). Ganglion is capable of monitoring EEG, EMG, or ECG signals, with data sampled at 200 Hz on each of the four channels. The sensor was integrated with the headset to collect real-time brainwave data during the experiment. Ganglion uses Bluetooth Low Energy (BLE) to transmit data in a specialized format. To enable seamless integration with the VR environment, we developed an embedded library in Android Studio and incorporated it into Unity (the implementation (Android Studio):
https://github.com/ncu-wmlab/LabFrameAndroidPlugins/tree/master/ganglion_plugin/src/main/java/com/xrlab/ganglion_plugin, accessed on 22 January 2024) (the interface code in Unity:
https://github.com/ncu-wmlab/LabFrame_Ganglion, accessed on 22 January 2024), allowing real-time data streaming at 200 Hz.
6.2. Bandwidth Loads
Loka is built atop WebRTC, whose congestion control can dynamically adjust the target video bitrate based on the estimated network throughput and conditions. It begins with an initial setting and adjusts efficiently in real time as the network fluctuates. WebRTC receivers send receiver estimated maximum bitrate (REMB) messages to the sender as soon as they detect congestion, and continue sending them periodically even when no congestion occurs. The sender then decides whether the transmission bitrate can be raised or should be lowered immediately. REMB messages are usually generated by receivers every 250 to 500 ms [
22]. Consequently, bitrate adaptation operates on a fine-grained timescale, effectively accommodating short-term bandwidth variations among receivers and ensuring seamless playback continuity. This bandwidth adaptation feature is ideal for remote cooperation scenarios. Gunkel et al. [
23] presented a WebRTC-based system for photorealistic social VR communication and evaluated the performance of the system for handling multiple user streams.
Another key feature is the adaptive bitrate capability of the AV1 encoder, which is particularly effective in reducing redundant data in low-motion or static-frame scenarios, significantly optimizing bandwidth usage. AV1 delivers approximately 30–50% better compression efficiency than H.264 and 20–30% better than H.265 (HEVC) while maintaining the same level of visual quality. Uhrina et al. [
24] analyzed the compression performance of several modern codecs, revealing that their efficiency varies with resolution; notably, newer codecs like AV1 demonstrated greater efficiency at higher resolutions. These findings highlight the potential advantages of the AV1 codec for VR streaming, which typically demands high-resolution content. This advantage enables Loka to optimize bandwidth consumption while preserving high visual fidelity: the encoder can drop to a lower bitrate for low-complexity scenes, such as those with minimal movement or change, reducing bandwidth usage while keeping visual quality. Loka’s integration of WebRTC and AV1 ensures adaptive data transmission in dynamic environments. By monitoring network conditions and adjusting the bitrate dynamically, the system reduces latency and prevents bandwidth overuse while maintaining high visual quality.
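The sender-side reaction to REMB feedback described above can be caricatured in a few lines. The constants and the ramp factor are illustrative assumptions, not the actual WebRTC congestion-control logic:

```python
def adapt_bitrate(current_kbps, remb_kbps, floor_kbps=300,
                  ceiling_kbps=20000, ramp=1.05):
    """Illustrative sender reaction to a REMB message: back off immediately
    to the receiver's estimate when it signals congestion (estimate below the
    current rate), otherwise probe upward gradually, clamped to sane bounds."""
    if remb_kbps < current_kbps:
        target = remb_kbps            # congestion signalled: drop at once
    else:
        target = current_kbps * ramp  # headroom available: ramp up slowly
    return max(floor_kbps, min(ceiling_kbps, target))
```

The asymmetry (instant decrease, gradual increase) is what keeps playback smooth while REMB messages arrive every few hundred milliseconds.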
To evaluate this adaptive characteristic, we designed our experiment in three phases (
Figure 9). Each phase was designed to assess network performance under different levels of user movement and scene complexity:
Phase 1: The user was instructed to remain still, looking straight ahead with minimal head or body movement. This phase simulates a low-motion scenario, allowing us to measure our ability to maintain high-quality streaming when only minor frame updates are necessary.
Phase 2: The user was asked to move forward within the virtual environment, specifically walking through a playground slide. This phase represents moderate user movement, introducing more complex frame changes as the user interacts with objects in the scene.
Phase 3: In this phase, the user was allowed to freely walk and turn around in the scene. This phase introduced both rapid movement and changing perspectives as the user explored the virtual environment. With significant changes to both the objects and the user’s perspective, this phase poses higher bitrate demands and more frequent frame updates.
Figure 10 demonstrates how WebRTC responded to the different scenarios across the three phases. During the low-motion scenario, where the user remained mostly still, the bitrate remained relatively low and stable at an average of 8353 kbps. This is expected in scenarios with minimal movement, as WebRTC conserves bandwidth by reducing the amount of frame-update data. As the user began to move in the virtual environment, the bitrate increased significantly. The average bitrate during this phase rose to 12,536 kbps, reflecting the more complex frame updates required to handle moderate user movement. WebRTC adapted by dynamically increasing the target bitrate to ensure high-quality streaming in response to the increased scene complexity. Finally, in Phase 3, the user was allowed to freely walk and turn around, introducing rapid movement and more frequent changes in the user’s perspective. The system reported a further increase in bitrate, averaging 14,675.5 kbps, reflecting the demand for more frequent frame updates to accommodate the dynamic state of the scene. Despite the increased motion, the system maintained a relatively stable frame rate and avoided significant delays.
6.3. Multi-Platform Performance
Loka’s cross-platform compatibility is enabled by its modular design and reliance on Unity’s XR Interaction Toolkit and OpenXR Plugin. This approach abstracts device-specific dependencies and ensures seamless operation across different hardware. Rigorous cross-platform testing, including evaluations on various VR devices and operating systems, ensures consistent performance. The framework’s dynamic configuration capability further enhances adaptability, enabling Loka to accommodate new devices and platforms efficiently.
To evaluate Loka’s performance across various platforms and devices, we replicated the experiment mentioned in the previous section on a range of VR devices, including PICO Neo 3 Pro Eye, Meta Quest Pro, and Meta Quest 3 (
Table 2). In addition to VR devices, we conducted the experiment on PC platforms, where keyboard inputs were used to simulate the control of movement and viewport, mimicking the VR experience. The PC tests were performed on both Windows and macOS systems to assess Loka’s compatibility and performance across desktop environments. By covering a diverse range of platforms, both VR and non-VR, we ensured that Loka’s performance was evaluated under various hardware and software configurations, providing a comprehensive understanding of its streaming capabilities across different devices and operating systems.
The result is depicted in
Figure 11, where we can observe a clear trend in bandwidth consumption across the three phases. Phase 3, which involved the most user movement and scene complexity, consistently consumed the most bandwidth, followed by Phase 2 with moderate movement, and finally Phase 1, where users were mostly stationary. This pattern aligns with the results from the previous section, reinforcing that higher user motion and scene complexity demand greater bandwidth for real-time streaming in VR environments.
A notable observation is the lower bitrate performance on PC platforms (Windows and macOS) during Phase 1 compared to the VR devices. This discrepancy can be attributed to the fact that, on PCs, users could remain completely stationary, while VR users, though instructed to remain still, could still exhibit subtle, involuntary movements such as head tilts or minor body shifts. These slight movements were enough to increase the bitrate required for VR streaming, as even small changes in position result in new frame data being sent. Additionally, the use of the AV1 codec could play a significant role in this observation. AV1 is known for its exceptional efficiency in handling low-motion or static frames, significantly minimizing the bitrate when there are no major changes in the scene [
25]. As a result, on PC platforms where users could remain truly stationary, the bitrate remained consistently low during Phase 1. This highlights AV1’s ability to optimize bandwidth usage in low-motion scenarios, particularly when streaming static frames.
Overall, the results demonstrate Loka’s ability to adapt to different platforms and scenarios, efficiently managing bandwidth across varying levels of motion and scene complexity. The system’s performance across multiple platforms shows that Loka can effectively scale its streaming capabilities to meet the demands of different devices and usage scenarios.
6.4. Multicasting Performance
To evaluate Loka’s multicasting capability, we conducted an experiment in which multiple clients connected to a single host simultaneously. The test began by connecting one device to the host, followed by the addition of a new device every 60 s. Throughout the process, we measured the bitrate and frame rate to evaluate the system’s behavior under increasing load.
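When aligning the measured bitrate and frame-rate traces with system load, it helps to express the join schedule as a function of elapsed time. The following is a minimal sketch; the 60 s interval and six-device maximum match the experiment described here, while the function itself is our illustration rather than part of Loka.

```python
def clients_at(elapsed_s: float, join_interval_s: float = 60.0,
               max_clients: int = 6) -> int:
    """Number of clients connected after `elapsed_s` seconds, for a test
    where one client connects at t=0 and a new device joins every
    `join_interval_s` seconds, up to `max_clients` devices in total.
    """
    if elapsed_s < 0:
        return 0
    return min(1 + int(elapsed_s // join_interval_s), max_clients)
```

Each metric sample can then be bucketed by `clients_at(sample_time)` to produce per-client-count statistics.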
The results of the multicasting performance experiment are shown in
Figure 12. We can observe that the frame rate remains stable (i.e., close to 60 FPS) for up to three players interacting in the same virtual environment simultaneously, demonstrating that the system performs smoothly without degrading the user experience. However, as the number of players increases, the frame rate gradually declines, dropping to between 40 and 50 FPS with six players, and the maximum round-trip time for packets increases from 15 ms to 25 ms.
This performance drop is primarily attributed to CPU resource limitations on the host server. As more players join, the system requires significantly more computational resources to process and synchronize the additional data streams, leading to increased CPU load. Consequently, the system becomes CPU-bound, and the host CPU is the major bottleneck in the current experimental setup.
The results indicate that, on our experimental host hardware, Loka performs efficiently in multiplayer settings with up to three concurrent users, maintaining consistent frame rates and low latency and delivering a seamless experience. However, beyond this threshold, performance degrades. The WebRTC testing conducted in [
26] also highlights the impact of resource limitations on performance. The service was hosted on a medium-sized cloud instance with 2 vCPUs and 4 GB of RAM. The study found that the system could support up to approximately 175 clients while maintaining acceptable latency. However, when the number of connected clients exceeded this threshold, the latency for all connected clients increased dramatically.
To address this limitation, a straightforward solution is to scale the system by increasing the CPU resources on the host server to support additional multicasting sessions. Furthermore, future optimizations will prioritize the development of efficient resource allocation strategies to enhance CPU utilization and improve overall system scalability.
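One straightforward form such a resource-allocation strategy could take is admission control on the host: defer or redirect new sessions once CPU headroom is exhausted. The sketch below is illustrative only; the load threshold and the session cap (taken from the three-user sweet spot observed on our test hardware) are assumed values, not part of Loka.

```python
def admit_new_session(cpu_load: float, active_sessions: int,
                      cpu_threshold: float = 0.8,
                      session_cap: int = 3) -> bool:
    """Admission-control sketch for a CPU-bound multicast host.

    Accept a new client only if current CPU utilization (0.0-1.0) is
    below `cpu_threshold` and the empirically derived `session_cap`
    is not yet reached. Both parameters are illustrative assumptions.
    """
    return cpu_load < cpu_threshold and active_sessions < session_cap
```

In a scaled-out deployment, a rejected session would be routed to another host instance rather than refused outright.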
7. Conclusions and Future Work
In this paper, we introduced Loka, a VR streaming toolkit designed to support the growing demands of the Metaverse by integrating the flexibility of AIO VR with the power of cloud-based rendering and streaming. Loka enables high-quality VR experiences on low-end hardware by offloading rendering tasks to remote servers, overcoming hardware limitations while supporting a variety of platforms. We assessed Loka’s ability to adapt to varying network conditions and user motion, ensuring stable and immersive experiences. Additionally, Loka’s multicasting functionality allows multiple users to engage in real-time interactions, a crucial feature for expanding social and collaborative experiences in the Metaverse, and its integration of IoT sensor data further enriches VR applications for both developers and researchers.
For future work, we plan to integrate Loka into our existing game modules, expanding its cross-platform and multiplayer capabilities. This integration will allow for seamless interaction between players on different devices, enhancing the versatility of the system. In addition, we plan to test Loka in various real-world scenarios to demonstrate its applicability and performance in practical settings. For example, in a remote education scenario, Loka will be used to create immersive VR classrooms where multiple students can interact in real time. These tests will evaluate key performance metrics such as synchronization accuracy, latency, and the effectiveness of personalized content delivery in enhancing learning outcomes. From a technical standpoint, we aim to study approaches to optimize bandwidth consumption and reduce latency by exploring advanced techniques such as foveated video streaming, which reduces throughput by prioritizing rendering quality in the region where the user is looking [
27,
28]. The user’s viewpoint prediction can also be utilized to proactively cache video data and partially offload computing tasks to the edge server, meeting demanding end-to-end (E2E) latency requirements [
22]. Additionally, we plan to implement quality of service (QoS) prediction algorithms that anticipate fluctuations in the required quality metrics in real time and allocate resources accordingly, ensuring a smooth streaming experience even under fluctuating network conditions [
22]. We plan to design adaptive algorithms for input prioritization, such as dynamic down-sampling of low-priority data streams and resource allocation strategies for high-priority inputs, to ensure that high-frequency input data do not degrade system performance and user experiences in multiplayer environments.
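As a first approximation, such an input-prioritization scheme could be sketched as a rate policy that preserves high-priority streams (e.g., head pose) at their native rate while throttling low-priority ones (e.g., auxiliary IoT sensors) as host load rises. Everything below, including the linear scaling rule and the parameter values, is a hypothetical illustration of the idea rather than Loka’s planned algorithm.

```python
def downsample_rate(priority: str, base_rate_hz: float, cpu_load: float,
                    min_rate_hz: float = 5.0) -> float:
    """Dynamic down-sampling sketch for multiplayer input streams.

    High-priority streams keep their native sampling rate; low-priority
    streams are scaled down linearly as host CPU load (0.0-1.0)
    approaches saturation, but never below `min_rate_hz`.
    All parameter values here are illustrative assumptions.
    """
    if priority == "high":
        return base_rate_hz
    scaled = base_rate_hz * max(0.0, 1.0 - cpu_load)
    return max(min_rate_hz, scaled)
```

Under this policy, a 90 Hz head-pose stream is untouched regardless of load, while a 60 Hz sensor stream is halved at 50% CPU load and pinned to the floor rate near saturation.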