EVeREst: Bitrate Adaptation for Cloud VR

Abstract: Cloud Virtual Reality (VR) technology is expected to promote VR by providing a higher Quality of Experience (QoE) and energy efficiency at lower prices for the consumer. In cloud VR, the virtual environment is rendered on the remote server and transmitted to the headset as a video stream. To guarantee a real-time experience, networks need to transfer huge amounts of data under much stricter delay constraints than those imposed by state-of-the-art live video streaming applications. To reduce the burden imposed on the networks, cloud VR applications shall adequately react to changing network conditions, including wireless channel fluctuations and highly variable user activity. For that, they need to adjust the quality of the video stream adaptively. This paper studies video quality adaptation for cloud VR and improves the QoE for cloud VR users. It develops a distributed bitrate adaptation algorithm for cloud VR, i.e., one operating with no assistance from the network, called the Enhanced VR bitrate Estimator (EVeREst). The algorithm aims to optimize the average bitrate of cloud VR video flows subject to video frame delay and loss constraints. For that, the algorithm estimates both the current network load and the delay experienced by separate frames. It anticipates changes in the users' activity and limits the bitrate accordingly, which helps prevent excessive interruptions of the playback. With simulations, the paper shows that the developed algorithm significantly improves the QoE for the end-users compared to the state-of-the-art adaptation algorithms developed for MPEG DASH live streaming, e.g., BOLA. Unlike these algorithms, the developed algorithm satisfies the frame loss requirements of multiple VR sessions and increases the network goodput by up to 10 times.


Introduction
Numerous Virtual Reality (VR) applications have emerged recently to improve entertainment [1,2], medicine [3,4], engineering [5,6], and other spheres of everyday life. To provide an immersive experience, such applications require minimal feedback delay and high image quality. The real-time rendering of a high-quality virtual environment is computationally expensive and requires special high-performance hardware. Cloud VR was introduced to promote VR and reduce the cost of the headsets [7]. In cloud VR, the headset, also called the Head-Mounted Display (HMD), only monitors the user's actions and sends the data to the remote server. The server renders the environment according to the received data and sends it back to the HMD as a video stream. The HMD then plays the video back to the user and, if needed, slightly adjusts the picture when the user's viewpoint changes.
There are still many issues related to the architecture of cloud VR systems, i.e., the computational split between the cloud or edge server and HMD, the proactive generation of the future virtual scenes, and bandwidth allocation [8][9][10].
Such issues become more challenging if the system tries to minimize the bandwidth used for data delivery.
The rest of the paper is organized as follows. In Section 2, we describe the cloud VR system and introduce the problem statement. Section 3 reviews and analyzes the existing approaches to real-time video bitrate adaptation. Section 4 describes the algorithm developed in this paper. In Section 5, we present and discuss the obtained numerical results. Finally, Section 6 concludes the paper.

Cloud VR System Description and Problem Statement
A simplified cloud VR system is presented in Figure 1. Multiple VR HMDs are connected via a Base Station (BS) to Mobile Edge Computing (MEC) servers. Each HMD tracks the user's actions and sends their descriptions to the MEC server. The server renders the virtual environment, generates the corresponding video images, and immediately sends them back to the HMD, which displays them to the user. We assume that the BS shares the network resources among the flows belonging to different clients so that each one receives its fair share [25,26]. The clients enter and leave the network and use cloud VR applications in between. The channel state also changes because of the variable environment [27] and the varying number of active clients.
Because the HMD needs to continuously display the video frames, the server shall generate the frames irrespective of the user actions. Therefore, each VR video flow can be modeled as a sequence of video frames generated by the server in real time with a fixed frequency ν and transmitted immediately.
The video playback is organized as follows. The HMD accumulates a few frames in a buffer (often referred to as a jitter-buffer) and only then starts the playback by fetching the frames one by one with the same frequency ν. The jitter-buffer compensates for small unpredictable variations in the frame delivery rate. The initial depth of the buffer defines the maximal virtual scene refresh delay and the frame delivery time constraint D_QoS. If at some moment the buffer is empty, the video player repeats the previous frame and discards the late one when it arrives to keep the playback delay fixed. We call such a situation a frame loss. Note that although the buffer can have a size of a few frames, the scene refresh delay may be smaller in the considered system if, for example, the server predicts the user's actions a few frames ahead.
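The jitter-buffer behavior described above can be sketched in Python as follows. This is a minimal illustration under our assumptions (in-order frame delivery over a single transport flow); the class and method names are our own and do not come from the paper.

```python
from collections import deque

class JitterBuffer:
    """Fixed-delay playback buffer: frames that miss their slot are lost."""

    def __init__(self, initial_depth):
        self.buffer = deque()            # frues arriving in order are queued here
        self.initial_depth = initial_depth  # frames accumulated before playback
        self.playing = False
        self.next_frame_id = 0           # id the player expects to fetch next
        self.losses = 0

    def on_frame_received(self, frame_id):
        # A frame that arrives after its playback slot has passed is discarded;
        # it was already counted as lost when the player repeated a frame.
        if frame_id < self.next_frame_id:
            return
        self.buffer.append(frame_id)
        if not self.playing and len(self.buffer) >= self.initial_depth:
            self.playing = True          # playback starts once the buffer fills

    def fetch(self):
        """Called once per frame period (frequency nu) by the video player."""
        if not self.playing:
            return None
        if self.buffer and self.buffer[0] == self.next_frame_id:
            self.next_frame_id += 1
            return self.buffer.popleft()
        # Buffer empty (the expected frame is late): repeat the previous frame
        # to keep the playback delay fixed and count the missing one as lost.
        self.losses += 1
        self.next_frame_id += 1
        return "repeat"
```

For example, with an initial depth of three frames, the player starts after three receptions and, if the fourth frame is late, repeats the previous frame and records one loss.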
Modern video codecs (e.g., H.265 [28], AV1 [29]) efficiently compress the video stream but make it much more bursty [30]. With these codecs, only a small part of the frames, namely key frames (or I-frames), are self-contained and can be decoded independently.
Other frames are called Predicted frames (P-frames). They carry only the differences between the current and the previous frame. Therefore, they are much smaller than the key frames. Such video compression dramatically reduces the average bitrate of the video at the cost of the increased burstiness of the flow: key frames are generated rarely but produce high loads on the network.
In this paper, we assume that higher bitrates for the same video codec improve the visual quality of the video. That is a valid assumption for any well-engineered video service [31,32]. Unfortunately, it is impossible to choose a certain video bitrate level at the beginning of the flow and keep it unchanged during the whole VR session because the network state dynamically changes [27]. A higher bitrate requires more network resources to transmit the video to the client and increases the probability of delays and losses if the resources are insufficient. Therefore, generating video with the highest possible bitrate can lead to QoE degradation [33]. In contrast, generating video with low bitrates reduces network resource consumption and frame loss probability. However, it also worsens the visual quality of the video and reduces the QoE.
Therefore, the applications shall choose the bitrate of the video adaptively. This paper assumes that, to enable the efficient coexistence of different cloud VR operators in a generic network architecture not tailored to the cloud VR service, the bitrate adaptation is performed in a distributed manner. Furthermore, we do not consider the problem of tile-based adaptation because, thanks to the MPEG OMAF technology [15], it can be performed independently of the bitrate adaptation: the bitrate adaptation algorithm estimates the maximal amount of data that can be delivered to the client in time, and the tile-based adaptation decides on the size (e.g., in pixels) of each tile to fit within the constraint imposed by the bitrate adaptation.
Similar to the architecture considered by the MPEG DASH protocol [34], each client independently measures the state of the network and requests the corresponding video bitrate from a discrete set B based on some bitrate adaptation algorithm, while the server encodes and sends the video as requested. Similar to MPEG DASH, we assume that the video is encoded in chunks, and the server can change the bitrate only at the beginning of a chunk. The reason for such a limitation lies in the video structure: a video stream with a new bitrate shall start with a keyframe. Therefore, to reduce the network resource consumption, we should switch the bitrate only when the next keyframe is generated. Note that this does not restrict the server from sending the frames as soon as they are generated. Despite many similarities with MPEG DASH, cloud VR applications require much lower delays, so the feedback from the application and the bitrate change requests shall be sent asynchronously: the application sends the bitrate change request when it decides to change the bitrate, but the server takes the request into account only when it generates the next keyframe.
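The asynchronous request handling described above, where a bitrate change request is recorded immediately but applied only at the next keyframe, can be sketched as follows. This is an illustrative model, not the paper's implementation; the class name and GoP handling are our assumptions.

```python
class ServerBitrateControl:
    """Applies asynchronous client bitrate requests at keyframe boundaries,
    since a stream with a new bitrate must start with a keyframe."""

    def __init__(self, initial_bitrate, gop_size):
        self.bitrate = initial_bitrate
        self.pending = None       # request waiting for the next keyframe
        self.gop_size = gop_size  # frames per Group of Pictures
        self.frame_idx = 0

    def request_bitrate(self, bitrate):
        # Called whenever the client's adaptation algorithm decides to switch.
        self.pending = bitrate

    def next_frame(self):
        """Returns (frame_type, bitrate) for the frame being encoded."""
        is_key = self.frame_idx % self.gop_size == 0
        if is_key and self.pending is not None:
            self.bitrate = self.pending   # switch only at the keyframe
            self.pending = None
        self.frame_idx += 1
        return ("I" if is_key else "P", self.bitrate)
```

With a GoP of four frames, a request issued right after the first keyframe takes effect only at the fifth frame, i.e., the next keyframe.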
For the considered bitrate adaptation scheme, the QoE of the cloud VR user can be reduced to the QoE for real-time video streaming because all the VR specifics (e.g., view adaptation) are not affected by the adaptation. Therefore, the aim of the bitrate adaptation algorithm is to optimize the real-time video streaming QoE. Unfortunately, such a QoE is a complex subjective metric, and no standardized and well-established method to estimate it exists [35][36][37]. However, in the considered scenario, based on recent studies [38,39], we can indicate the main factors that affect it. The first is the visual image quality; in the considered system, it corresponds to the average bitrate of the VR video flow. The second is the virtual scene refresh delay; however, in the considered system, the playback delay depends only on the initial jitter-buffer depth, so we do not optimize this variable and consider it a pre-defined system parameter. The third factor is the frame losses. To estimate its influence on the QoE, we introduce the concept of a VR session satisfied with the frame loss ratio, i.e., a session that has lost less than θ% of its frames. According to [12], a typical value of θ is approximately 0.5-2%. The final factor that is often addressed in the literature is the frequency of the video quality switches. Although we agree that this factor is important for an immersive experience, we do not focus on this metric specifically because its importance is controversial, and some studies [40] show that users prefer occasional switches to higher quality over constantly low quality.
To aggregate these factors and simplify the interpretation of the results, we introduce the compound QoE metric, which we call goodput. It measures the total throughput of the cloud VR sessions satisfied with the frame loss ratio. When multiple cloud VR clients share the network resources, such a metric allows considering both the visual quality of their video streams and the accuracy of the network congestion estimation algorithm, which avoids overloading the network and achieves a limited frame loss ratio.
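Under the definitions above, the goodput metric can be computed as a straightforward sum; the following sketch assumes each session is summarized by its average bitrate and frame loss ratio (the 0.02 default stands for θ = 2%).

```python
def goodput(sessions, theta=0.02):
    """Total throughput of the sessions satisfied with the frame loss ratio.

    `sessions` is an iterable of (average_bitrate_bps, frame_loss_ratio)
    pairs; only sessions with loss ratio below theta contribute.
    """
    return sum(rate for rate, loss in sessions if loss < theta)
```

For instance, among three sessions with loss ratios 1%, 5%, and 0%, only the first and third count toward the goodput.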
Our goal is to design a bitrate adaptation algorithm that will improve the QoE for the cloud VR users in the described system. We estimate the QoE as the goodput of the system.
Related Work
Several papers, e.g., [51][52][53], study frameworks that assume centralized, network-assisted bitrate adaptation. Such an approach simplifies the problem because intermediate network nodes have more information about the network state. Although the approach is very efficient, it requires the network to support the framework, which complicates its deployment [54].
Live streaming scenarios introduce additional constraints by limiting the maximal buffer size, which is related to the playback delay. Therefore, the adaptation algorithms shall be re-designed accordingly. For example, the paper [55] extended the algorithm from [41] to real-time streaming scenarios by augmenting the initially buffer-based algorithm with a throughput estimation-based module. In [56], the authors developed an algorithm based on buffer occupancy monitoring: they dynamically adjusted the buffer level threshold to choose the appropriate bitrate and aimed to transmit the video smoothly (i.e., with a low frequency of bitrate changes). In [57,58], the authors proposed reducing the delay by using HTTP/2-enabled server push methods. Finally, in [59], the authors developed an algorithm jointly utilizing playback rate control, latency-constrained bitrate control, and adaptive frame dropping.
Unfortunately, because of its chunk-based download structure, MPEG DASH is not fit for interactive video streaming, and even its low-latency modification implies delays of around a couple of seconds [60]. This structure allows the client to pre-buffer rather large amounts of video and smooth out the network capacity fluctuations. However, bitrate adaptation for interactive video streaming is a more challenging task because the algorithms have much less room for maneuver. Therefore, many papers develop algorithms based on neural networks that predict the future channel state and choose bitrates appropriately. The paper [61] designed a neural network that dynamically adjusts the bitrate and the playback rate of the downloaded video to reduce the probability of video stalling. With simulations, the authors showed that the proposed algorithm provides higher QoE than state-of-the-art ones in low-latency streaming scenarios. Another neural network-based algorithm was proposed in [62]. The algorithm was designed for the remote control of unmanned aerial vehicles: it takes into account the fluctuations of the air-to-ground channel and predicts the channel capacity to stream the video with appropriate quality. Finally, the paper [63] considered a completely different approach and introduced a frame splitting technique. The video frame is divided into several sub-frames that are encoded sequentially at the cloud server. As soon as a sub-frame is encoded, it can be sent to the HMD, and the server can proceed to encode the next sub-frame. This technique decreases the end-to-end latency because it smooths the load imposed on the network by the video flow, and the HMD starts decoding the video frame while the server is still encoding the other parts of the same frame. On the downside, this solution requires large modifications to the video codecs to compose the frames from independently decoded sub-frames.

Algorithm Description
In this section, we describe the proposed VR bitrate adaptation algorithm. The algorithm consists of two building blocks: bitrate estimation and congestion estimation. The first block, bitrate estimation, decides whether the capacity of the connection between the server and the HMD is enough for timely delivery of the video stream with a higher bitrate, or instead, the bitrate shall be reduced to avoid VR frame losses; see Section 4.1. The second block, congestion estimation, intends to estimate whether the appearance of new traffic in the network will damage the VR flow; see Section 4.2. Finally, the interaction of these building blocks is described in Section 4.3.

Bitrate Estimation
The HMD keeps track of frame delivery times to determine whether it shall switch to another bitrate or continue receiving the video at the current bitrate. For that, the algorithm maintains two exponential averages, D_short and D_long, of the frame delivery delays, calculated over the time windows t_short_win and t_long_win that correspond to short-term and long-term smoothing to track both local extremes and global trends. These averages are calculated as follows:

D(t) = e^(−∆t_frame / t_win) · D(t − ∆t_frame) + (1 − e^(−∆t_frame / t_win)) · x(t),    (1)

where ∆t_frame is the time passed between the measurements, t_win is the corresponding averaging window (t_short_win or t_long_win), and x(t) is the measurement performed at time moment t. The HMD calculates the frame delivery times as the delay between the reception of the first and the last packets belonging to the same frame and updates the values of D_short and D_long. Given D_short and D_long, the algorithm decides whether the frames arrive fast enough, so that the bitrate of the video can be increased without an increase in losses, or the frames arrive too slowly, so that the bitrate should be decreased to avoid frame losses. Otherwise, if the frames arrive as expected, i.e., with a delay around the frame period D_period = 1/ν, the bitrate remains unchanged.
In the developed algorithm, the decisions to increase or decrease the bitrate are based on different frame delivery delay averages. Because the algorithm needs to be sensitive enough and rapidly decrease the bitrate if the channel conditions degrade, it decreases the bitrate if D_short exceeds a pre-defined threshold D_upper. Similarly, the algorithm decides to increase the bitrate only if D_long < D_lower. To improve the stability of the algorithm, we implement a hysteresis-like reset of the averages. The average that indicated the bitrate change is reset to a pre-defined value: D_long resets to a higher value t_h and D_short to a lower value t_l. The pseudocode of the algorithm is presented in Algorithm 1. The algorithm runs each time a video frame is received, and its complexity is O(1).
The aforementioned thresholds D_lower and D_upper depend on the exact set of bitrates B available for the client to choose from. For example, the bitrates typically double up the quality ladder [64]. In such a case, the threshold to increase the bitrate shall be D_lower = 0.5·D_period: frames twice as big as the current ones can then still be downloaded in time. Similarly, the threshold to decrease the bitrate shall be D_upper = 1.5·D_period. However, if the bitrate ladder grows in smaller steps, the thresholds may be closer to one frame period.
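The bitrate estimation block can be sketched in Python as follows. This is a minimal illustration consistent with Equation (1) and the threshold logic above; the class name and the default reset values t_l and t_h are illustrative and not taken from Algorithm 1.

```python
import math

class BitrateEstimator:
    """Decides bitrate up/down moves from short- and long-term delay averages."""

    def __init__(self, d_period, t_short_win=1.0, t_long_win=5.0,
                 t_l=None, t_h=None):
        self.t_short_win = t_short_win
        self.t_long_win = t_long_win
        self.d_lower = 0.5 * d_period   # increase threshold (bitrates double)
        self.d_upper = 1.5 * d_period   # decrease threshold
        # Hysteresis reset values (illustrative defaults near one frame delay).
        self.t_l = t_l if t_l is not None else 0.75 * d_period
        self.t_h = t_h if t_h is not None else 1.25 * d_period
        self.d_short = d_period         # start from the nominal frame period
        self.d_long = d_period
        self.last_t = None

    @staticmethod
    def _ewma(avg, x, dt, t_win):
        # Window-based exponentially weighted moving average, as in Eq. (1).
        w = math.exp(-dt / t_win)
        return w * avg + (1.0 - w) * x

    def on_frame(self, t, delivery_delay):
        """Returns +1 (raise bitrate), -1 (lower bitrate), or 0 (keep)."""
        dt = 0.0 if self.last_t is None else t - self.last_t
        self.last_t = t
        self.d_short = self._ewma(self.d_short, delivery_delay, dt, self.t_short_win)
        self.d_long = self._ewma(self.d_long, delivery_delay, dt, self.t_long_win)
        if self.d_short > self.d_upper:   # channel degraded: react quickly
            self.d_short = self.t_l       # hysteresis reset to a lower value
            return -1
        if self.d_long < self.d_lower:    # sustained headroom: raise cautiously
            self.d_long = self.t_h        # hysteresis reset to a higher value
            return +1
        return 0
```

Fed with consistently small delivery delays, the long-term average eventually crosses D_lower and the estimator signals an increase; a sudden jump in delays drives the short-term average past D_upper within a few dozen frames and triggers a decrease.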

Link Congestion Estimation
Apart from bitrate estimation, the algorithm estimates whether the available channel resources will be enough when the load increases because of a new heavy flow. For that, it estimates the full network capacity over the route between the server and the client and the share of the network allocated for the HMD. We employ the classical idea of the so-called fluid network model [65] and adapt it to modern wireless networks. The algorithm estimates the network capacity as the highest download rate of a small amount of data. Conversely, it estimates the average achievable network throughput as the download rate of a large amount of data. The implementation details of both of these stages are presented in the following sections.

Network Capacity Estimation
We use the P-frames as small probing portions of data. For that, we split them into N_gr groups of packets of equal size, except for the last one, which can be smaller. The size of each group shall exceed the maximal portion of data that can be delivered simultaneously in the network. Let each group consist of N_p packets of size L_MTU, which is the Maximum Transmission Unit (MTU) along the path between the server and the client. The server simultaneously sends all packets belonging to the same group and spreads the groups uniformly over the inter-frame interval. When the client receives all packets belonging to a group k, it calculates the download rate of this group in the following manner:

C_k = (N_p − 1) · L_MTU / (recvTime_k[N_p] − recvTime_k[1]),
where recvTime_k[i] is the time instant when the client received packet i of packet group k. When all the packet groups are received, the HMD calculates the network capacity as the maximum among the obtained values:

C_est = max_k C_k.

The network capacity estimation is performed each time a P-frame is received. Its complexity is O(B·D_period / (N_p·L_MTU)), where B is the bitrate of the video.
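The per-group rate computation and the maximum over groups can be sketched as follows; the function names are our own, and the per-group formula follows the packet-dispersion form given above.

```python
def group_rate(recv_times, l_mtu):
    """Download rate of one packet group of back-to-back packets.

    The bytes of all but the first packet are divided by the dispersion
    between the first and last receptions (the first packet's bytes carry
    no dispersion information).
    """
    n_p = len(recv_times)
    span = max(recv_times) - min(recv_times)
    return (n_p - 1) * l_mtu / span

def estimate_capacity(groups, l_mtu):
    """Network capacity estimate: the highest per-group download rate."""
    return max(group_rate(g, l_mtu) for g in groups)
```

A group of five 1500-byte packets received 1 ms apart yields a rate of 4 · 1500 B / 4 ms = 1.5 MB/s; the capacity estimate takes the maximum across all groups of the P-frame.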

Network Throughput Estimation
The key frames serve as the large chunks of data that we use to estimate the network throughput, as they are generated relatively rarely. The HMD determines the whole frame's delivery time ∆t_delivery as the delay between the reception of the first and the last packets of the frame and calculates the network throughput T_est as:

T_est = L_frame / ∆t_delivery,

where L_frame is the length of the frame in bytes. The obtained value is typically smaller than the network capacity because the network resources are shared with the other traffic in the network. The network throughput estimation is performed each time a key frame is received, and its complexity is O(1).

User Number Estimation
We estimate the link congestion as the effective number of user-generated flows active in the network. Note that the real number of flows may differ, and their rates may differ from the rate of the considered VR flow; however, together, these flows generate as much congestion as the estimated effective number of flows would. We calculate, in the same manner as in Equation (1), exponentially weighted moving averages of the capacity C̄_est and the throughput T̄_est with an averaging window length t_user_win to flatten the fluctuations. We assume that every user gets an equal share of the network resources, which is the typical mode of operation of modern networks [25,26]. Therefore, we can estimate the number of users in the network as follows:

N_users = ⌈C̄_est / T̄_est⌉,

where ⌈x⌉ is the round-up operator. We round the number up because this way, we overestimate the network load. Therefore, we make the algorithm more conservative and prioritize avoiding frame losses over the video bitrate. We do so because frame losses cause video stalling, which is considered one of the main factors causing cybersickness [24]. The number of users is estimated each time a frame is received. Its complexity is O(1).
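The user number estimate above reduces to a single rounded division; the following sketch assumes the smoothed capacity and throughput averages are already available.

```python
import math

def estimate_user_count(c_est_avg, t_est_avg):
    """Effective number of equal-share flows: ceil(capacity / throughput).

    Rounding up deliberately overestimates the network load, which keeps
    the adaptation conservative and prioritizes avoiding frame losses
    over maximizing the video bitrate.
    """
    return math.ceil(c_est_avg / t_est_avg)
```

For example, a 100 Mbps capacity estimate with a 30 Mbps per-flow throughput estimate yields an effective count of four users rather than 3.33.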

Bitrate Adaptation
Once the algorithm decides whether to change the bitrate or not, it ensures that the new bitrate does not exceed the threshold C_margin calculated as:

C_margin = C̄_est / (N_users + 1).

If the new bitrate is higher than C_margin, the algorithm chooses the best one that satisfies the condition Bitrate ≤ C_margin. We introduce this threshold because the client changes the bitrate asynchronously, and the server cannot immediately process the change request. Therefore, we should always expect that another client may enter the network, in which case the throughput would decrease to C_margin. In this way, the HMD always reserves the network resources for a sudden new client. When a new HMD starts consuming its share of the network capacity, the old ones have room to take appropriate measures. We do not have to reserve resources for more than one extra client because we assume that the time granularity of the adaptation is small enough to render the probability of more clients entering the network during this interval negligible.
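The clipping step can be sketched as follows. The margin formula matches the reconstruction above (reserving a fair share for one additional client); the fallback to the lowest ladder entry when nothing fits is our assumption.

```python
def clip_bitrate(requested, bitrate_ladder, c_est_avg, n_users):
    """Clip the requested bitrate to C_margin = C_est / (N_users + 1),
    reserving network resources for one sudden new client."""
    c_margin = c_est_avg / (n_users + 1)
    if requested <= c_margin:
        return requested
    # Choose the best ladder entry that fits under the margin; if none fits,
    # fall back to the lowest available bitrate (our assumption).
    feasible = [b for b in bitrate_ladder if b <= c_margin]
    return max(feasible) if feasible else min(bitrate_ladder)
```

With a 40 Mbps capacity estimate and two active users, C_margin ≈ 13.3 Mbps, so a request for the 24.8 Mbps rung of the paper's ladder is clipped down to 12.3 Mbps.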
The bitrate clipping is performed each time a frame is received. Its computational complexity is O(N_bitrates), where N_bitrates is the number of bitrates the client can choose from. To summarize, the total complexity of the bitrate adaptation is O(1 + B·D_period/(N_p·L_MTU) + N_bitrates). The bitrate adaptation is performed each time a video frame is received, i.e., approximately once per D_period. Therefore, the computational complexity of the algorithm is significantly lower than that of the other processes running at the HMD, and the algorithm can easily run on real HMDs.

Numerical Evaluation
In this section, we use the network simulator NS-3 [66] to evaluate the performance of the developed algorithm and compare it against the bitrate adaptation algorithms for live video streaming known from the literature. In Section 5.1, we describe the considered scenario, the parameters of the algorithm, the metrics we use to evaluate the QoE, and the set of evaluated algorithms. In Section 5.2, we present and analyze the results of the simulation.

Scenario
We consider a 5G indoor hotspot network [67] where a few (N) clients (VR HMDs) are dropped inside a 50 m circle around a 5G small base station (gNB). The gNB has a wired connection to an MEC server that processes the commands from the HMDs and generates the VR video streams. Users start their VR sessions at random moments, and the session duration is a random value uniformly distributed between 90 and 110 s. After a session ends, the user waits for some time and starts a new session. The duration of the inter-session pause is a random value with a truncated exponential distribution with an average of 30 s and lower and upper bounds of 10 and 60 s, respectively. The duration of a simulation run is 1000 s, and we perform 100 simulation runs for each of the considered adaptation algorithms and each number of VR HMDs. The considered scenario corresponds, for example, to an engineering office with multiple users performing modeling in VR or to an indoor multiplayer game. In such a scenario, the capacity of a cell may be relatively low, and the changes in user activity noticeably influence the network throughput of the other users. Although, in reality, the durations of the sessions and inter-session pauses can be much longer, we assume that scaling up the session parameters by some constant would not change the observed effects, while it would slow down the simulations and complicate obtaining statistically significant results.
The videos are generated in one of the following resolutions: 720p, 1080p, 1440p, 2160p. The optimal choice of the bitrate ladder clearly depends on the considered cloud VR deployment scenario and the network profile. However, we aim to develop and evaluate the bitrate adaptation algorithm, so we choose a single bitrate ladder based on an example from the industry. In particular, from [64], we obtain the following bitrate ladder recommended for the considered resolutions: {7.5, 12, 24, 60} Mbps. However, services like YouTube typically compress the content [68][69][70], and the analysis of the videos stored on YouTube reveals that, in practice, much lower average bitrates are used [71]. Therefore, based on this analysis, we consider the following bitrate ladder: {3.2, 6.1, 12.3, 24.8} Mbps. We use FFmpeg [72] to encode the video with the x265 video encoder and real-time encoding parameters [73], and the server uses RTP over UDP [74] to stream the video to the clients. The frame delay limit (jitter-buffer depth) is 50 ms or three frames, which is the maximal tolerable feedback delay in VR scenarios [12]. Table 1 lists the other parameters of the scenario.
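The session and pause distributions above can be sampled as follows. This is a minimal sketch: the truncated exponential is drawn by rejection sampling, which the paper does not specify, and we assume the 30 s mean refers to the underlying (untruncated) exponential.

```python
import random

def session_duration(rng):
    """VR session length: uniform between 90 and 110 s."""
    return rng.uniform(90.0, 110.0)

def inter_session_pause(rng, mean=30.0, lo=10.0, hi=60.0):
    """Inter-session pause: exponential with the given mean, truncated to
    [lo, hi] by rejection (redraw until the sample falls in the bounds)."""
    while True:
        x = rng.expovariate(1.0 / mean)
        if lo <= x <= hi:
            return x
```

A seeded `random.Random` instance makes the traffic trace reproducible across simulation runs.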
We evaluate the QoE of the end-users with the following metrics: the number of satisfied sessions, the average video bitrate, the frame loss ratio, the network utilization, the goodput, and the bitrate switching frequency.
We implement the baseline algorithms and set their parameters according to the original papers and the DASH.JS implementation [75]. To adapt them to the cloud VR system, we treat separate video frames as MPEG chunks so that the algorithms stay sensitive enough to the changes in the network state. Note that the extremely low buffer levels of cloud VR streaming reduce the BOLA algorithm to its part called DYNAMIC in the original paper. This algorithm estimates the network throughput as the average chunk download rate in a small time window and chooses the maximal video bitrate that is less than the estimated throughput. To take the fluctuations of the frame sizes into account, instead of the average bitrates for each of the video resolutions, we use the effective bandwidth of each resolution, i.e., the minimal network throughput that is enough to satisfy the frame loss ratio requirement. To find it, we conduct the following simple experiment: we transmit the flow over a constant-rate point-to-point link and find the minimum data rate of the link that satisfies the frame loss ratio requirement. The resulting "effective bitrate" ladder is {4.3, 7.9, 16.0, 32.0} Mbps.
The parameters of the EVeREst algorithm are presented in Table 2. We chose them based on a preliminary study; however, their values can be justified as follows. The short frame delay average shall be sensitive enough to react to changes at the scale of a single bitrate adaptation interval. In the considered system, the server can change the bitrate only once per GoP; therefore, t_short_win approximately equals the GoP duration (set to 1 s in our experiments). The long frame delay average shall instead track the channel state, including the wireless channel fluctuations and the user activity. We find that, in the considered scenario, an averaging window of 5 s provides a good tradeoff between these two factors that influence the frame delivery delays perceived by the client. The same is true for the user dynamics averaging window because the channel fluctuations can also influence this estimation. Finally, the reset values after a bitrate increase/decrease shall have the same order of magnitude as a typical frame delay and introduce a small hysteresis to avoid multiple up/down switches in a row.

Simulation Results
Figure 2 presents the main QoE metrics for the considered algorithms obtained with simulations. Let us first analyze the performance of the CBR algorithms. Figure 2c,d illustrates the fundamental tradeoff between reliability and bitrate: an increase in resource consumption leads to a higher frame loss probability and vice versa. Figure 2a shows that when the number of clients is small, the number of satisfied sessions grows linearly, which means that all the VR sessions are satisfied. However, a further increase in the number of UEs saturates the network (see Figure 2e). Its capacity becomes insufficient to transmit all the data in time; frame losses increase (see Figure 2d), and the QoE degrades fast. No VR sessions can be satisfied after the network capacity is reached, e.g., at 5-6 UEs for CBR 1440p.
As for CBR 720p, Figure 2e shows that, in the considered scenario, it underloads the network. All sessions have a low frame loss probability (see Figure 2a,d), but the average bitrate and the corresponding visual quality of the video are unsatisfactory (see Figure 2c). The results of the CBR algorithms help us find how much cloud VR flows at a particular bitrate would utilize the network without violating the frame loss requirements and show why bitrate adaptation algorithms are needed.
Next, let us analyze the performance of the state-of-the-art bitrate adaptation algorithms designed for live MPEG DASH video streaming. Figure 2a-d indicates that BOLA overestimates the throughput available to the video flow and cannot provide a satisfactory frame loss ratio when more than two UEs are present in the network. We identify two main reasons for this. The first one is the structure of the VR video flows. Because of the video compression, the frame sizes are highly variable, and the load imposed on the network by multiple VR flows fluctuates significantly. The average throughput perceived by the flows is higher than the throughput at the moments when a few large frames of different flows have to be downloaded simultaneously. Therefore, the average network throughput gives little insight into the correct choice of bitrate. The second reason lies in the network state dynamics. Changes in user activity cause large throughput variations. When a new user starts a VR session, the lack of a safety margin against such events results in additional frame losses. The algorithm developed in [56] provides similar results, although it pays more attention to the buffer level than to the measured throughput. Furthermore, the fluctuations of the network state cause rather frequent bitrate switches (see Figure 2f); this also contributes to the low QoE.
Unlike the other adaptive algorithms, EVeREst satisfies VR clients regardless of the network load (see Figure 2a). Figure 2c,e shows that it provides a higher or equal average bitrate and network utilization compared to the CBR algorithms up to the network capacity limit of the corresponding constant bitrate. Although, at the edge of the network capacity, the adaptive algorithm typically provides a higher frame loss probability and a lower number of satisfied sessions, it achieves an average frame loss ratio lower than the requirement (θ%) for up to eight UEs in the network (see Figure 2d). We must highlight that, thanks to the hysteresis in the bitrate switching decisions, EVeREst provides a much lower bitrate switching frequency than the other considered bitrate adaptation algorithms (see Figure 2f). Compared to the other considered adaptation algorithms, the developed algorithm provides up to 10 times higher goodput and much more consistent QoE when the network conditions change. Finally, because EVeREst performs only the bitrate adaptation, it can be easily modified for other scenarios or applications that require such a conservative adaptation. However, if we relax the delay requirements, the conservatism of the developed algorithm will lead to some bitrate underestimation.

Conclusions
To facilitate the further development of VR applications, cloud VR technology was introduced. Unlike standalone VR, cloud VR HMDs only have to track the user's actions and report them to the remote server. Based on the received data, the server renders the virtual environment, encodes its image into a video stream, and sends it back to the HMD. The HMDs use wireless technologies to transmit and receive the data, enabling user mobility without restricting their movements.
To provide high QoE for the end-users, cloud VR applications shall avoid image freezes and low visual image quality and minimize the feedback and environment refresh delay. Most VR applications are highly interactive, so the VR video stream cannot be rendered and downloaded to the HMD in advance. Unstable network conditions and hard timing constraints further complicate the problem.
In this paper, we addressed the problem of improving the QoE for interactive cloud VR applications through bitrate adaptation. In contrast to other papers, we assumed no network assistance and designed an application-level, client-side bitrate adaptation algorithm. The algorithm considers the bursty nature of cloud VR traffic and the variations of the network state caused by wireless channel fluctuations and the changing activity of the users.
We compared the developed algorithm with the bitrate adaptation algorithms designed for live MPEG-DASH-based video streaming that do not assume assistance from the network. The results of the simulations show that in the scenario with multiple cloud VR flows sharing a single wireless network, the developed algorithm provides much higher QoE to cloud VR users in terms of frame delay violation probability and system goodput. In particular, it can achieve up to 10 times higher goodput than the other considered algorithms and achieves much more stable performance in a wide range of network loads.
We see the following directions for future research. Although the algorithm designed in this paper provides rather high QoE to cloud VR users, further reducing the delays requires either reserving some network resources or performing admission control. However, the bursty nature of cloud VR traffic requires finding a tradeoff between resource overprovisioning and satisfactory QoE.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

QoS	Quality of Service