A Survey on QoE-Oriented VR Video Streaming: Some Research Issues and Challenges

Abstract: With the advent of the information age, VR video streaming services have emerged in large numbers in scenarios such as immersive entertainment, smart education, and the Internet of Vehicles. Demand for virtual-reality (VR) services is growing, and service providers must ensure a good user experience. Therefore, the quality of the VR user's experience is receiving increasing attention from academia and industry. This paper provides a comprehensive summary of the current state of quality-of-experience (QoE) technologies applied to VR video streaming. First, we review the main influencing factors of QoE and VR video streaming. Second, QoE evaluation for VR users is discussed. Third, the modeling of QoE for VR video streaming, the QoE-oriented VR optimization problem, and enabling machine-learning techniques for VR video streaming improvement are summarized. Lastly, we present current challenges and possible future research directions.


Introduction
With the development of Internet technology, network data traffic has shown an explosive growth trend. The most notable is the growth of video traffic, which brings huge challenges to current bearer networks. The popularization of virtual-reality (VR) applications in particular places higher requirements on network quality and performance [1,2]. Since VR applications are dedicated to bringing immersive experiences, the experience perceived by users is particularly important, and the quality of experience (QoE) of VR end users is a very important metric [3]. Understanding users' expectations and experiences is of great importance for the development of existing services and the improvement of future services. Therefore, academic and industrial communities in the multimedia field show special interest in this area.
It is widely recognized that QoE is a multidisciplinary indicator that is influenced by a variety of factors from different fields. However, explicitly accounting for the impact of all of these factors at once is very difficult. Most current studies on QoE evaluation rely on direct feedback from users to obtain subjective evaluations. Most subjective evaluation methods were developed for traditional video; collecting user feedback with them is costly, and data collection is restricted to controlled environments. Traditional objective evaluation methods usually compare the video under test directly with the original video; they include the peak signal-to-noise ratio (PSNR) [4], the structural similarity index measure (SSIM) [5], and the video-quality model (VQM) [6].
Although existing works studied the impact of user QoE on video services, most studies [7][8][9][10][11][12][13][14] only address QoE in traditional video streaming applications. Because the interactive and immersive nature of VR video makes it vastly different from traditional flat video, traditional 2D QoE metrics are not well suited to the evaluation and optimization of VR video streams. Research on QoE for VR video streaming services has gradually emerged in recent years.

Survey Novelty and Contributions
This survey is the first to discuss the current status of the impact and improvement of QoE on VR video streaming technology, together with challenges and prospects for future development. Possible solutions for QoE assessment and optimization for VR users are critical to the success of VR service development. This survey focuses on the main trends in the application of QoE evaluation in VR video streaming optimization. Unlike previous studies that only briefly introduce the impact of QoE on VR, this paper provides a comprehensive survey of the main influencing factors of QoE for VR video streaming, evaluation methods, existing test platforms, and machine-learning-based improvements for QoE optimization. The main contributions of this paper are summarized as follows:
• An overview of QoE and VR video streaming is provided, and the benefits of QoE evaluation for the development of VR service applications are illustrated;
• Subjective and objective ways of evaluating VR QoE, and VR-related test and evaluation platforms, are discussed;
• An evaluation model for QoE based on VR video streams is investigated, the optimization of VR systems is described as a QoE optimization problem, and machine learning is considered an active way to support the QoE optimization of VR video streaming;
• For the enhancement of VR user experience, the challenges facing VR video streaming QoE and future research directions are described.

Survey Structure
The paper is organized as follows:
• Section 2: overview of the basic background knowledge on the main QoE influencing factors and VR video streaming;
• Section 3: description of QoE evaluation approaches and testbeds targeting VR video applications;
• Section 4: QoE models constructed on the basis of VR video streams and QoE-oriented optimization problems for VR video streams are surveyed; research on machine-learning methods for use in QoE optimization is examined;
• Section 5: recent findings are combined to provide an outlook on future research challenges and trends;
• Section 6: main points of this survey are summarized.
The roadmap of our approach is shown in Figure 1. Our work summarizes the multidimensional QoE influencing factors in video services, uses QoE evaluation models to assess whether current user experience needs are met, and uses the outcome to guide QoE optimization strategies that effectively improve QoE. In research on QoE optimization, the selection of QoE influencing factors and the measurement and evaluation of QoE are common components; QoE modeling and optimization then build on their output.

Overview
In this section, the main QoE influencing factors are described, and the VR video streaming technology is introduced.

QoE-Influencing Factors
A study [15] considers QoE to be a research topic covering the multiple fields of computer science, social psychology, cognitive science, and economics. The study of QoE-influencing factors is the cornerstone of all QoE studies. Because QoE is difficult to derive directly from a few obvious factors, its influencing factors need to be discussed comprehensively. The white paper [16] on the definition of quality of experience defines influencing factors for QoE as "any characteristic of a user, system, service, application or environment whose actual state or setting may impact the user's quality of experience". According to ITU [17], the factors influencing QoE can include: the type and characteristics of the application or service, the environment in which it is used, the user's expectations and their fulfilment, cultural background, socioeconomic issues, psychological conditions, emotional state, and other factors, which may continue to grow as research progresses. From the above perspectives, we classify the influencing factors of QoE into three categories: the influence of the system's own properties on the experience (system factors), the influence of the various external environments in which it is located (context factors), and the human's physiological and psychological perceptions of the experience (human factors), i.e., System IF, Context IF, and Human IF, as shown in Table 1.

System Factors
A white paper published by Qualinet [16] indicates that the quality that an application or service technically produces is determined by system factors. QoE in video-delivery services may be affected by variations in the perceived quality of the content. Blockiness, blur, and other artifacts that may arise from different types of compression algorithms can lead to an unsatisfactory user experience. Meanwhile, system factors such as network QoS parameters [18] and media configurations [19] are also considered to strongly affect QoE. Dobrian et al. [18] argue that the percentage of buffering time in the total session time directly affects the QoE of the user: the longer the buffering time, the worse the QoE. A study [20] investigated other QoS parameters in the system (buffering event rate, buffering time, average bit rate) and showed that they correlate strongly with QoE.
In addition, a study [21] argued that different forms of viewing content (e.g., competitions, conferences) lead to different viewing patterns, so that the user's QoE is influenced by the video content and users experience different perceptual quality.
The literature [22] shows that, at a given bit rate, genres that usually contain little motion obtain a higher perceptual quality than genres that contain high-speed motion (e.g., action movies). In summary, system factors for QoE can include network-layer (e.g., latency, throughput, packet loss, buffering event rate, buffering time, average bit rate, bandwidth), application-layer (e.g., resolution, frame rate), and service-layer (e.g., type of content viewed, application level, viewing mode) parameters.
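The network-layer parameters named above (buffering time as a share of session time, buffering event rate, average bit rate) can be computed from a simple playback log. The following is a minimal sketch under stated assumptions: the `Session` record, its field names, and the per-minute normalization of the event rate are hypothetical choices made for illustration and are not taken from [18] or [20].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """Hypothetical playback log for one viewing session."""
    play_time_s: float                  # time spent actually playing video
    stall_durations_s: List[float]      # one entry per buffering (stall) event
    segment_bitrates_kbps: List[float]  # bit rate of each played segment


def session_time_s(s: Session) -> float:
    """Total session time: playback time plus all buffering time."""
    return s.play_time_s + sum(s.stall_durations_s)


def buffering_ratio(s: Session) -> float:
    """Share of the total session time spent buffering (0.0 to 1.0)."""
    total = session_time_s(s)
    return sum(s.stall_durations_s) / total if total > 0 else 0.0


def buffering_event_rate(s: Session) -> float:
    """Number of buffering events per minute of session time."""
    total = session_time_s(s)
    return len(s.stall_durations_s) / (total / 60.0) if total > 0 else 0.0


def average_bitrate_kbps(s: Session) -> float:
    """Unweighted mean bit rate over the played segments."""
    rates = s.segment_bitrates_kbps
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    demo = Session(90.0, [6.0, 4.0], [1000.0, 2000.0, 3000.0])
    print(buffering_ratio(demo), buffering_event_rate(demo),
          average_bitrate_kbps(demo))
```

Such per-session values are the kind of inputs that a QoS-to-QoE correlation study would relate to user ratings.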

Context Factors
Han et al. [23] argued that the user's QoE is influenced by external factors in the surrounding environment, and that a user who feels relaxed has a better quality of experience. Meanwhile, aspects of the physical environment such as seat location, viewing distance and height, lighting conditions [24], and possible disturbances [25] such as incoming calls or SMS alerts may affect the user's experience. Martinez et al. considered some economic contextual situations, such as the cost of the subscription type, among the factors influencing QoE. K. Yamori et al. [26] found that the user's payment for content affects the user experience: people usually show a higher tolerance for content with lower prices. Studies [27][28][29] found that adding factors such as user expectations, budget, and quality pricing improved the accuracy of the user perception model.
In summary, environmental factors of QoE can include physical environmental (e.g., lighting, sound, location) and economic (e.g., desired price, budget) factors.

Human Factors
Both human physiological and psychological factors usually significantly impact QoE.

• Physiological factors. Most studies found that the user's physiological characteristics play a key role in user QoE, with visual perception being of particular interest. Laghari et al. [30] analyzed a variety of factors in the human body itself (e.g., gender, age) to find the main influencing factors that may affect user-perceived quality. Most of these factors have been studied and modeled, but how an individual's physiological characteristics affect QoE is equally important. Owsley et al. [31] demonstrated that factors such as visual acuity and loss of contrast sensitivity due to aging can affect the visibility (and annoyance) of visual impairments. However, these factors are rarely included in QoE models. M.S. El-Nasr et al. [32] found that physiological vision deficits directly affect the user experience. Colorblind users have a different perceptual experience, and stereoblindness can also strongly affect an immersive visual experience. In addition, human auditory characteristics can affect the QoE of the medium. Saleme et al. [33] studied 360° mulsemedia, an emerging VR application, with the aim of uncovering the physiological factors that may affect the experience. Unlike other studies, they introduced the specific factor of odor sensitivity. The study also found that women showed higher sensitivity in multisensory situations. P. Orero et al. [34] argue that, because individuals differ in many characteristics, physiological conditions can vary greatly across QoE assessment situations, so this aspect is often given minimal consideration.
• Psychological factors. The user's psychological state is likely to play a large role in the level of satisfaction with the user experience. Some of the existing literature [35][36][37][38][39][40] indicated that personal psychological factors influence QoE in various ways. Wechsung et al. [35] indicate that more variable factors, such as motivation, attention level, or the user's mood, i.e., affective factors, also play an important role among QoE influencing factors.
Another study [36], on the other hand, found that the effect of emotion and multimedia experience is reciprocal. A good experience leads to good emotions, which are more likely to produce a good experience. The authors of [37][38][39] found that interest, as an emotional influencing factor, plays a decisive role in QoE, and that interest may be triggered by content, which in turn may affect the perception of QoE. In [40], the authors experimentally found that, when people watch content in which they are interested, they tend to overlook the video quality. Regarding video quality, interest is positively correlated with QoE. In addition, the authors in [41] argued that some other factors, such as education or occupation background, can also affect the QoE of users. In conclusion, the influence of personal psychological factors on perceived quality is complex, and these factors are closely interrelated.
In summary, human-level influencing factors of QoE are more subjective factors, mostly obtained through the users' active perception. They mainly include basic user profiles (e.g., age, gender, education level), physical state (e.g., vision, hearing, smell), physical and mental states (e.g., user preferences, emotions), background (e.g., educational and occupational background), and hobbies. These influencing factors are more complex and variable than objective factors.

VR Streaming System
At this stage, most VR video services deliver segmented content on the basis of adaptive streaming delivery methods. Accordingly, most VR video streaming systems use full-view streaming (viewport-independent), viewport-based streaming, or tile-based streaming. Next, we discuss representative streaming scenarios, which are summarized in Table 2.
Full-view streaming is the most straightforward available solution for streaming 360° content. It delivers the entire frame at the same quality, as in a traditional streaming solution; 360° video content is projected and encoded in equirectangular projection (ERP) or cubemap projection (CMP) and then delivered directly to the client without the need to obtain additional head-mounted display (HMD) information. As in the bit rate adaptation of traditional video, the client requests the next segment on the basis of the current network state. Afzal et al. [42] conducted experiments on a large number of 360° YouTube videos, fetching full frames without considering the viewing direction. The results show that the bit rate of 360° videos is about 6 times higher than that of regular videos. However, the coding efficiency of full-view streaming is greatly limited by the need to encode the entire frame, compared with viewport-based streaming. In addition, it spends much bandwidth and many resources on content outside the viewing area, which is never displayed.
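The client-side behavior of full-view streaming, in which each segment request follows the current network state, can be illustrated with a throughput-based rate selection rule. This is a minimal sketch and not the method of any cited work: the harmonic-mean throughput estimate, the three-sample window, the 0.8 safety factor, and the function names are all assumptions chosen for illustration.

```python
from typing import Sequence


def estimate_throughput_kbps(samples_kbps: Sequence[float],
                             window: int = 3) -> float:
    """Harmonic mean of the most recent throughput samples.

    The harmonic mean is dominated by slow downloads, so a single
    fast segment does not inflate the estimate.
    """
    recent = list(samples_kbps)[-window:]
    if not recent:
        raise ValueError("at least one throughput sample is required")
    return len(recent) / sum(1.0 / s for s in recent)


def select_bitrate_kbps(ladder_kbps: Sequence[float],
                        throughput_kbps: float,
                        safety: float = 0.8) -> float:
    """Pick the highest full-frame bit rate that fits within the budget.

    Falls back to the lowest representation when none fits.
    """
    budget = throughput_kbps * safety
    feasible = [b for b in sorted(ladder_kbps) if b <= budget]
    return feasible[-1] if feasible else min(ladder_kbps)


if __name__ == "__main__":
    ladder = [1000, 2500, 5000, 8000, 16000]   # hypothetical representations
    estimate = estimate_throughput_kbps([4000, 8000, 8000])
    print(estimate, select_bitrate_kbps(ladder, estimate))
```

Because the whole frame is requested at the selected rate, every bit outside the user's viewport is paid for at the same quality, which is the inefficiency that viewport-based and tile-based schemes try to remove.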

Viewport-Based Streaming
Viewport-based streaming selectively transmits the viewport that the user is watching, delivering the watched part at high quality and the rest at low quality. On the client side, the streaming endpoint device detects the user's head movements and receives only the desired areas of the video frame. The client reduces the bit rate of the 360° video stream by dynamically selecting the viewport area and adjusting the quality of the viewport. The server side stores multiple adaptation sets related to the user's orientation, and matches them on the basis of the network state and viewport-position prediction. In this approach, each viewport that can be viewed is encoded and stored in multiple quality versions to meet different delivery requirements. To guarantee a smooth playback experience, mechanisms such as viewport prediction, synchronization with user head motion, and quality adjustment are also gaining attention. Corbillon et al. [43] proposed quality-focused regions, in which the video quality is set higher than in other regions, thus reducing the huge bandwidth demand of 360° video streams and achieving viewport-adaptive video streaming. Sreedhar et al. [44] compared ERP, CMP, and pyramid projection, and proposed a differentiated quality approach that transmits the content at the front of the projection at a relatively high resolution. Nguyen et al. [45] investigated the impact of response latency on 360° viewport-adaptive streaming. He et al. [46] proposed an FoV-adaptive mechanism to reduce bandwidth consumption; the length and size of the FoV segment were calculated on the basis of the measured network delay so that the bandwidth could be used efficiently. Zhou et al. [47] proposed a new coding method that addresses the high bandwidth consumption of 360° video streams by coding more pixel information for the selected viewpoint direction, thereby saving video bit rate.
However, viewport-based streaming requires a large amount of cache resources, because multiple versions of the content must be stored on the server side to adapt to the user's viewing direction and to network conditions. For live streaming services, it is difficult to complete the resource-intensive encoding in time, because the user's viewing position changes frequently and new high-resolution viewport content is needed each time.

Tile-Based Streaming
In VR video streaming, a video is generally divided into segments on the basis of time, and these segments are then spatially divided into tiles of different sizes. In tile-based streaming, the viewport is generally predicted so that the corresponding tiles can be fetched to compose the desired viewport. Skupin et al. [48] encoded tiles at different resolutions on the basis of viewport-adaptive streaming; by combining tiles of different resolutions into a single bit stream, resolution adaptation is achieved. Graf et al. [49] explored various tiling properties for 360° videos, where the quality of each tile could be adapted according to the viewing area. Results indicated significant bit rate savings compared to the full- and partial-delivery strategies. Yu et al. [50] divided an equirectangular video into many horizontal tiles and assigned each tile a sampling weight on the basis of its content. Bit rate allocation is optimized on the basis of sampling weights and bandwidth budget. Neighboring tiles are given overlapping edges, which are blended, to reduce the probability of viewport loss.
In some recent studies [51][52][53], the tiles within the viewport that the user needed to select were hierarchically represented as multiple types in order to cope with network changes and the randomness of head movements. Ozcinar et al. [51] optimized an adaptive omnidirectional video (ODV) streamer by introducing a visual-attention metric, using a bit rate-allocation strategy that assigns bit rates to tiles belonging to different regions to achieve optimal streaming for each selected mode. Xie et al. [54] constructed a probabilistic-model-based mechanism for handling viewport prediction errors in tile-adaptive streaming to ensure the continuity of playback within the buffer. The authors also proposed using a saliency model to further improve viewport adaptivity in the next phase of their work. In [55], the authors improved the tile-based stream coding method by comparing different tile sizes and choosing the corresponding tiling scheme to achieve bit rate savings.
Typically, tile-based streaming requires only a small number of content versions on the server side compared to viewport-based streaming, so it has lower storage and processing overhead. Using different resolutions for viewport tiles and adjacent tiles also reduces the bandwidth cost of streaming. However, when the viewport prediction is wrong, the video quality is significantly reduced because the tiles have different resolutions. Therefore, a major open problem is how to achieve a trade-off among the quality of the video stream, the viewport prediction error, and bandwidth efficiency.
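The trade-off among stream quality, viewport prediction error, and bandwidth can be made concrete with a simple per-tile rate allocation. The sketch below is a hypothetical greedy heuristic, not the algorithm of any surveyed work: every tile first receives the lowest quality so that a mispredicted viewport is never empty, and the remaining budget is spent on tiles in descending order of predicted viewing probability. The tile identifiers, the probability values, and the bit rate ladder are illustrative assumptions.

```python
from typing import Dict, Sequence


def allocate_tile_bitrates(view_prob: Dict[str, float],
                           ladder_kbps: Sequence[float],
                           budget_kbps: float) -> Dict[str, float]:
    """Greedy per-tile bit rate allocation under a bandwidth budget.

    view_prob maps each tile id to the predicted probability that the
    tile falls inside the viewport during the next segment.
    """
    ladder = sorted(ladder_kbps)
    level = {tile: 0 for tile in view_prob}      # index into the ladder
    spent = ladder[0] * len(view_prob)           # baseline for all tiles
    if spent > budget_kbps:
        raise ValueError("budget cannot cover the lowest quality for all tiles")

    # Most likely viewed tiles are upgraded first.
    for tile in sorted(view_prob, key=view_prob.get, reverse=True):
        # Try the highest level first and stop at the first affordable one.
        for lvl in range(len(ladder) - 1, 0, -1):
            extra = ladder[lvl] - ladder[0]
            if spent + extra <= budget_kbps:
                level[tile] = lvl
                spent += extra
                break
    return {tile: ladder[lvl] for tile, lvl in level.items()}


if __name__ == "__main__":
    probs = {"t0": 0.6, "t1": 0.3, "t2": 0.05, "t3": 0.05}
    print(allocate_tile_bitrates(probs, [500, 1500, 3000], 6000))
```

Keeping a low-quality baseline for every tile is one common way to limit the quality drop when the prediction is wrong; a sharper probability distribution concentrates more of the budget on the predicted viewport at the cost of a larger penalty for misprediction.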

State of the Art of VR QoE
In this section, we summarize the existing research on the main assessment methods and platforms for the QoE of VR users. QoE assessment methods fall into two main approaches: subjective and objective assessments. QoE usually represents the human perception of video quality; therefore, it needs to be quantified by the corresponding testing platforms and procedures.

Subjective Quality Assessment
Subjective evaluation is a challenging problem. The International Telecommunication Union (ITU) has proposed many experimental subjective video-quality evaluation methods. The results of subjective-evaluation experiments are mostly based on users' opinion ratings and their averages, and two metrics, the mean opinion score (MOS) [56] and the differential mean opinion score (DMOS) [57], are widely used for subjective quality assessment. Some subjective assessment methods applicable to omnidirectional video are receiving attention. Huang et al. [58] proposed a study of omnidirectional images based on single-stimulus absolute category rating (ACR). Perrin et al. [59] used a set of publicly available images to compare the quality of HDR imaging using a new subjective assessment method designed for omnidirectional content. Shahid et al. [60] examined, through subjective experiments, how the type of video content, camera motion position, and the number of moving targets affect user QoE under different network stalling conditions. Azevedo et al. [61] argued that different people may explore content in different ways, but visual attention and saliency are the two main aspects that need to be considered during the subjective evaluation of 360° video content. Van et al. [62] investigated cybersickness by subjectively testing users' quality perception, experience, and load. Anwar et al. [63] examined two key QoE influences, perceived quality and motion sickness, by collecting subjective experience data from 29 users over 96 viewings of 360° videos. The aim was to reduce the motion sickness that occurs during the viewing of 360° videos. Kono et al. [64] used a subjective test with repeated viewing of the same video sequence to quantify the relationship between video quality and presence through head movements and questionnaire results.
Other studies [65,66] investigated the degree of influence of resolution, bit rate, quantization parameter (QP), and content characteristics on the perceived quality of 360° videos and determined the main influencing factors for VR videos through subjective assessment tests. Schatz et al. [67] investigated the coding parameters, device type, perceived quality, and acceptability of 360° videos through subjective testing methods, and determined the number of influencing factors.
Unlike ordinary videos, 360° videos are usually viewed using HMD devices for a better viewing experience. However, due to the limitations of human physiological characteristics, viewers can experience different degrees of adverse sensations such as strain and motion sickness when immersed in the content using HMDs. These undesirable sensations hinder the enhancement of QoE. Singla et al. [68] conducted a subjective quality-assessment test on tile-based 360° video streaming to investigate the effect of video content, round-trip delay, and session duration on simulator sickness. In addition, the authors in [69,70] proposed a modified absolute category rating (M-ACR) method and used different VR devices to analyze simulator sickness in 360° videos under different bit rate and resolution conditions. Albert et al. [71] studied several key factors (size of the foveal region, severity of degradation, degradation algorithm) and conducted a detailed investigation to understand how system latency affects the VR user experience on desktop displays and HMDs. Fernandes et al. [72] investigated the relationship between viewport size, VR sickness, and perceived quality using a subjective evaluation method. Steed et al. [73] subjectively examined the effects of VR device type, experience scenario, and the external environment. Huyen et al. [74] considered the effects of latency, quality changes, and interruptions on QoE; the experimental results showed that the method had better prediction results and helped the subjective evaluation of QoE to quantify the impact of different quality switches and interruptions. Yang et al. [75] proposed an objective assessment method applicable to panoramic videos and constructed a generic objective panoramic-video quality-assessment framework consisting of several quality factors and a fusion model. Table 3 summarizes the subjective quality assessment approaches.
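The two summary statistics used throughout subjective testing, MOS and DMOS, can be computed as follows. This is a minimal sketch: the DMOS convention used here (per-subject difference between the rating of the reference and the rating of the test stimulus, then averaged) is one common variant and is an assumption, since test protocols differ in how they normalize and rescale differences. The 95% interval uses a normal approximation, which is also an assumption and is rough for small panels.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def mos(scores: Sequence[float]) -> float:
    """Mean opinion score: the average rating over all subjects."""
    return mean(scores)


def mos_ci95(scores: Sequence[float]) -> float:
    """Half-width of a 95% confidence interval (normal approximation)."""
    return 1.96 * stdev(scores) / sqrt(len(scores))


def dmos(reference_scores: Sequence[float],
         test_scores: Sequence[float]) -> float:
    """Differential MOS: mean per-subject (reference - test) difference.

    Both sequences must list the same subjects in the same order.
    """
    if len(reference_scores) != len(test_scores):
        raise ValueError("each subject needs a reference and a test rating")
    return mean([r - t for r, t in zip(reference_scores, test_scores)])


if __name__ == "__main__":
    ratings = [5, 4, 4, 3, 4]   # hypothetical ACR ratings on a 1-5 scale
    print(mos(ratings), round(mos_ci95(ratings), 3))
    print(dmos([5, 5, 4, 5], [3, 4, 2, 3]))
```

Under this convention a larger DMOS means a larger perceived degradation relative to the reference, whereas a larger MOS means higher perceived quality.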

Objective Quality Assessment
At present, 2D flat video has mature technical solutions for objective quality assessment. However, because of the spherical characteristics of 360° video, directly applying traditional flat objective quality-assessment methods to panoramic video yields results that do not accurately reflect the video quality. Tran et al. [77] validated various traditional quality metrics and found that the traditional PSNR outperformed the other metrics. Moller et al. [36] used data from subjective methods as a basis to predict the quality score (MOS) from objective data about the video, and results showed that some objective video quality-assessment methods advanced PSNR as a metric. Liu et al. [78] enhanced the validity of spherical PSNR (S-PSNR) and perceptual PSNR (P-PSNR) for the objective assessment of panoramic video. Objective and perceptual rate control (RC) formulas were developed to optimize these two objective metrics, enabling the best S-PSNR or P-PSNR results in panoramic-video coding. Yang et al. [76] focused on factors such as saccadic suppression and the contrast sensitivity of the human visual system and conducted subjective experiments to investigate the perception of viewport adaptation when quality changes occur. Zakharchenko et al. [79] proposed a weighted PSNR (W-PSNR) objective evaluation method for 360° video, in which row-by-row weighting coefficients derived from projection anisotropy are applied and normalized. The results of this objective evaluation were confirmed by subjective visual tests.
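To illustrate why a flat PSNR misjudges panoramic content, the sketch below computes a latitude-weighted PSNR for an equirectangular (ERP) frame. It assumes the cosine-of-latitude row weighting used by WS-PSNR-style metrics; this may differ from the exact coefficients of the W-PSNR in [79]. Frames are plain nested lists of grayscale samples to keep the example dependency-free, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def latitude_weights(height: int) -> List[float]:
    """One weight per ERP row: cos(latitude of the row center).

    Rows near the poles are stretched by the projection and therefore
    cover less area on the sphere, so they receive smaller weights.
    """
    return [math.cos((row + 0.5 - height / 2.0) * math.pi / height)
            for row in range(height)]


def weighted_psnr(reference: Sequence[Sequence[float]],
                  distorted: Sequence[Sequence[float]],
                  max_value: float = 255.0) -> float:
    """PSNR in dB computed from a latitude-weighted mean squared error."""
    weights = latitude_weights(len(reference))
    weighted_error = 0.0
    weight_sum = 0.0
    for w, ref_row, dist_row in zip(weights, reference, distorted):
        for a, b in zip(ref_row, dist_row):
            weighted_error += w * (a - b) ** 2
            weight_sum += w
    wmse = weighted_error / weight_sum
    if wmse == 0:
        return math.inf            # identical frames
    return 10.0 * math.log10(max_value ** 2 / wmse)


if __name__ == "__main__":
    ref = [[100.0, 100.0] for _ in range(4)]
    pole_error = [[110.0, 110.0]] + [[100.0, 100.0] for _ in range(3)]
    equator_error = [[100.0, 100.0], [110.0, 110.0],
                     [100.0, 100.0], [100.0, 100.0]]
    print(weighted_psnr(ref, pole_error), weighted_psnr(ref, equator_error))
```

With these weights, the same pixel error costs less near a pole than near the equator, which matches the smaller spherical area that polar rows represent.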
SSIM is a quality-evaluation metric that characterizes image distortion through multiple factors. Chen et al. [80] proposed a spherical SSIM (S-SSIM) metric for 360° video, which uses SSIM to compare the similarity of the restored video and the original 360° video. Tran et al. [77] performed a similar evaluation on 18 subjects. All considered objective metrics were highly correlated with the subjective results, and they attributed the source of distortion to changing the video-content format and transmitting the content over the network. Upenik et al. [81] analyzed the correlation between subjective and objective quality levels, and experimental results showed that the existing objective metrics designed for 360° videos did not correlate better with subjective scores than traditional objective evaluation metrics. The authors concluded that metrics developed for 360° video content must undergo certain improvements to achieve better results. Lastly, Egan et al. [82] predicted QoE scores from biosensor data. Experimental results showed that electrodermal activity significantly contributed to the QoE score, while heart rate had a relatively small effect on the evaluation score. Table 4 summarizes the objective quality assessment approaches.
Table 4. Objective quality assessment approaches.

Works | Metrics/Methods | Contribution | Year
[77] | PSNR | Evaluation based on the correlation between objective quality indicators and subjective quality. | 2017
[75] | S-PSNR | The objective evaluation and subjective relevance of this work is higher than existing methods. | 2017
[78] | S-PSNR and P-PSNR | The rate control scheme is effective in improving the S-PSNR and P-PSNR of panoramic video coding. | 2018
[79] | Weighted PSNR | The accuracy and reliability of the proposed objective quality estimation method have been verified, and it has a good correlation with subjective quality estimation. | 2016
[80] | S-SSIM | S-SSIM outperforms state-of-the-art objective quality assessment metrics in omnidirectional video quality assessment. | 2018
[81] | S-PSNR/WS-PSNR/CPP-PSNR/VIFP | VIFP objective indicators provide the best performance indicators. New algorithms are also needed to better predict the perceived quality of omnidirectional content. | -
[82] | Heart rate and electrodermal activity | The first work to show the real relationship between the EDA/HR combination and the QoE of users in an immersive VR environment. | -
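Studies that compare objective metrics against subjective scores usually report a linear (Pearson) and a rank-order (Spearman) correlation coefficient. The following sketch shows how both can be computed without external libraries; the MOS and PSNR values in the usage example are invented for illustration and do not come from any cited dataset.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between two samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    mos_scores = [1.5, 2.5, 3.0, 4.0, 4.5]       # hypothetical subjective scores
    psnr_db = [28.0, 31.0, 33.0, 38.0, 45.0]     # hypothetical metric values
    print(pearson(mos_scores, psnr_db), spearman(mos_scores, psnr_db))
```

A metric can reach a perfect rank correlation while its linear correlation stays below 1, which is why many evaluations first fit a nonlinear mapping between metric values and MOS before computing the Pearson coefficient.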

VR-Related QoE Evaluation Test Platform
Due to the specificity of the VR 360° video experience, the design of subjective and objective assessment test platforms affects the evaluation results. A number of studies designed test platforms according to their focuses.
Ahmadi et al. [83] proposed a test platform for omnidirectional video and images that uses an HMD as the display device. Upenik et al. [84] established a testbed for the subjective evaluation of VR; to show its applicability, the authors used it to collect mean opinion scores (MOS) for 360° images and videos of different quality levels. Subjects' scores, orientation, and consumed time can be tracked by the testbed during each assessment session. Regal et al. [85] implemented a QoE testing platform for VR users using Unity, where testers were asked to fill in a questionnaire regarding their VR experience. During the test, the collected scores were stored in CSV files for result analysis. Singla et al. [69] constructed a QoE testbed and recruited 28 subjects to evaluate six 360° videos downloaded from YouTube, considering two commercial HMDs and two resolutions in the experiment. The sickness induced by viewing 360° videos in HMDs was assessed by subjective evaluation. Bessa et al. [86] investigated whether 3D (stereoscopic) views increased subjective QoE levels compared to 2D views on their experimental platform. Of the 63 recruited participants, half watched the 2D version of the video, while the other half watched the 3D version. Singla et al. [70] developed a QoE testing platform and recruited 28 subjects to evaluate six 360° videos downloaded from YouTube; two commercial HMDs and two resolutions were considered in the experiment. The sickness induced by viewing 360° videos in HMDs was assessed by subjective evaluation. Schatz et al. [87] considered VR-based training applications and investigated how the type of scene affected subjective scores and task performance. Hupont et al. [88] developed a QoE assessment procedure for games using HMDs and compared them with traditional displays regarding realism, and willingness to use for mobility; results indicated that HMDs consistently showed better QoE results. Han et al. [89] compared the assessment results of user QoE under different external conditions by both offline and online methods, which helped the system to make appropriate assessment choices. Gomes et al. [90] used a crowdsourcing approach to study the QoE of VR self-driving cars through an Internet-based evaluation task, investigating the system and human influence factors (IFs) of the self-driving simulation. Midoglu et al. [91] constructed a measurement platform that correlates objective metrics with subjective user ratings for 360° video streams. Simone et al. [92] analyzed the QoE of users by collecting subjective and objective data in VR interaction states. Although these studies shed some light on QoE testing and evaluation platforms for 360° video, they do not address the optimization of 360° video streams using QoE.

QoE-Oriented Optimization for Virtual-Reality Video
With the emergence of multimedia services and real-time application services, research on QoE has steadily intensified, and QoE optimization methods have diversified. In this section, we present the construction of QoE models for VR video streams, the QoE-oriented VR optimization problem, and QoE optimization strategies that apply machine-learning methods.

VR QoE Model
QoE models are built on the user's perceptual elements and form an important bridge between users and multimedia communication: they both reflect the elements of user evaluation and act as a basis for later optimization. Some of the most advanced QoE models [93][94][95] were examined, but they were all constructed for a 2D environment. Traditional QoE models for 2D video usually take the video bit rate and network QoS parameters as model inputs.
To give VR users an immersive experience similar to real-world perception, VR video QoE considers a variety of factors related to VR characteristics, such as stalls, quality switches, and pauses. Several studies [96][97][98] developed dedicated QoE models for VR videos in order to better reflect the viewing experience of 360° video users. Kim et al. [96] established a QoE prediction model through a fine-grained analysis of the motion characteristics perceived by users and of the statistical content features that affect motion perception, starting from users' physiological characteristics; the model aims to predict the degree of VR sickness when watching VR videos to ensure a comfortable viewing experience. Experimental results showed that the correlation between the proposed QoE model and subjective sickness scores reached 72%. Yao et al. [97] derived an HMD-based QoE model for 360° video through an open-source 360° video player for VR users that explores diverse VR projection schemes. Experimental results showed that the obtained QoE model had better accuracy and good scalability. The authors argued that more factors or more complex models are needed to improve modeling accuracy. Xie et al. [98] modeled the perceptual response to quality changes over a given time period. Cross-validation of the data showed that the model produced very accurate quality estimates: both the Pearson and Spearman rank correlations exceeded 0.98. Yu et al. [99] used a psychological mapping model based on the Weber-Fechner law [100,101], combined it with the assessment models and elements of existing research [102,103], and used the logarithmic function as the basic mapping between QoE and VR video features to construct a QoE model. Han et al. [104] examined how to maximize overall QoE during the adaptive streaming of 360° video in an environment with high network variability. The authors built their model from the QoE impact factors identified in the literature [105]; the QoE model is denoted as Equation (1), and Equation (2) gives the calculation of an important parameter of the model, the impact of quality switching.
In the QoE model, each major factor is assigned a different coefficient to balance its impact on the overall VR video QoE. The video bit rate enters through the mapping function q(R), and the impairments caused by stalls, quality switches, and startup time are denoted as t_stall, Q_switch, and t_startup, respectively. Experimental results showed that Q360AS achieved better QoE performance than PERCEIVE in streaming.
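To make the structure of such a weighted QoE model concrete, the sketch below combines a quality reward with penalties for quality switching, stalling, and startup delay. It is only an illustration under stated assumptions: the linear form, the logarithmic choice of q(R), the definition of the switching term as the summed absolute quality change between consecutive segments, and all weights are assumed here and are not the equations of [104].

```python
import math


def q(bitrate_kbps, min_bitrate_kbps=1000.0):
    """Assumed logarithmic quality mapping q(R); reference bit rate is hypothetical."""
    return math.log(bitrate_kbps / min_bitrate_kbps)


def quality_switch(bitrates_kbps):
    """Assumed Q_switch: summed absolute change of q(R) between consecutive segments."""
    qualities = [q(r) for r in bitrates_kbps]
    return sum(abs(b - a) for a, b in zip(qualities, qualities[1:]))


def session_qoe(bitrates_kbps, t_stall, t_startup,
                w_switch=1.0, w_stall=4.0, w_startup=1.0):
    """Quality reward minus weighted switch, stall, and startup penalties.

    The weights are illustrative placeholders, not values from the literature.
    """
    reward = sum(q(r) for r in bitrates_kbps)
    return (reward
            - w_switch * quality_switch(bitrates_kbps)
            - w_stall * t_stall
            - w_startup * t_startup)


# Two hypothetical sessions with the same total quality reward:
steady = session_qoe([4000, 4000, 4000, 4000], t_stall=0.0, t_startup=1.0)
oscillating = session_qoe([8000, 2000, 8000, 2000], t_stall=0.0, t_startup=1.0)
```

Under these assumptions the steady session scores higher than the oscillating one, because only the switching penalty differs between them.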
Saxena et al. [106] studied virtual-reality headsets, modeled QoE for intermittent wireless transmission and user head-position prediction on the basis of different 360° video streams, and proposed an MDP-based algorithm to accurately measure and optimize the cost and QoE per user. Roberto et al. [107] produced a platform for 360° video visual-quality assessment. The platform provides access to multiple objective quality features of 360° video viewports and combines these features into QoE models that closely match subjective quality scores under a variety of conditions. Hu et al. [108] used a QoE model for rate adaptation, weighting the QoE parameters to maximize the utility of the model under constraints. Experiments showed that VAS360 improved the user experience, and the quality of the viewport-adaptive solution was 23-45% better than that of the non-viewport-adaptive solution.

QoE-Oriented VR Optimization Problem
QoE currently represents the user's perception level well, so QoE-oriented optimization of VR video streams is gaining increasing attention. The strategies discussed in this section are also summarized in Table 5. Most current approaches [54,109] focus on optimizing QoE goals through specific heuristics. However, since user preferences differ and usage scenarios vary greatly, using a single approach to optimize a specific QoE goal does not always yield good results. Therefore, it is important to choose the right optimization direction.
Existing 360° video streaming systems mainly focus on optimizing specific quality-of-experience (QoE) goals through fixed heuristics [54,109]. However, users may have different preferences for QoE goals, so methods designed for specific scenarios cannot provide high QoE for all users. In addition, most existing methods rely on accurate predictions of future bandwidth and viewpoints, and dynamic changes in real scenarios significantly degrade their performance. Xie et al. [54] introduced a 360° video viewport-adaptive system driven by tile-based viewport QoE optimization. Figure 2 shows the architecture of the system. On the server side, a 360° video is cropped by the video cropper and encoded, an MPD document is generated, and the result is stored on an HTTP server, from which the 360ProbDASH service is provided to the client. On the client side, the authors integrated additional modules into the DASH adaptation algorithm. (1) Direction Prediction: predicts the direction of user head movement; (2) Bandwidth Estimation: estimates the time-varying throughput on the basis of download duration; (3) QR Map: generates quality-rate (QR) maps for all segments on the basis of attributes in the MPD; (4) Viewport Probability Model: calculates the viewing probability of each tile with reference to the user's directional prediction error; (5) Target-Buffer-Based Rate Controller: controls the buffer to stay at the target level; (6) QoE-Driven Optimizer: determines the optimal segments to download in HTTP GET requests on the basis of information from Modules 3-5. The system takes minimizing the expected quality distortion of tiles and the spatial variation of quality as its QoE optimization objectives, under a constraint on the total transmitted bit rate. Experimental results showed that the method performed better on several target metrics. Hu et al. [110] proposed a novel 360° video streaming algorithm based on user viewing behavior, which improved the accuracy of viewport prediction through tile view maps constructed from real user viewing data and optimized user QoE by saving limited bandwidth. Xie et al. [111] proposed a cross-user learning system by studying a real VR user dataset and exploiting the user viewing patterns obtained from it. The QoE-driven tile rate allocation is formulated as an optimization problem that minimizes the expected distortion. Wang et al. [112] designed a 360° video adaptation scheme based on QoE optimization; to maximize multiuser QoE while ensuring fairness, they jointly optimized bit rate delivery and caching decisions. The experimental results showed that this optimization method improved the cache hit rate and QoE performance compared with other methods. Zhang et al. [113] proposed EPASS360, which predicts users' future views by mining patterns in other users' historical trajectories, sets a QoE objective on the basis of the prediction results and the allocation strategy, and formulates a QoE optimization function to obtain the optimal rate allocation of tiles through a balanced selection. Simulation results showed that EPASS360 was more competitive than advanced streaming approaches and that QoE was improved in a variety of scenarios.
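The idea of allocating tile bit rates according to viewing probability under a total bit rate constraint can be sketched as follows. This is a simplified greedy heuristic written for illustration, not the optimizer of 360ProbDASH or of any other cited system: it minimizes only the expected distortion (the spatial-variation term is omitted), and the function name, the shared per-level rate and distortion tables, and the example numbers are all assumptions.

```python
def allocate_tile_rates(view_prob, rates, distortion, budget):
    """Greedy sketch: choose one quality level per tile to cut expected distortion.

    view_prob[i]  : probability that tile i falls inside the viewport
    rates[l]      : bit rate of quality level l (ascending)
    distortion[l] : distortion at quality level l (descending)
    budget        : total bit rate available for the segment
    """
    n = len(view_prob)
    level = [0] * n                       # every tile starts at the lowest level
    spent = rates[0] * n
    if spent > budget:
        raise ValueError("budget cannot cover the lowest level for all tiles")

    while True:
        best_tile, best_gain = None, 0.0
        for i in range(n):
            nxt = level[i] + 1
            if nxt >= len(rates):
                continue                  # tile already at the highest level
            extra = rates[nxt] - rates[level[i]]
            if spent + extra > budget:
                continue                  # upgrade would exceed the budget
            # Expected distortion reduction per extra unit of bit rate.
            gain = view_prob[i] * (distortion[level[i]] - distortion[nxt]) / extra
            if gain > best_gain:
                best_tile, best_gain = i, gain
        if best_tile is None:
            return level                  # no affordable, useful upgrade is left
        spent += rates[level[best_tile] + 1] - rates[level[best_tile]]
        level[best_tile] += 1


# Hypothetical example: four tiles, three quality levels, budget of 9 rate units.
levels = allocate_tile_rates([0.7, 0.2, 0.05, 0.05], [1, 2, 4], [9, 4, 1], 9)
```

In this example the tile most likely to be viewed receives the highest level, and the remaining budget is spread over the less likely tiles. A greedy pass of this kind is not guaranteed to be optimal; the cited systems solve richer formulations.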

Figure 2. Architecture of the 360° video viewport-adaptive system of Xie et al. [54] (server side and client side).
He et al. [114] designed a tile-based hierarchical coding framework for encoding the spatial and temporal features of 360° videos. The framework is also used in the client-side optimization process to optimize the user's QoE. Figure 3 compares the Rubiks system with existing streaming algorithms. In this figure, YouTube [47] streams all data for the entire 360° frame to the client; FoV-only [115] streams only the tiles predicted to be in the user's FoV; and FoV+ [116] additionally selects the surrounding area on the basis of the estimated prediction error of the FoV. As the figure shows, the optimization algorithm needs to consider three important metrics at the same time: bandwidth saving, decoding speed, and video quality. If video quality is pursued excessively, as in YouTube, bandwidth and decoding speed are sacrificed. If bandwidth is saved to increase the decoding speed, as in FoV-only and FoV+, the video quality is degraded. The Rubiks system aims to achieve the optimization goal by changing the encoding method of the tiles.
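The difference between the three baseline strategies can be illustrated with a small tile-selection sketch. The 8 x 4 tile grid, the rectangular FoV window, and the fixed one-tile margin used for FoV+ are assumptions for this example (the text above states that FoV+ sizes the extra area from the estimated prediction error), and the layered coding of Rubiks is not modeled.

```python
def tiles_in_window(center_col, center_row, half_w, half_h, cols, rows):
    """Tiles inside a rectangular window; columns wrap around in yaw."""
    selected = set()
    for d_row in range(-half_h, half_h + 1):
        row = center_row + d_row
        if not 0 <= row < rows:
            continue                      # no wrap-around across the poles
        for d_col in range(-half_w, half_w + 1):
            selected.add(((center_col + d_col) % cols, row))
    return selected


def select_tiles(strategy, center, cols=8, rows=4, fov_half=(1, 1), margin=1):
    """Return the set of (col, row) tiles to stream for a predicted FoV center."""
    col, row = center
    if strategy == "full":                # stream the whole 360-degree frame
        return {(c, r) for c in range(cols) for r in range(rows)}
    if strategy == "fov_only":            # stream only the predicted FoV
        return tiles_in_window(col, row, fov_half[0], fov_half[1], cols, rows)
    if strategy == "fov_plus":            # predicted FoV plus a safety margin
        return tiles_in_window(col, row, fov_half[0] + margin,
                               fov_half[1] + margin, cols, rows)
    raise ValueError(f"unknown strategy: {strategy}")


full = select_tiles("full", (0, 1))
fov_only = select_tiles("fov_only", (0, 1))
fov_plus = select_tiles("fov_plus", (0, 1))
```

Comparing the sizes of the three sets shows the bandwidth side of the trade-off: the smaller the set, the fewer tiles are transmitted and decoded, but the more a prediction error costs in displayed quality.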

Figure 3. Comparison of YouTube, FoV-only, FoV+, and Rubiks.

Yu et al. [117] proposed a QoE-based video-adaptive method that combines QoE evaluation metrics into an objective function for the optimization process; to remain flexible across different preferences, the QoE parameters are adjustable. Experiments showed that the method performed well in different network environments. Perfecto et al. [118] formulated the multicast problem as a QoE optimization problem that maximizes network-wide HD frame admission subject to low-latency and high-reliability constraints, and proposed a method based on the Lyapunov framework to solve it. Simulation results showed that content reuse for highly overlapping user clusters, enabled by multicast, reduced VR frame latency by 12%.
Recently, several studies have found that the encoding of 360° VR video can have an impact on QoE. QoE-driven video encoder settings aim to optimize the user's QoE. Qian et al. [119] combined a subjective QoE model with an encoder parameter model to propose a QoE maximization problem with encoder adaptation as a constraint. Experimental results showed that the proposed encoder adaptation scheme significantly improved user QoE. Tran et al. [65] investigated the effect of features such as encoding parameters and device type on QoE through principal observations and showed that video quality is affected when the encoding level is reduced. Yang et al. [120] took into account the encoding bit rate of each tile of VR video under ERP projection during resource allocation (RA), together with the channel quality of each tile and user equipment (UE); they formulated this as a non-deterministic polynomial (NP)-hard problem and proposed a low-complexity approximate convex algorithm to solve it. The simulation results showed that the overall viewer QoE was significantly improved. Graf et al. [49] described a tile-based implementation of bandwidth-efficient adaptive streaming using modern video codecs such as HEVC/H.265 and VP9, and evaluated quality in terms of viewport PSNR. Guan et al. [121] proposed a video streaming system with a variable-sized tiling coding scheme that balances perceptual quality against video coding efficiency. The experimental results showed that the perceptual QoE could be improved while reducing the bandwidth.

Table 5. Summary of QoE-oriented VR optimization strategies.
Use contextual information to improve the client's bit rate selection strategy.
QoE-aware/driven adaptive streaming based on user data: set QoE targets based on forecast results and allocation strategies [110,111,113].
Reduce content delivery latency and improve network resource utilization.
Video encoder with appropriate settings to improve user QoE: tile-based layered coding provides a balance between quality and video coding efficiency [49,114,120,121].
Improved tile coding method; optimized resource allocation (RA); increased efficiency of QoE and bandwidth usage.

ML-Based Approaches Improve QoE
Machine-learning techniques are developing rapidly and are being used in several fields. Some recent studies use machine-learning tools to evaluate QoE models and to take part in QoE performance optimization. Costa Filho et al. [122] proposed a VR performance model that applies machine learning to both playback parameters and perceived QoE parameters in order to predict the adaptive performance of VR systems while analyzing the impact of the network on VR streaming. The results showed good accuracy. Li et al. [123] used a DRL model that takes human eye- and head-movement data into account for the quality evaluation of 360° videos. Yang et al. [75] considered individual pixels, regional superpixels, salient objects, and complete projections as multiscale inputs to the backpropagation (BP) algorithm and constructed a quality assessment of a VR system (QAVR) metric. Li et al. [124] used a CNN to predict the probability of each possible viewport being viewed in the next phase and to determine the extent of its impact on the expected QoE. Wu et al. [125] proposed a deep-reinforcement-learning (DRL)-based ABR decision mechanism for tile-based 360° video streaming, in which a DQN model with a preference encoder can be customized to adapt QoE to multiple preference objectives. Ban et al. [126] used the mean field actor-critic (MFAC) algorithm to request viewing tiles, with the aim of minimizing the bandwidth usage of the core network while maximizing users' QoE.
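As a rough illustration of how reinforcement learning can drive bit rate decisions, the sketch below trains a tabular Q-learning agent on a toy task. It deliberately swaps the DQN and actor-critic models of the cited works for a lookup table, and the environment is entirely hypothetical: three bandwidth levels drawn at random, three bit rate levels, and a reward that grows with the bit rate unless it exceeds the bandwidth, in which case a stall penalty applies.

```python
import random


def train_bitrate_policy(steps=5000, alpha=0.1, gamma=0.5, seed=0):
    """Tabular Q-learning on a toy ABR task (a stand-in for a DQN, not one).

    State  : bandwidth level 0..2, drawn independently at every step (assumption).
    Action : bit rate level 0..2 requested for the next segment.
    Reward : quality gain if the bit rate fits the bandwidth, else a stall penalty.
    """
    rng = random.Random(seed)
    n_levels = 3
    q_table = [[0.0] * n_levels for _ in range(n_levels)]

    def reward(bandwidth, bitrate):
        if bitrate <= bandwidth:
            return float(bitrate + 1)           # higher quality, no stall
        return -2.0 * (bitrate - bandwidth)     # overshoot causes stalling

    state = rng.randrange(n_levels)
    for _ in range(steps):
        action = rng.randrange(n_levels)        # pure exploration while training
        next_state = rng.randrange(n_levels)
        target = reward(state, action) + gamma * max(q_table[next_state])
        q_table[state][action] += alpha * (target - q_table[state][action])
        state = next_state

    # Greedy policy: preferred bit rate level for each bandwidth level.
    return [max(range(n_levels), key=lambda a: q_table[s][a])
            for s in range(n_levels)]


policy = train_bitrate_policy()
```

With these toy rewards the learned greedy policy should request the bit rate level that matches the bandwidth level. Real systems replace the table with a neural network because the state (throughput history, buffer level, viewport prediction, tile quality) is far too large to enumerate.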

Discussion: Challenges, Issues, and Future Directions
In this section, we discuss the main challenges faced by QoE in VR video streaming applications and possible future research directions.

Challenges and Impacts
Currently, although most studies [49,99,127,128] conducted various subjective and objective evaluations for 360° videos, most evaluation methods are still taken directly from traditional video-evaluation criteria. Uniform, standardized impact factors for 360° videos are lacking, and standard evaluation methods have not yet been finalized. This is a complex and challenging issue.
According to one study [129], an HD-equivalent viewing experience for 360° video requires a viewport with 4K by 4K resolution at 60 fps, and the video requires 12K resolution at a bit rate of 400 Mbps. Different bit rates also result in a wide range of bandwidth requirements. As the VR experience level develops further, bandwidth and latency requirements become more stringent, and optimizing QoE metrics so that 360° video always delivers a high quality of experience is a challenging task.

Research on User Personalization Modeling for VR Video Streaming QoE
For VR video streams, the measure of merit is the quality of the VR users' experience. At present, the main adopted measures are the PSNR and the utility function approach [130,131]. For VR video viewed through a head-mounted display, many factors, such as the parameters of the head-mounted display and the rendering latency, can affect the QoE of VR users. Therefore, adopting a more accurate mathematical model that describes VR video streaming QoE according to a user's own individual characteristics, and using the resulting personalized QoE model as the optimization goal, is a problem worthy of in-depth investigation in future VR research.

Trade-Off of QoE Based on MEC Solutions
Currently, with the high speed of fifth-generation communication technology, multiaccess edge computing (MEC) is considered [132] to play a driving role in VR development. Therefore, the trade-off between ensuring high bandwidth and low latency in MEC-based VR systems and optimizing QoE is a very interesting research point. QoE optimization of VR systems needs to reduce the cost of caching and computing while ensuring the user viewing experience. Balancing user QoE against cache and compute resource costs with MEC involvement is a future research direction.

Conclusions
With the spread of new VR devices and the increasing popularity of converged networks for providing VR applications, it is now a challenge to satisfy customers with a high-quality multimedia service experience, which requires understanding how users perceive and evaluate the quality of experience of VR streaming services. To introduce readers to the latest and most widely used VR streaming technologies, our description of VR streaming services over the Internet focused on viewport- and tile-based standards. Research shows that QoE assessment for VR requires an interdisciplinary perspective, so we provided a comprehensive description of QoE influencing factors in terms of system, user, and context; these multidimensional influencing factors generally serve as the basis for the entire QoE study. However, with the continuous updating and development of VR services, the unification of QoE impact factors has not been fully achieved.
QoE evaluation is based on platform measurements while considering the complex relationship between VR user characteristics and streaming system characteristics. We summarized the QoE assessment methods for VR and the related assessment platforms, which include two mainstream methods: subjective and objective assessment methods.
In QoE optimization, QoE-oriented adjustments are made iteratively, which in turn effectively improves QoE. On this basis, we investigated and discussed the construction of QoE models for VR users and QoE optimization for VR systems. The survey also extensively explored solutions for QoE evaluation and optimization that use the emerging technology of machine learning. We also proposed and discussed the future research needed in the following directions: personalized QoE modeling for VR users and QoE trade-offs based on MEC solutions. The survey and the current state of research presented in this paper could help readers to understand the direction of the work that is needed.
Author Contributions: This work was mainly performed by J.R. (planning of the work, conceptualization, investigation, methodology, data curation, formal analysis, resources, software, visualization, and original draft preparation) and was completed with key contributions from D.X. (planning of the work, conceptualization, supervision, validation, manuscript review and editing and funding acquisition). Both authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: