A Subjective Study on User Perception Aspects in Virtual Reality

Abstract: 360-degree video is becoming increasingly popular on the Internet. By using a Head-Mounted Display, 360-degree video can render a Virtual Reality (VR) environment. However, understanding the Quality of Experience (QoE) of 360-degree video remains a major challenge, since user experience while watching 360-degree video is a very complex phenomenon. In this paper, we investigate four QoE aspects of 360-degree video, namely perceptual quality, presence, cybersickness, and acceptability. In addition, four key QoE-affecting factors, namely encoding parameters, content motion, rendering device, and rendering mode, are considered in our study. To the best of our knowledge, this is the first work that covers a large number of factors and QoE aspects of 360-degree video. In this study, a subjective experiment is conducted using 60 video versions generated from three original 360-degree videos. Based on statistical analysis of the obtained results, various findings on the impacts of the factors on the QoE aspects are provided. In particular, regarding the impacts of encoding parameters, it is found that the difference in QoE is negligible between video versions encoded at 4 K and 2.5 K resolutions. Also, it is suggested that 360-degree video should not be encoded at HD resolution or lower when watched in VR mode using a Head-Mounted Display. In addition, the bitrate for good QoE varies widely across different video contents. With respect to the content motion factor, its impact is statistically significant on the perceptual quality, presence, and cybersickness. In a comparison of the two rendering device sets used in this study, no statistically significant difference is found for the acceptability and cybersickness. However, the differences in the perceptual quality and presence are found to be statistically significant. Regarding the rendering mode, a comparison between VR and non-VR modes is also conducted. Although the non-VR mode always achieves higher perceptual quality scores and higher acceptability rates, more than half of the viewers prefer the VR mode to the non-VR mode when watching versions encoded at resolutions of fHD or higher. By contrast, the non-VR mode is preferred at the HD resolution.


Introduction
Thanks to the decreasing cost and improving usability of virtual reality (VR) devices, 360-degree video (or 360 video for short) has gradually been gaining popularity on streaming platforms such as YouTube and Facebook in recent years. Unlike traditional video, 360 video is capable of providing a 360-degree view of a scene at the same time, and thus an immersive viewing experience to users. However, 360 video requires much higher network bandwidth than traditional video [1][2][3]. To provide excellent user experience, 360 video should have high quality and high resolution, resulting in a large data size. Therefore, for effective delivery of 360 video, it is crucial to obtain an in-depth understanding of Quality of Experience (QoE) as well as QoE-affecting factors of 360 video.
For traditional video, two key QoE aspects are perceptual quality and acceptability, which have been considered in many previous studies [4][5][6][7]. Hence, in this study, we also investigate these two QoE aspects in the context of 360 video. Perceptual quality refers to the degree of user satisfaction with the video quality displayed on rendering devices. Acceptability indicates whether a service or application is acceptable to users. In most previous studies, to obtain the acceptability of a video, viewers in subjective tests are asked to give a binary score, i.e., whether a video is acceptable or not [5,8,9]. Then, the acceptability rate, which is determined as the ratio of the number of viewers accepting a video to the total number of viewers, is used as an indicator to represent the acceptability [5,8,9]. In the literature, it is well known that the perceptual quality and the acceptability of a traditional video may be significantly affected by video compression and transmission [4,10], which could cause video distortions such as blocking and blurring artifacts.
In the case of 360 video, other QoE aspects such as cybersickness and presence should be considered as well. Presence refers to the sense of "being there" in the VR environment with interactions like in the real environment [11]. Cybersickness is the feeling of dizziness or nausea while watching a video. It is expected that users feel more present in a VR environment than when watching traditional videos. This is one of the main differences in user experience between watching 360 videos and traditional videos. Therefore, the presence aspect is also studied in this paper. In a subjective experiment conducted in [12], a large percentage of viewers reported cybersickness. So, viewers in our experiment are asked to assess their cybersickness as well. In the future, our research will be extended with some other QoE aspects such as enjoyment and usability.
So far, existing studies on QoE aspects as well as QoE-affecting factors of 360 video are still very limited [12][13][14][15][16][17]. Most previous work focuses on the perceptual quality aspect [15][16][17]. There are some studies on the presence aspect [13,14] and the cybersickness aspect [12,16]. These studies will be discussed in detail in the next section. So far, no previous study has investigated the acceptability aspect of QoE of 360 video. Regarding QoE-affecting factors, the impact of rendering mode on QoE aspects has not been evaluated in previous studies. Also, the impacts of other factors such as content motion, encoding parameters, and rendering device have not been fully understood.
To fill this gap, our study focuses on measuring four QoE aspects of 360 video, namely perceptual quality, presence, cybersickness, and acceptability. In addition, we conduct a subjective experiment to investigate four QoE-affecting factors, namely encoding parameters, content motion, rendering device, and rendering mode. To the best of our knowledge, this work is the first to cover a large number of factors and QoE aspects of 360 video.
The preliminary results of this work have been presented in [18]. In this paper, we evaluate more factors. In particular, the impact of the content motion factor is additionally investigated in detail. Also, we evaluate the impact of wearing a Head-Mounted Display (HMD) and the HMD's Field of View (FoV) on user experience. In addition, the relationships between the QoE aspects are examined. Moreover, statistical analyses and further discussions are presented for each of the considered factors in this work.
The remainder of this paper is structured as follows. In Section 2, we give an overview of the main QoE aspects and key factors affecting QoE of 360 video. Also, work related to these factors and QoE aspects is presented in this section. Section 3 describes the QoE aspects and factors considered in this study as well as the settings of our subjective experiment. Section 4 analyzes and discusses the experiment results. Finally, Section 5 concludes the paper.

Overview and Related Work
In this part, we first present key factors and main QoE aspects of 360 video. Then, work related to these factors and QoE aspects is presented in detail.

Key Factors and Main QoE Aspects
In Figure 1, we summarize the key factors and main QoE aspects of 360 video. In this study, we cluster these factors into three groups. The first group includes the factors relating to user characteristics such as gender, age, and usage history. In the second group, the included factors relate to system characteristics such as encoding parameters, rendering device, rendering mode, and delay. The third group consists of the factors related to content characteristics of 360 video such as content motion and self-avatar. Here, the term "content motion" refers to all motions in a content including camera motion, object motion, and background motion. According to Recommendation ITU-T P.10/G.100 [19], QoE is "the degree of delight or annoyance of the user of an application or service". In fact, QoE is conceived as a complex and multidimensional concept that consists of multiple aspects. As mentioned in Section 1, four main QoE aspects of 360 video are perceptual quality, presence, cybersickness, and acceptability.

Related Work
Although QoE of traditional video has been investigated in a considerable number of previous studies [20][21][22], research on QoE of 360 video is still very limited. A summary of related work on QoE of 360 video is presented in Table 1. It can be seen that most existing studies focus on the perceptual quality [16,17,[23][24][25][26]. There are only a few studies investigating other QoE aspects such as the presence [14] and cybersickness [16,17]. So far, there has been no study on the acceptability of 360 video. In the next parts, the related work on each of the QoE aspects of 360 video is presented in detail.

Table 1 (partial). Related work on QoE of 360 video, by factor and QoE aspect.
Factor | Perceptual quality | Presence | Cybersickness
Rendering device | [17] | | [17]
Delay | [16] | | [16]
Content motion | [17,26] | | [16,17]
Self-avatar | | [14] |

Perceptual Quality
As shown in Table 1, previous studies have investigated the impacts of various factors on the perceptual quality of 360 video such as gender, encoding parameters, rendering device, delay, and content motion. In [16], it is found that the delay (i.e., Motion-to-High-Resolution Latency) has a considerable effect on the perceptual quality, whereas the gender does not.
Previous studies have shown significant impacts of encoding parameters on the perceptual quality, such as quantization parameter (QP) [25], resolution [16,17,[23][24][25], frame rate [24], and bitrate [23,24,26]. In comparing the impacts of encoding parameters, a conclusion in [24] is that the resolution and frame rate have very similar impacts on the perceptual quality, and their impacts are more significant than that of the bitrate.
In [17], the authors compare two rendering devices, Oculus Rift and HTC Vive, in terms of perceptual quality. Experimental results show that the Oculus Rift provides a slightly worse perceptual quality than the HTC Vive. However, the difference is not statistically significant. Therefore, the impact of device type seems trivial.
With respect to the factor of content motion, its considerable effect on the perceptual quality has been found in [17,26]. However, analyses and discussions on the impacts of specific content motions such as camera motion, object motion, and background motion are not presented.

Presence
In [14], the authors study the impact of self-avatar on the presence of 360 video. It is found that the self-avatar has a positive effect on the presence. In addition, asynchrony between the user's and the avatar's movements may cause negative impacts on the presence.
In [13], the authors show that, by providing an enhanced presence compared to traditional video, 360 video could be used effectively in education, training, and rehabilitation. Also, a solution to apply 360 video for travel services is proposed in [27].

Cybersickness
While watching 360 video, users may feel nauseous or dizzy, which is called "cybersickness" [28,29]. Such a symptom usually appears when a user perceives differences between content motions and his/her movements [30]. In [31], it is found that an abrupt speed change in vibrational and translational camera motions can lead to cybersickness. To alleviate cybersickness in first-person 360 video, a solution using image processing and a wearable omnidirectional camera is proposed.
Previous studies have also shown significant impacts of gender [17], resolution [17], and content motion [16,17] on the cybersickness. Meanwhile, no significant effect on the cybersickness is observed for the two factors of rendering device [17] and delay (i.e., Motion-to-High-Resolution Latency) [16].

Factors and QoE Aspects
A summary of factors and QoE aspects considered in this paper is shown in Figure 2. On the content delivery path, 360 video is affected by various stages, from a server, through networks, to a rendering device. As shown in Figure 2, our study aims to examine the influences of four factors (i.e., content motion, encoding parameters, rendering device, and rendering mode) on four QoE aspects, namely perceptual quality, presence, cybersickness, and acceptability. In this study, we use Mean Opinion Score (MOS) as an indicator of the perceptual quality aspect, which is similar to most previous studies [16,24,32]. Based on Recommendation ITU-T P.800.2 [33], the MOS of a video is calculated as the average of viewers' scores collected via a subjective experiment. Since the presence and cybersickness are new QoE aspects, MOS is also used in our study to represent these two aspects. Regarding the acceptability aspect, we employ the most commonly used indicator, the acceptability rate, as mentioned in Section 1.
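The two indicators described above can be illustrated with a minimal sketch. The function names and the six sample ratings are hypothetical and are not taken from the experiment; the only assumptions are that quality ratings are on a 1 to 5 scale and that acceptability votes are coded as 1 (accept) or 0 (reject).

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Average of the viewers' 1-5 ratings for one video version."""
    return mean(ratings)

def acceptability_rate(votes):
    """Share of viewers who accept a version; votes are 1 (accept) or 0 (reject)."""
    return sum(votes) / len(votes)

# Hypothetical answers from six viewers for a single version.
quality_ratings = [4, 5, 4, 3, 4, 4]
accept_votes = [1, 1, 1, 0, 1, 1]

mos = mean_opinion_score(quality_ratings)   # 24 / 6
rate = acceptability_rate(accept_votes)     # 5 / 6
print(f"MOS = {mos:.2f}, acceptability rate = {rate:.0%}")
```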
In the next part, the factors investigated in this study are described in more detail.

Content Motion
While watching 360 video, users immerse themselves in a VR environment. Hence, content motions are expected to have strong effects on the QoE of 360 video. In this study, we use three 360 videos from YouTube corresponding to three different contents, denoted Content #1, Content #2, and Content #3. These contents are selected based on their content motion. Here, each content is characterized by three specific motion types: camera motion, object motion, and background motion. Detailed descriptions of these contents are presented in Table 2. From Table 2, we can see that the three contents used in this study have different levels of camera motion speed, namely static (i.e., Content #1), medium (i.e., Content #2), and fast (i.e., Content #3). The snapshots of these contents are shown in Figure 3.

Encoding Parameter
It is well known that QoE of traditional video is strongly affected by encoding parameters such as QP, frame rate, and resolution. In this work, a subjective experiment is carried out to study the effects of two encoding parameters, resolution and QP, on the QoE aspects of 360 video.
In particular, we investigate 20 combinations of five QP values and four resolutions as shown in Table 3. An investigation of other parameters is reserved for future work.

Rendering Device
One of the key challenges when studying QoE of 360 video is the diversity of rendering devices. Today, there are different types of mobile phones as well as HMDs that could be used for 360 video rendering. However, the understanding of how rendering devices affect QoE aspects is limited because only one device set is usually employed. In this study, comparisons between two different device sets are conducted. Moreover, the impacts of wearing an HMD and of the Field of View (FoV) on user experience are investigated.

Rendering Mode
In this work, we also evaluate the influence of rendering mode on the perceptual quality and acceptability of 360 video. In particular, we carry out a subjective experiment to assess the QoE aspects in two rendering modes, namely VR and non-VR. Also, our experiment investigates which of the two rendering modes is preferred by users. In the VR mode, a mobile phone and an HMD are used to display 360 video versions. Users can adjust their viewing directions by turning their heads. In the non-VR mode, 360 video versions are viewed on a mobile phone without using an HMD. Users can change their viewing directions by moving the phone.

Subjective Experiment
For the subjective experiment, we use the three videos mentioned in Section 3.1.1. The duration of each video is 30 s. All three videos are in Equirectangular projection format [34]. To generate 20 different versions of each video, we use the H.264/AVC encoder (libx264) and the 20 combinations of QP values and resolutions mentioned in Section 3.1.2. Note that the frame rate of all the versions is fixed at 30 fps. In total, 60 versions are generated from the three videos. It is worth noting that H.264/AVC is still the most widely used format on the Internet, and the following discussions on the impacts of encoding parameters are specific to this format.
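As a rough illustration of how such a version grid could be produced, the sketch below builds one ffmpeg argument list per combination. It is not the authors' actual toolchain: the file names are hypothetical, the QP list is assumed from the values discussed in the results, and the HD frame size of 1280 × 720 is an assumption because Table 3 is not reproduced here. The commands are only constructed, not executed.

```python
from itertools import product

# Assumed encoding grid (see the note above about Table 3).
QP_VALUES = [22, 28, 32, 36, 40]
RESOLUTIONS = {
    "4K": (3840, 1920),
    "2.5K": (2560, 1440),
    "fHD": (1920, 1080),
    "HD": (1280, 720),   # assumption: conventional HD frame size
}
SOURCES = ["content1.mp4", "content2.mp4", "content3.mp4"]  # hypothetical names

def build_commands(sources=SOURCES):
    """Return one ffmpeg argument list per (source, resolution, QP) combination."""
    commands = []
    for src, (name, (w, h)), qp in product(sources, RESOLUTIONS.items(), QP_VALUES):
        out = f"{src.rsplit('.', 1)[0]}_{name}_qp{qp}.mp4"
        commands.append([
            "ffmpeg", "-i", src,
            "-vf", f"scale={w}:{h}",   # resize the equirectangular frame
            "-c:v", "libx264",         # H.264/AVC encoder
            "-qp", str(qp),            # constant quantization parameter
            "-r", "30",                # fixed 30 fps output
            out,
        ])
    return commands

commands = build_commands()
print(len(commands), "versions;", "first output:", commands[0][-1])
```

Each list could be passed to `subprocess.run` once the source files exist; keeping the grid in two small tables makes it easy to check that every content gets the same 20 versions.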
In our experiment, we use two rendering device sets, denoted D1 and D2, to display video versions. In particular, device set D1 consists of a Samsung Galaxy S6 phone and a Samsung Gear VR HMD. The Samsung Galaxy S6 has a display size of 5.1 inches and a screen resolution of 1440 × 2560 [35]. The Samsung Gear VR has a FoV of 96 degrees [36], covering 80% of the human FoV [37]. Device set D2 consists of a Samsung Galaxy S5 phone and a Google Cardboard HMD. The Samsung Galaxy S5 has a display size of 5.1 inches and a screen resolution of 1080 × 1920 [35]. The Google Cardboard has a FoV of 90 degrees [38], which covers 75% of the human FoV.
Table 4 shows the questions employed in our experiment. Questions Q1 and Q2 are respectively used to measure the perceptual quality and presence. The acceptability is evaluated in Question Q3. Question Q4 is a comparative question between the non-VR and VR modes. The degree of cybersickness is measured using Question Q5. Questions Q6 and Q7 investigate the effects of wearing an HMD and of the FoV on user experience.

Table 4. Questions employed in the subjective experiment.
Q1: How is your assessment about the perceptual quality of the video on the scale from 1 to 5?
Q2: How is your assessment about the sense of presence in VR environment on the scale from 1 to 5? (1 means that you have absolutely no sense of presence, and 5 means that you have a true sense of presence as in a real environment).
Q3: Is this viewing acceptable to you? (1 means that you accept and are willing to watch until the end of the session, and 0 means that you do not accept, feel annoyed, and want to quit the session).
Q4: Which do you prefer, non-VR rendering mode or VR rendering mode? (0 is non-VR and 1 is VR).
Q5: How is the level of dizziness or nausea during VR viewing experiment on the scale from 1 to 5? (1 means very dizzy, and 5 means not dizzy at all).
Q6: How much does wearing a VR head-mounted display affect your experience in VR environment on the scale from 1 to 5? (1 means very cumbersome and annoying, 5 is absolutely no problem).
Q7: How much does the FoV of the device affect your sense of presence in VR environment on the scale from 1 to 5? (1 means very limited, 5 is absolutely no problem).

Regarding the test procedure, viewers are first trained to become familiar with the rating procedure and the rendering devices before doing the actual subjective tests. Then, for each viewer, the display order of the three videos is randomly determined. For each video, each of the twenty versions is randomly chosen, and then displayed twice, first in the non-VR mode and then in the VR mode. In particular, each viewer watches the chosen version in the non-VR mode in the first display, and then gives answers to questions Q1 and Q3. Then, the viewer watches the same version in the VR mode in the second display, and then gives answers to questions Q1-Q4. Note that this experiment design is chosen to identify a viewer's preference between the VR and non-VR modes. After completing all versions of a video, the viewer gives answers to questions Q5 to Q7. It should be noted that the viewers give their answers orally, and the answers are recorded by an assistant.
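The presentation order described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' script: it assumes that "randomly chosen" means a random permutation of the 20 versions without repetition, and the function and constant names are hypothetical.

```python
import random

CONTENTS = ["Content #1", "Content #2", "Content #3"]
VERSIONS_PER_CONTENT = 20

def build_session(seed=None):
    """Return the ordered (content, version index, mode) viewings for one viewer."""
    rng = random.Random(seed)
    contents = CONTENTS[:]
    rng.shuffle(contents)                        # random order of the three videos
    playlist = []
    for content in contents:
        versions = list(range(VERSIONS_PER_CONTENT))
        rng.shuffle(versions)                    # random order of the 20 versions
        for version in versions:
            playlist.append((content, version, "non-VR"))  # followed by Q1 and Q3
            playlist.append((content, version, "VR"))      # followed by Q1-Q4
        # Q5-Q7 are asked here, after all versions of this content.
    return playlist

session = build_session(seed=1)
print(len(session), "viewings; first two:", session[:2])
```

Passing a per-viewer seed keeps each order reproducible while still differing between viewers.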
To rate all the video versions, it would take a viewer about 4 h. Such a long time may negatively influence the viewers' ratings. Therefore, the viewers are randomly divided into two groups. The total time for a viewer in each group is about 2 h, of which one hour is spent watching versions. In addition, to avoid the negative impacts of fatigue and boredom, there is a break of 10 s after each version and a rest of 20 min after every 20 versions. The total time for all of the viewers is 72 h. In the experiment, each viewer uses just one device set to prevent annoyance from watching a version many times. So, there are two sets of results, corresponding to the two device sets. In total, our experiment has 36 viewers between the ages of 20 and 37. Eighteen of them use device set D1 and the rest use device set D2. To determine and exclude outliers, we conduct a screening analysis of the subjective test results following Recommendation ITU-T P.913 [39]. As a result, no outliers are found. The average of the viewers' scores for each question is used as the mean opinion score (MOS) of that question.
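The session figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that each viewer watches all 60 versions once in each of the two rendering modes on a single device set; the constant names are illustrative.

```python
CLIP_SECONDS = 30        # duration of each version
VERSIONS = 3 * 20        # three contents, twenty versions each
MODES = 2                # non-VR first, then VR
VIEWERS = 36
HOURS_PER_VIEWER = 2     # approximate session length per viewer

viewings_per_viewer = VERSIONS * MODES
watching_hours = viewings_per_viewer * CLIP_SECONDS / 3600
total_hours = VIEWERS * HOURS_PER_VIEWER
viewers_per_device_set = VIEWERS // 2

print(f"{viewings_per_viewer} viewings, {watching_hours:.1f} h of watching per viewer")
print(f"{total_hours} h in total, {viewers_per_device_set} viewers per device set")
```

Under this reading, 120 viewings of 30 s account for the one hour of watching, and 36 viewers at about 2 h each give the stated 72 h.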

Subjective Results and Discussions
In this section, we discuss the impacts of the four factors of (1) encoding parameters, (2) content motion, (3) rendering device, and (4) rendering mode on the QoE aspects of 360 video. Note that the discussions below are mainly based on results from device set D1 since it has better quality. When necessary, results from device set D2 are mentioned for comparison purposes. In addition, in the figures reported in this paper, the error bars represent 95% confidence intervals.
To analyze the impacts of the factors on each of the QoE aspects, a Kruskal-Wallis test is carried out on the obtained subjective results. Specifically, the Kruskal-Wallis test is used to determine whether the factors have statistically significant impacts on the QoE aspects. Based on Cohen's conventions [40], eta-squared values (η²) can be used to interpret effect sizes. In particular, the thresholds of η² are respectively 0.01, 0.06, and 0.14 for "small", "moderate", and "large" effect sizes.
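A minimal sketch of this analysis is given below. It computes the tie-corrected Kruskal-Wallis H statistic in plain Python and converts it to an effect size. The conversion η² = (H − k + 1) / (n − k), with k groups and n observations, is a commonly used estimate and is an assumption here, since the formula used in the study is not stated. The sample ratings are hypothetical.

```python
from itertools import chain

def average_ranks(values):
    """1-based ranks in input order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Tie-corrected H statistic (assumes the ratings are not all identical)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    return h / (1 - tie_term / (n ** 3 - n))

def eta_squared(h, k, n):
    """Assumed effect-size estimate for k groups and n observations in total."""
    return (h - k + 1) / (n - k)

def effect_size_label(eta2):
    """Map eta-squared to the thresholds quoted in the text."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "moderate"
    if eta2 >= 0.01:
        return "small"
    return "negligible"

# Hypothetical presence ratings for three contents.
groups = [[3, 4, 3, 4, 3, 4], [4, 5, 4, 4, 5, 4], [4, 4, 3, 4, 4, 5]]
h = kruskal_wallis_h(groups)
eta2 = eta_squared(h, len(groups), sum(len(g) for g in groups))
print(f"H = {h:.2f}, eta^2 = {eta2:.2f} ({effect_size_label(eta2)})")
```

The p-value is obtained by comparing H against a chi-square distribution with k − 1 degrees of freedom; a statistics library is the natural choice for that step.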
In [9], the authors define an acceptability rate higher than 60% as the quality level that can please most viewers. In this study, we define "good" QoE as an acceptability rate higher than 60% together with presence and perceptual quality scores higher than 3 MOS.
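This definition translates directly into a small predicate. The sketch below is only an illustration of the thresholds; the function name is hypothetical, the first call uses the Content #1 scores reported in this paper for the 4 K, QP 22 version, and the second call uses made-up values.

```python
def is_good_qoe(acceptability_rate, presence_mos, quality_mos):
    """True when the rate exceeds 60% and both scores exceed 3 MOS."""
    return acceptability_rate > 0.60 and presence_mos > 3.0 and quality_mos > 3.0

# Content #1 at 4 K and QP 22: 100% acceptability, 4.22 and 4.44 MOS.
print(is_good_qoe(1.00, 4.22, 4.44))
# Made-up low-quality version for contrast.
print(is_good_qoe(0.55, 2.80, 2.90))
```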

Evaluation of QoE Aspects
Based on the experimental results, it is found that the versions encoded at the 4 K resolution and QP of 22 for all three contents achieve very high scores in terms of presence, perceptual quality, and acceptability. In particular, the presence scores for Content #1, Content #2, and Content #3 are, respectively, 4.22 MOS, 4.33 MOS, and 4.28 MOS. The corresponding perceptual quality scores are 4.44 MOS, 4.44 MOS, and 4.56 MOS, respectively. This perceptual quality range is similar to those obtained in [17,41]. Also, their acceptability rates are all equal to 100%. This means that versions encoded at the 4 K resolution as provided on existing video streaming platforms (e.g., YouTube and Facebook) are acceptable to users.
Since the acceptability can be considered as the overall quality of a service [6,8], the impacts of the QoE aspects on the acceptability are also examined. As shown in Table 5, there are statistically significant effects of the presence and the perceptual quality on the acceptability (i.e., p < 0.05). In addition, the effect sizes are both "large" (i.e., η² > 0.14). This means that both the quality of the displayed video and the presence must be satisfactory for a 360 video service to be acceptable to users. With respect to the cybersickness, a high percentage of the viewers in our experiment (i.e., 89% for device set D1 and 94% for device set D2) report nausea and dizziness when watching video versions. Therefore, the cybersickness is a serious problem for 360 video. However, as discussed later, the cybersickness strongly depends on the content motion of a video.
In the following, the influences of different factors on QoE are discussed in detail.

Impact of Encoding Parameters
In this subsection, we investigate the impacts of two encoding parameters, resolution and QP, on the three QoE aspects of perceptual quality, presence, and acceptability. Figures 4-6 respectively show the presence scores, the perceptual quality scores, and the acceptability rates at different QP values and resolutions of the three video contents. We can see that lower QP and/or higher resolution result in higher scores. This observation is in line with the results from previous studies [16,17,[23][24][25] that the perceptual quality increases when reducing QP [25] and/or increasing resolution [16,17,[23][24][25].

Impact of Resolution
From Figures 4-6, it can be seen that, when the QP is lower than 32, the difference in QoE between the 4 K (i.e., 3840 × 1920) and 2.5 K (i.e., 2560 × 1440) resolutions is insignificant. Specifically, when QP = 22, the difference averaged over the three contents is 0.22 MOS for the perceptual quality, 0.28 MOS for the presence, and 0% for the acceptability rate.
However, the QoE scores decrease rapidly as the resolution is reduced to the fHD and HD resolutions. In particular, the presence score of all three contents drops by 0.56 to 0.94 MOS when the resolution changes from 4 K to fHD (i.e., 1920 × 1080). Also, the decrease in the perceptual quality is from 0.78 to 0.89 MOS. This finding is in line with the results presented in [17,23], where significant differences in the perceptual quality were found between the resolutions of 4 K and fHD. It is also worth noting that only two resolutions (i.e., 4 K and fHD) are used in [17,23]. From Figures 4-6, we can also see that, for all the versions encoded at the HD resolution, the acceptability rates are about 60% or lower. The perceptual quality and presence scores are also lower than 3 MOS. Therefore, 360 video should not be provided at the HD resolution when watched in the VR mode using an HMD.

Impact of Quantization Parameter (QP)
At the resolution of 4 K, it can be seen that the differences in the QoE aspects between QP = 22 and QP = 28 are trivial. Specifically, the average difference over the three contents is 0.19 MOS for the perceptual quality, 0.24 MOS for the presence, and 0% for the acceptability rates. For the versions with the QP value of 40, the acceptability rates are very low (i.e., <30%). In addition, the corresponding presence and perceptual quality scores are smaller than 3 MOS. This implies that the versions encoded at the QP value of 40 or higher are annoying to the viewers. To achieve good QoE, the maximum QP values at the 4 K and 2.5 K resolutions are respectively 32 and 32 for device set D1, and 32 and 28 for device set D2.

Optimal Encoding Parameters
In this part, we discuss the optimal encoding parameters for each video content. The question here is "Given a bitrate, which encoding parameters should be chosen for the best QoE?". Figures 7-9 respectively show the presence scores, the perceptual quality scores, and the acceptability rates with respect to the bitrates. In these figures, the five marker types correspond to the five chosen QP values. The four line types represent the four selected resolutions.
It is clear that increases in bitrate result in improvements in all the QoE aspects. Besides, when the bitrate is decreased, the viewers prefer keeping the resolution, i.e., 4 K, and increasing the QP, i.e., from 22 to 28. When the bitrate is reduced further, the viewers prefer decreasing the resolution to 2.5 K and keeping the QP value of 28. If the bitrate is reduced even further, the viewers prefer keeping the resolution of 2.5 K and increasing the QP from 28 to 36. This indicates that, although the resolution of fHD is acceptable to the viewers, increasing the QP to 36 is preferred to decreasing the resolution to fHD. In this way, the maximum improvements in terms of presence score, perceptual quality score, and acceptability rate are respectively 0.56 MOS, 0.72 MOS, and 22%. To achieve good QoE, the minimal bitrates of Content #1, Content #2, and Content #3 are respectively about 1.5 Mbps, 5.6 Mbps, and 11.9 Mbps for device set D1, and 2.2 Mbps, 9.6 Mbps, and 11.9 Mbps for device set D2. So the bitrate for good QoE varies considerably across different contents. In [23], it was found that there is only a small difference in the perceptual quality between the bitrates of 15 Mbps and 8 Mbps. Also, the finding in [26] is that the perceptual quality at 10 Mbps is only slightly higher than that at 6 Mbps. However, from Figure 8, we can see that this depends on content characteristics. In particular, for Content #3, the difference between 15 Mbps and 8 Mbps or between 10 Mbps and 6 Mbps is significant (i.e., higher than 0.40 MOS). In the next subsection, the impact of content motion is discussed in detail.
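The selection question posed in this part can be expressed as a simple search over the encoded versions: among the versions whose bitrate fits the budget, pick the one with the highest score. The sketch below is an illustration only. The `Version` type and `best_version` function are hypothetical, and the bitrates and scores in `candidates` are placeholder numbers, not measurements from the experiment.

```python
from typing import NamedTuple, Optional

class Version(NamedTuple):
    resolution: str
    qp: int
    bitrate_mbps: float
    quality_mos: float

def best_version(versions, budget_mbps) -> Optional[Version]:
    """Highest-scoring version whose bitrate does not exceed the budget."""
    feasible = [v for v in versions if v.bitrate_mbps <= budget_mbps]
    return max(feasible, key=lambda v: v.quality_mos, default=None)

# Placeholder values for illustration only.
candidates = [
    Version("4K", 22, 20.0, 4.4),
    Version("4K", 28, 10.0, 4.2),
    Version("2.5K", 28, 6.0, 4.0),
    Version("2.5K", 36, 3.0, 3.4),
    Version("fHD", 28, 3.0, 3.1),
]

for budget in (12.0, 3.0, 1.0):
    print(budget, "Mbps ->", best_version(candidates, budget))
```

The same search could rank versions by presence score or acceptability rate instead of perceptual quality by changing the `key` function.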

Impact of Content Motion
In Section 4.2.3, it has been shown that the video bitrate for good QoE varies widely across different contents. Therefore, to guarantee satisfactory QoE for 360 video services, the characteristics of each content should be taken into account. In the following, we further discuss the impacts of the content motion on the QoE aspects.
Table 6 shows the statistical results on the effect of content motion on the four QoE aspects based on the Kruskal-Wallis test. We can see that no statistically significant effect of content motion is found for the acceptability (i.e., p > 0.05). Meanwhile, statistically significant effects of "small" size are found for the presence and the perceptual quality (i.e., p < 0.05, 0.06 > η² > 0.01). This implies that factors having significant impacts on the presence and the perceptual quality may not cause considerable influences on the acceptability. For the cybersickness, the effect of content motion is found to be "large" (i.e., p < 0.05, η² > 0.14). These findings are in agreement with a statement presented in [17] that the content motion has a statistically significant impact on the perceptual quality and cybersickness.
Table 6. Statistical results on the effect of content motion on QoE aspects based on the Kruskal-Wallis test. The bold numbers in the p-value column represent statistically significant effects of factors. The bold, italic, and underlined numbers in the η² column respectively correspond to "large", "moderate", and "small" effect sizes.
Since the impacts of content motion on the presence, the perceptual quality, and the cybersickness are statistically significant, a post-hoc analysis using Mann-Whitney tests with Bonferroni correction is additionally conducted. The Mann-Whitney test is used to compare the statistical difference between pairs of video contents. The obtained results are presented in the following subsections. Note that, in this study, we consider three content motion types: camera motion, object motion, and background motion.
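The post-hoc procedure can be sketched in plain Python as follows. The sketch makes several assumptions that the text does not confirm: it uses the normal approximation with tie correction and no continuity correction, it applies the Bonferroni correction by multiplying each p-value by the number of pairs, and it computes the effect size as r = |z| / √N, where N is the combined size of the two samples. The ratings are hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt

def average_ranks(values):
    """1-based ranks in input order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney(a, b):
    """Return (U of sample a, z, two-sided p, r); assumes not all values are tied."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / sqrt(variance)
    p = erfc(abs(z) / sqrt(2))          # two-sided p from the standard normal
    return u, z, p, abs(z) / sqrt(n)

def pairwise_bonferroni(groups, alpha=0.05):
    """Compare every pair of groups; p-values are multiplied by the pair count."""
    pairs = list(combinations(groups, 2))
    results = {}
    for x, y in pairs:
        _, _, p, r = mann_whitney(groups[x], groups[y])
        p_adj = min(1.0, p * len(pairs))
        results[(x, y)] = (p_adj, r, p_adj < alpha)
    return results

# Hypothetical cybersickness ratings (Q5) for the three contents.
ratings = {
    "Content #1": [4, 5, 4, 4, 3, 5, 4, 4],
    "Content #2": [4, 3, 4, 4, 3, 4, 5, 3],
    "Content #3": [2, 1, 2, 3, 2, 2, 1, 2],
}
for pair, (p_adj, r, significant) in pairwise_bonferroni(ratings).items():
    print(pair, f"adjusted p = {p_adj:.4f}, r = {r:.2f}, significant = {significant}")
```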

Impact on Presence
Based on the results of the Mann-Whitney tests, it is found that there are statistically significant differences in the presence between Content #1 and Content #2 (i.e., p < 0.05, r = 0.18) and between Content #2 and Content #3 (i.e., p < 0.05, r = 0.11). However, no significant difference is found between Content #1 and Content #3 (i.e., p > 0.05). In addition, from Figure 4, it can be observed that, given a combination of QP and resolution, the presence scores are generally highest for Content #2 and lowest for Content #1. To understand this result, we held discussions with the viewers. It is found that, when watching a video with medium camera motion (Content #2), many viewers feel more comfortable and enjoy the video more than when watching videos with a static camera (Content #1) or fast camera motion (Content #3). Therefore, the viewers' perception of the presence in the VR environment when watching Content #2 tends to be better.

Impact on Perceptual Quality
Regarding the perceptual quality, the test results show significant differences between Content #1 and Content #2 (i.e., p < 0.05, r = 0.09) and between Content #1 and Content #3 (i.e., p < 0.05, r = 0.13). However, the difference between Content #2 and Content #3 is not significant (p > 0.05). This could be because Content #1 has few moving objects and, moreover, only medium object motions. In addition, the background is static since the camera is fixed to the floor. Thus, encoding distortions in Content #1 are more likely to be detected by the viewers than those in Content #2 and Content #3.

Impact on Cybersickness
Question Q5 investigates the cybersickness for different contents. As shown in Figure 10, the scores of Q5 for Content #1 and Content #2 are higher than 3 MOS for both device sets. Meanwhile, the scores for Content #3 are very low (i.e., approximately 2 MOS). Based on the results of the post-hoc test, there are statistically significant differences between Content #1 and Content #3 (i.e., p < 0.05, r = 0.62) and between Content #2 and Content #3 (i.e., p < 0.05, r = 0.51). Meanwhile, no significant difference is found between Content #1 and Content #2 (i.e., p > 0.05). This suggests that, when watching videos with fast camera motion such as Content #3, viewers easily notice the differences between their movements and the displayed content, which consequently causes more severe cybersickness. Hence, cybersickness is a serious problem for 360 video, especially for contents having fast camera motions.

Impact of Rendering Device
Table 7 shows the statistical results for the effects of the rendering device on the four QoE aspects based on the Kruskal-Wallis test. It can be seen that the rendering device has no significant effect on the acceptability and the cybersickness (p > 0.05). Meanwhile, the rendering device has significant effects of "small" size on the presence and the perceptual quality (p < 0.05, 0.06 > η² > 0.01). This finding contradicts the one presented in [17], where the impact of the rendering device is not significant. The contradiction can be explained by the difference between the rendering device sets used in the two studies. In particular, two HMDs, Oculus Rift and HTC Vive, both of high-end quality, are employed in [17]. Meanwhile, the comparative experiment in our study uses Samsung Gear VR and Google Cardboard HMDs. Figure 11 compares the two device sets D1 and D2 in terms of presence and perceptual quality when the QP value is 22. It can be seen that the presence scores and the perceptual quality scores of device set D1 are always higher than those of device set D2 for all three contents. This result confirms that device set D1 is better than device set D2.
From Figure 11, it is also observed that the difference between the two device sets can be clearly seen only if the quality is acceptable to the viewers. In particular, there are noticeable differences in the presence scores between the two device sets at the resolutions of 4 K, 2.5 K, and fHD (≥0.39 MOS). Meanwhile, only small differences in the presence scores can be seen at the HD resolution (<0.25 MOS). Similar observations can be made for the perceptual quality scores. The impacts of wearing an HMD (Q6) and of the FoV (Q7) on user experience are also investigated. The responses are shown in Figure 12. For device set D1, the scores of Q6 and Q7 are 3.35 MOS and 3.46 MOS, respectively. For device set D2, the scores are 3.22 MOS for Q6 and 3.07 MOS for Q7. This result means that wearing an HMD and viewing through a narrow FoV are not very annoying to the viewers. However, these scores also imply that the HMDs still need to be improved.
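The "small" effect-size label reported for the rendering device can be illustrated with a short sketch of the Kruskal-Wallis H statistic and an η² estimate. The sketch assumes the commonly used formula η² = (H − k + 1) / (n − k) and the conventional cut-offs of 0.01, 0.06, and 0.14; only the 0.01 to 0.06 "small" band is stated in this paper, so the other thresholds, the function names, and the rating vectors are assumptions made for illustration. The p-value is deliberately not computed here.

```python
from collections import Counter


def kruskal_wallis_eta2(groups):
    """Return (H, eta2) for a list of samples, with tie-corrected H.

    eta2 = (H - k + 1) / (n - k) is an assumed (commonly used) estimator,
    clamped at 0 because it can turn slightly negative for tiny H.
    """
    pooled = sorted(x for g in groups for x in g)
    n, k = len(pooled), len(groups)
    if n <= k:
        raise ValueError("need more observations than groups")
    counts = Counter(pooled)
    # Mean rank of each distinct value: first 1-based position + (ties - 1) / 2.
    first_pos = {}
    for idx, v in enumerate(pooled):
        first_pos.setdefault(v, idx + 1)
    rank_of = {v: first_pos[v] + (counts[v] - 1) / 2 for v in counts}
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if tie_correction == 0:  # every rating identical
        return 0.0, 0.0
    h /= tie_correction
    eta2 = max(0.0, (h - k + 1) / (n - k))
    return h, eta2


def effect_label(eta2):
    """Map eta2 to a label; the 0.14 and 0.06 cut-offs are assumed conventions."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "moderate"
    if eta2 > 0.01:
        return "small"
    return "negligible"


# Hypothetical 5-point ratings for two device sets (not the study's data).
d1 = [4, 4, 3, 5, 4, 3, 4, 5]
d2 = [3, 4, 3, 3, 4, 2, 3, 4]
h, eta2 = kruskal_wallis_eta2([d1, d2])
print(f"H = {h:.3f}, eta2 = {eta2:.3f} ({effect_label(eta2)})")
```

Because η² here depends on the sample size as well as on H, a toy sample of this size can produce a much larger η² than the full set of ratings would.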

Impact of Rendering Mode
In this subsection, we present a comparison between the non-VR and the VR modes in terms of acceptability and perceptual quality. Because most existing VR streaming platforms (e.g., YouTube and Facebook) provide video versions at different resolutions, we also give suggestions about which mode should be used for each resolution.
Statistical results for the effects of the rendering mode on the acceptability and the perceptual quality are shown in Table 8. We can see that the rendering mode has "small" effects on the perceptual quality and the acceptability (p < 0.05, 0.06 > η² > 0.01). Figures 13 and 14 respectively show the perceptual quality scores and the acceptability rates versus the resolutions, for both the non-VR and VR modes with the QP value of 22. It can be seen that both the perceptual quality scores and the acceptability rates in the non-VR mode are always higher than those in the VR mode. This is because video versions in the VR mode are magnified by the lenses in the HMD. In addition, we can see that the lower the resolution is, the larger the difference between the two modes becomes. Figure 15 shows the percentage of viewers who prefer the VR mode to the non-VR mode with the QP of 22 and device set D1. Interestingly, at the resolutions of fHD or higher, more than half of the viewers prefer the VR mode to the non-VR mode, even though the perceptual quality scores and the acceptability rates in the VR mode are lower than in the non-VR mode. This may be because the viewers are excited when watching in the VR mode. At the HD resolution, fewer than 33% of the viewers prefer the VR mode to the non-VR mode. Therefore, the non-VR mode, rather than the VR mode, should be used for watching 360 video versions encoded at the HD resolution. In practice, most current 360 video services such as YouTube and Facebook still provide 360 videos in the HD resolution. Also, we can see that, similar to the behavior of the presence scores shown in Section 4.3.1, the number of viewers who favor the VR mode over the non-VR mode is highest for Content #2 and lowest for Content #1. This suggests that a higher presence score in the VR mode may result in a higher number of viewers preferring the VR mode. In addition, the above results show that higher perceptual quality and acceptability rate do not necessarily mean higher user preference. This implies that the perceptual quality and the acceptability rate are only meaningful within a single rendering mode, and so cannot be used to compare different rendering modes.
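The preference comparison described above amounts to a per-resolution vote tally. The following minimal sketch assumes a simple majority rule for picking the suggested mode; the function name and the vote counts are hypothetical and are chosen only to mirror the reported pattern (more than half preferring VR at fHD or higher, fewer than 33% at HD).

```python
def preferred_mode(vr_votes, total_votes):
    """Return (mode, VR share in percent) using a simple-majority rule.

    The majority threshold is an illustrative assumption, not a rule
    stated in the paper.
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    if not 0 <= vr_votes <= total_votes:
        raise ValueError("vr_votes must lie between 0 and total_votes")
    share = 100 * vr_votes / total_votes
    mode = "VR" if share > 50 else "non-VR"
    return mode, share


# Hypothetical counts of viewers preferring the VR mode, out of 20 viewers.
votes = {"4K": 14, "2.5K": 13, "fHD": 12, "HD": 6}
for resolution, vr_votes in votes.items():
    mode, share = preferred_mode(vr_votes, 20)
    print(f"{resolution}: {share:.0f}% prefer VR, suggested mode: {mode}")
```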

Remarks on Findings
Based on the above discussions, we summarize the findings as follows.

• QoE aspects: 360 videos at the resolutions of 4 K and 2.5 K can offer good presence and perceptual quality scores to users. Both the presence and the perceptual quality are found to have strong effects on the acceptability. Also, cybersickness is a serious issue for 360 video, especially for contents with fast camera motion.

• Encoding parameters: The QoE differences are trivial between versions encoded at the 4 K and 2.5 K resolutions and at the QP values of 22 and 28. Yet, as the resolution decreases or the QP increases, the QoE aspects degrade quickly. In particular, 360 video should not be encoded at the resolutions of HD or lower and the QP values of 40 or higher when watching in the VR mode using an HMD.

• Content motion: The bitrate for good QoE varies widely across different contents. The minimum bitrates of the three test contents are about 1.5 Mbps, 5.6 Mbps, and 11.9 Mbps. In addition, there exist strong correlations between camera motion and two QoE aspects, namely cybersickness and presence. Specifically, videos with fast camera motion cause more severe cybersickness than those with static or medium camera motion. In addition, users' presence is worst in videos with static or fast camera motion, and best in videos with medium camera motion.

• Rendering device: Device set D1 always provides higher presence and perceptual quality scores than device set D2, but the differences between them are not as large as expected. Statistical results show that the effect of the rendering device is "small" on the presence and the perceptual quality. Meanwhile, no statistically significant effect of the rendering device is found on the acceptability and the cybersickness.

• Rendering mode: Although the acceptability rates and the perceptual quality scores in the non-VR mode are usually higher than those in the VR mode, users prefer the VR mode to the non-VR mode when watching video versions encoded at the resolutions of fHD or higher. For video versions with the HD resolution, watching in the non-VR mode is preferred to watching in the VR mode. Also, the results suggest that the presence aspect may be what makes users prefer the VR mode to the non-VR mode. In addition, the perceptual quality and the acceptability rate cannot be used to compare different rendering modes since they are meaningful within a single rendering mode only.
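The encoding-parameter remark above can be read as a filter over an encoding ladder. The sketch below is one possible reading, assuming that either condition alone (resolution of HD or lower, or QP of 40 or higher) rules a version out for VR-mode viewing; the original wording leaves open whether both conditions must hold. The function name, the resolution ordering table, and the example ladder are hypothetical.

```python
# Resolutions used in the study, ordered from lowest to highest.
RESOLUTION_RANK = {"HD": 0, "fHD": 1, "2.5K": 2, "4K": 3}


def suitable_for_vr(resolution, qp):
    """Return True if a version passes the rule of thumb for VR-mode viewing.

    Assumed interpretation: a version is rejected when its resolution is
    HD or lower, or when its QP is 40 or higher.
    """
    if resolution not in RESOLUTION_RANK:
        raise ValueError(f"unknown resolution: {resolution}")
    above_hd = RESOLUTION_RANK[resolution] > RESOLUTION_RANK["HD"]
    return above_hd and qp < 40


# Hypothetical encoding ladder of (resolution, QP) pairs.
ladder = [("4K", 22), ("4K", 40), ("2.5K", 28), ("fHD", 28), ("HD", 22)]
vr_ladder = [version for version in ladder if suitable_for_vr(*version)]
print(vr_ladder)
```

Versions filtered out by this rule could still be offered for non-VR viewing, which is the mode the findings recommend at the HD resolution.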

Conclusions
In this paper, we first highlighted the main QoE aspects, the key factors affecting the QoE of 360 video, and the related work. Studies in this area are still limited, which motivated us to carry out this study. By means of a subjective experiment and statistical analysis, we evaluated the impacts of multiple factors on four main QoE aspects of 360 video, namely, perceptual quality, presence, cybersickness, and acceptability. Experiment results showed that the perceptual quality aspect is significantly impacted by all four considered factors: encoding parameters (i.e., QP and resolution), content motion, rendering device, and rendering mode. It is also found that encoding parameters, content motion, and rendering device have strong impacts on the presence aspect. Regarding the cybersickness aspect, the impact of content motion is shown to be significant, while the impact of rendering device is trivial. With respect to the acceptability aspect, the encoding parameters and the rendering mode have considerable impacts, whereas the content motion and the rendering device do not. Also, the acceptability aspect is strongly impacted by the perceptual quality and presence aspects.
For future work, we intend to build and evaluate QoE models of 360 video that take into account the impacts of multiple factors as well as multiple QoE aspects. The findings of this study, together with the QoE models to be developed in future work, are expected to help improve the QoE of video streaming services over the Internet and of VR applications for education and entertainment. In addition, we plan to study QoE aspects in the new context of VR gaming, where QoE aspects such as presence are crucial for service providers.

Figure 1. Key factors and main QoE aspects of 360 video.

Figure 2. Factors and QoE aspects considered in this study.

Figure 10. Responses to Q5 for three contents.

Figure 15. Percentage of viewers who prefer the VR mode to the non-VR mode (QP = 22 and device set D1).

Table 1. Descriptions of factors and QoE aspects considered in related work.

Table 2. Descriptions of three contents used in our experiment.

Table 3. Settings of encoding parameters used in our experiment.

Table 4. Questionnaire and rating scale.

Table 5. Statistical results for the effects of the presence and perceptual quality on the acceptability based on the Kruskal-Wallis test. The bold numbers in column p-value represent statistically significant effects of factors. The bold, italic, and underlined numbers in column η² respectively correspond to "large", "moderate", and "small" effect sizes.

Table 7. Statistical results for the effects of rendering device on QoE aspects based on the Kruskal-Wallis test. The bold numbers in column p-value represent statistically significant effects of factors. The bold, italic, and underlined numbers in column η² respectively correspond to "large", "moderate", and "small" effect sizes.

Table 8. Statistical results for the effects of rendering mode on QoE aspects based on the Kruskal-Wallis test. The bold numbers in column p-value represent statistically significant effects of factors. The bold, italic, and underlined numbers in column η² respectively correspond to "large", "moderate", and "small" effect sizes.