Viewport Rendering Algorithm with a Curved Surface for a Wide FOV in 360° Images

Abstract: Omnidirectional visual content has attracted the attention of consumers in a variety of applications because it provides viewers with a realistic experience. Among the techniques used to construct omnidirectional media, viewport rendering is one of the most important modules, because users perceptually evaluate the quality of an omnidirectional service based on the quality of the viewport picture. To increase the perceptual quality of an omnidirectional service, we propose an efficient algorithm to render the viewport. In this study, we analyze the distortions in the viewport picture as the result of sampling: in conventional algorithms, the image data on the unit sphere of the omnidirectional visual system are non-uniformly sampled to produce the viewport picture. We propose an advanced algorithm to construct the viewport picture, in which the viewport plane is constructed using a curved surface, whereas conventional methods use flat square surfaces. The curved surface makes the sampling intervals more equally spaced than the conventional techniques do. The simulation results show that the proposed technique outperforms the conventional algorithms with respect to the objective quality based on the criteria of straightness and conformality, as well as the subjective perceptual quality.


Introduction
Omnidirectional visual content has attracted increasing attention in the fields of gaming devices, remote lecturing systems, broadcasting, movies, and streaming services because it provides users with a more realistic experience than conventional techniques [1][2][3][4]. Many companies have developed a variety of devices related to omnidirectional visual technology. To meet the demands of the industry, the Joint Video Exploration Team (JVET) of ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) has, since 2015, been working on 360° video coding as part of its explorations of coding technologies for future video coding standards [5][6][7]. An omnidirectional visual system consists of a variety of core technologies, including those for picture capturing, stitching [8][9][10][11], bundle adjustment [12,13], blending [14,15], 360° image formatting [16,17], 360° image encoding [18,19], 360° data streaming [20], and viewport rendering [7]. Recently, stereo omnidirectional visual systems have been introduced to increase the sense of immersion for users; stereo omnidirectional content gives viewers more realistic images and videos [21,22]. Among these essential techniques, viewport rendering is one of the most important modules: it produces the final image frames, and users evaluate the perceptual quality of the omnidirectional visual system based on the quality of the viewport picture (VP) [20,23,24].
During the last decade, many experts have discussed the distortions that result from the various methods used to render the VP. Some algorithms bend straight lines into curves in the VP, whereas other methods stretch objects differently in the vertical and horizontal directions. This is because these methods cannot overcome the limitations that result from the geometrical properties of the 360° VR coordinate system.
To solve these problems, many researchers have developed advanced algorithms [25][26][27][28][29][30][31][32]. Sharpless et al. [25] modified the cylindrical projection [26] to maintain the shape of straight lines. An automatic content-aware projection method was proposed by Kim et al. [27], who developed a saliency map to classify the objects in the image according to their importance levels and combined multiple images to minimize the distortion. In [28], Kopf et al. introduced locally adapted projections, where a simple and intuitive user interface allows regions of interest to be mapped to near-planar parts, thereby reducing bending artifacts. In [29], Tehrani et al. presented a unified solution that corrects the distortion in one or more non-occluded foreground objects by applying object-specific segmentation and an affine transformation of the segmented camera image plane, again assisted by a simple and intuitive user interface. The algorithm proposed by Jabar et al. [30] moves the center of projection to minimize the stretching error and bending degradation. Because the algorithms proposed in [27,30] create a saliency map [31] to analyze the content of the 360° image, this additional process increases their complexity. Kopf et al. [32] defined the projected surface of the viewport as the surface of a cylinder whose size is modified according to the field of view (FOV).
In this study, we consider the distortions in the VP to be the result of sampling, because in conventional algorithms the image data on the unit sphere of the omnidirectional visual system are non-uniformly sampled to produce the VP. We propose an advanced algorithm to render the VP, in which the viewport plane is constructed using a curved surface, whereas the conventional methods have used flat square surfaces. The curved surface makes the sampling intervals more equally spaced than the conventional techniques do.
This study is organized as follows. In Section 2, we explain the basic procedure to render the VP and two conventional algorithms: the perspective projection [33] and the stereographic projection [33]. Some processes are modeled, and we derive equations to calculate the core parameters. In Section 3, we discuss the limitations of the conventional algorithms and the reasons why the pixels in the VP are distorted. We propose an advanced technique to render the VP in Section 4, where a curved surface is used as the projection plane. In Section 5, we demonstrate the performance of the proposed algorithm and compare it with various conventional techniques. Finally, the conclusion is presented in Section 6.

Basic Procedure to Render the Viewport Picture
Figure 1 shows the 3D XYZ coordinate system that is used to represent the 3D geometry of a 360° image, where the (X, Y, Z) coordinates are based on a right-hand coordinate system [7]. The sphere can be sampled with the longitude ϕ and latitude θ, where ϕ is in the range [−π, π] and θ is in the range [−π/2, π/2]. In Figure 1, the longitude ϕ is defined as the angle starting from the X-axis and increasing in the counter-clockwise direction, and the latitude θ as the angle from the equator toward the Y-axis. The relationships between the (X, Y, Z) coordinates and (ϕ, θ) are as follows [7]:

X = cos(θ) cos(ϕ)   (1)
Y = sin(θ)   (2)
Z = −cos(θ) sin(ϕ)   (3)
ϕ = tan⁻¹(−Z/X)   (4)
θ = sin⁻¹(Y/√(X² + Y² + Z²))   (5)

Figure 2 shows the viewport rendering procedure for a given 360° image, where the coordinates of the pixels in the viewport picture (VP) and the UV plane (a rectangle ABCD) are denoted by (m, n) and (u, v), respectively. The pixel values in the VP depend on several variables, including the FOV, the horizontal and vertical resolutions of the VP, and the projection technique. Calculating the pixel value at (m, n) in the VP is equivalent to obtaining the value of the pixel P′ at the corresponding location (X, Y, Z) on the 3D sphere. Therefore, this procedure consists of three steps: (step 1) derivation of the coordinate (u, v) that is mapped to (m, n); (step 2) calculation of the corresponding location (X, Y, Z) on the unit sphere for the (u, v) obtained in step 1; (step 3) obtaining the pixel value at (X, Y, Z). The first and second steps are represented by two functions, f_M and f_p, respectively:

(u, v) = f_M(m, n)   (6)
(X, Y, Z) = f_p(u, v)   (7)

The detailed processes of f_M(m, n) and f_p(u, v) vary according to the projection method, which could be the perspective projection [33], the stereographic projection [33], and so on. Note that the UV plane is an intermediate surface onto which the sample data are projected; viewers watch the screen of the viewport plane, not the UV plane. In the following subsections, we explain two typical projections in detail.
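As a concrete reference, the coordinate conversions of (4) and (5), together with the matching forward direction under the same right-hand convention, can be sketched in Python (a minimal sketch; the function names are ours):

```python
import math

def angles_to_sphere(phi, theta):
    """Longitude/latitude (phi, theta) to a point (X, Y, Z) on the unit sphere."""
    X = math.cos(theta) * math.cos(phi)
    Y = math.sin(theta)
    Z = -math.cos(theta) * math.sin(phi)
    return X, Y, Z

def sphere_to_angles(X, Y, Z):
    """Cartesian point back to (phi, theta), as in eqs. (4)-(5)."""
    phi = math.atan2(-Z, X)                                  # eq. (4)
    theta = math.asin(Y / math.sqrt(X * X + Y * Y + Z * Z))  # eq. (5)
    return phi, theta
```

Using atan2 rather than a bare arctangent keeps the longitude in the full [−π, π] range.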
In Figure 2, we assume that the viewing direction is along the Z-axis. If the VP selected by the user is not along the Z-axis, (X, Y, Z) is rotated to (X′, Y′, Z′) by using a rotation equation, and the pixel value at (X′, Y′, Z′) is used to calculate the pixel value at (m, n) in the VP.
The rotation matrix R is defined as follows:

R = R_Y(ϕ_c) R_X(θ_c)   (10)

where R_Y(ϕ_c) is the rotation about the Y-axis by ϕ_c and R_X(θ_c) is the rotation about the X-axis:

R_X(θ_c) = [[1, 0, 0], [0, cos θ_c, −sin θ_c], [0, sin θ_c, cos θ_c]]   (11)

In (10) and (11), we define (ϕ_c, θ_c) as the angles between the view axis and the Z-axis along the axes of longitude ϕ and latitude θ, respectively. Note that the conversion from spherical coordinates to Cartesian coordinates is numerically unstable around the zenith axis.
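The rotation step can be sketched as follows; note that the composition order R = R_Y(ϕ_c)·R_X(θ_c) is our assumption, since only the X-axis rotation of (11) survives legibly in the text:

```python
import math

def rotation_matrix(phi_c, theta_c):
    """View rotation R = R_Y(phi_c) @ R_X(theta_c) as nested lists.
    The composition order is an assumption, not stated explicitly above."""
    cp, sp = math.cos(phi_c), math.sin(phi_c)
    ct, st = math.cos(theta_c), math.sin(theta_c)
    Ry = [[cp, 0.0, sp],
          [0.0, 1.0, 0.0],
          [-sp, 0.0, cp]]
    Rx = [[1.0, 0.0, 0.0],      # eq. (11): rotation about the X-axis
          [0.0, ct, -st],
          [0.0, st, ct]]
    # 3x3 matrix product R = Ry @ Rx
    return [[sum(Ry[i][k] * Rx[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

Whatever the order, the product of two rotations is itself a rotation, so R is orthonormal and applying it to (X, Y, Z) keeps the point on the unit sphere.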

Perspective Projection
When the perspective (i.e., rectilinear) projection [33] is used in the procedure to render a VP, the cross-section of the UV plane and the sphere can be represented as in Figure 3, where point P′ is projected onto P along the straight line between O and P. In this case, the width W_uv and height H_uv of the UV plane depend on the horizontal and vertical FOVs, respectively, as follows:

W_uv = 2 tan(F_h/2)   (12)
H_uv = 2 tan(F_v/2)   (13)

where F_h and F_v denote the FOVs in the horizontal and vertical directions, respectively. By applying (12) and (13), the function f_M(m, n) of (6) is represented by the following equations for the perspective projection [33]:
u = (m + 0.5) W_uv/W_VP − W_uv/2   (14)
v = (n + 0.5) H_uv/H_VP − H_uv/2   (15)

where W_VP and H_VP are the width and height of the VP, respectively. The position (u, v) of P in Figure 3 is represented with the 3D coordinates (x, y, z) using the geometric structure as follows:

(x, y, z) = (u, v, 1)   (16)

Point P is projected onto point P′ on the unit sphere, where the coordinates (X, Y, Z) of P′ can be determined as follows:

X = x/√(x² + y² + z²)   (17)
Y = y/√(x² + y² + z²)   (18)
Z = z/√(x² + y² + z²)   (19)

Equations (16)-(19) describe the detailed procedure of f_p(u, v) in (7) when the perspective projection [33] is used; Figure 3 shows the corresponding cross-section of the UV plane and the sphere.

Stereographic Projection
Figure 4 shows the UV plane and the sphere when the stereographic projection [33] is used, where point P is projected onto P′ along the straight line between O′ and P, whereas the line between O and P′ is used in the perspective projection [33]. The cross-section of Figure 4 is shown in Figure 5, where the width W_uv and height H_uv of the UV plane can be determined as follows:

W_uv = 4 tan(F_h/4)   (20)
H_uv = 4 tan(F_v/4)   (21)

In the stereographic projection, f_M(m, n) of (6) is implemented by substituting (20) and (21) into (14) and (15). After (u, v) has been derived by using f_M(m, n), it is represented with the 3D coordinates (x, y, z) by applying (16). When the stereographic projection is used, the 3D coordinates (X, Y, Z) of P′ follow from intersecting the line through O′ and P with the unit sphere:

X = 4x/(x² + y² + 4)
Y = 4y/(x² + y² + 4)
Z = (4 − x² − y²)/(x² + y² + 4)
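As a compact sketch of the two f_p pipelines above: the perspective case normalizes (u, v, 1) onto the unit sphere, while the stereographic case uses the closed form we derived from the line through O′ = (0, 0, −1) and P (the original equation numbering for the latter is not visible in the extracted text):

```python
import math

def perspective_to_sphere(u, v):
    """f_p for the perspective projection: place P = (u, v, 1) on the
    tangent plane and normalize onto the unit sphere (eqs. (17)-(19))."""
    norm = math.sqrt(u * u + v * v + 1.0)
    return u / norm, v / norm, 1.0 / norm

def stereographic_to_sphere(u, v):
    """f_p for the stereographic projection: intersect the line from
    O' = (0, 0, -1) through P = (u, v, 1) with the unit sphere.
    The closed form is our reconstruction from that geometry."""
    d = u * u + v * v + 4.0
    return 4.0 * u / d, 4.0 * v / d, (4.0 - u * u - v * v) / d
```

Both functions return unit vectors, and both map the plane center (0, 0) to the tangent point (0, 0, 1), which is what makes them interchangeable inside the three-step rendering procedure.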

Limitation of the Conventional Methods
According to Jabar et al. [24], the conventional algorithms used to render the viewport produce a variety of distortions when the FOV is wide, although these distortions may not be noticeable for narrow FOVs. In this section, we discuss the reasons why the conventional techniques have limitations for wide FOVs, followed by an analysis of the degradation tendency according to the degree of the FOV.

Distortions in Conventional Methods
The subjective qualities of the viewports that are generated with various FOVs are compared in Figure 6, where (a-c) and (d-f) show the perspective [33] and stereographic [33] projections, respectively. As observed in Figure 6, when the perspective projection is used, the viewport shows a distorted perspective for a wide FOV. On the other hand, when the stereographic projection is used, straight lines are severely bent as the FOV increases.

Analysis of the Distortion Resulting from Conventional Methods
As illustrated in Figure 2, the pixel values in the viewport are set to those of the corresponding P′ points on the unit sphere. When the pixel value at (m, n) in the VP is denoted by I(m, n), the corresponding point on the unit sphere is denoted by P′_(m,n), whose coordinate is (X_(m,n), Y_(m,n), Z_(m,n)). To analyze the distance between the positions of the corresponding P′ points, we calculated the vertical arc length l_v between P′_(m,n) and P′_(m,n+1) and the horizontal arc length l_h between P′_(m,n) and P′_(m+1,n), as demonstrated in Figure 7. Note that these lengths are the sampling intervals used to generate a VP from a 360° image. On the unit sphere, the arc length equals the angle between the two points:

l_h(m, n) = cos⁻¹(X_(m,n)X_(m+1,n) + Y_(m,n)Y_(m+1,n) + Z_(m,n)Z_(m+1,n))
l_v(m, n) = cos⁻¹(X_(m,n)X_(m,n+1) + Y_(m,n)Y_(m,n+1) + Z_(m,n)Z_(m,n+1))

where 0 ≤ m < W_VP − 1, 0 ≤ n < H_VP − 1, and m* and n* are the specific fixed values used to evaluate the horizontal and vertical arc lengths. Figure 8 presents the normalized values of l_h(m, n*) according to the horizontal FOV (F_h) when the perspective and stereographic projections are used. In this figure, we can observe that the length between consecutive sampling points on the sphere decreases as m moves away from the center (m = (W_VP/2) − 1). This means that the pixels in the boundary region of a VP are made by sampling the P′ points with reduced sampling intervals on the sphere, which makes the pixels in the VP look like a stretched form of the P′ points on the sphere. This tendency becomes more significant as F_h increases. As observed in Figure 8a,b, the amount of stretching in the perspective projection is greater than that in the stereographic projection. Note that the analysis for l_v(m*, n) is similar to that in Figure 8, although it is not presented in this paper.
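The arc-length measurement behind this analysis can be sketched directly, since on the unit sphere the arc length between two points equals the angle between them (the helper names are ours):

```python
import math

def arc_length(p, q):
    """Great-circle distance between two unit-sphere points: acos(p . q),
    with the dot product clamped to [-1, 1] against rounding error."""
    dot = p[0] * q[0] + p[1] * q[1] + p[2] * q[2]
    return math.acos(max(-1.0, min(1.0, dot)))

def horizontal_intervals(row):
    """l_h(m, n*) for one viewport row: arc lengths between horizontally
    consecutive P' points on the sphere (the measurement behind Figure 8)."""
    return [arc_length(row[m], row[m + 1]) for m in range(len(row) - 1)]
```

Feeding in a row of perspective-projected points reproduces the trend described above: the intervals shrink toward the row boundaries.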

Curved UV Surface
By analyzing Figure 8 in the previous section, we find that when the VP is rendered with a wide FOV, the degradation in the VP is due to the non-uniform sampling intervals. In this section, we propose a model for the UV surface that makes the sampling intervals more uniform. Figure 9 depicts a model of the proposed UV surface, which is tangent to the unit sphere. The proposed UV surface is a part of an outer sphere whose center and radius are O′ and r, respectively, and the outer sphere touches the unit sphere at the center of the UV surface. The outer sphere is represented by the following equation:

x² + y² + (z + r − 1)² = r²   (29)

In Figure 10, the red curve represents the cross-section of the proposed UV surface, where F_v is the vertical FOV and F_v^OUT is the corresponding vertical angle at the center O′ of the outer sphere that covers the surface generated by applying F_v at the center O of the unit sphere. The relationship between F_v and F_v^OUT follows from the geometry of Figure 10:

tan(F_v/2) = r sin(F_v^OUT/2) / (r cos(F_v^OUT/2) − r + 1)   (30)

Equation (31) rewrites (30) to express F_v^OUT in terms of F_v and r. Note that Equation (31) is numerically stable only for r > 0.5: F_v^OUT takes a complex value for 0 < r < 0.5, and for r = 0.5, F_v^OUT is not defined. However, because the curved UV surface is a part of the outer sphere, as shown in Figure 10, r is always larger than 1; thus, in the proposed algorithm, (31) is always numerically stable. Equations (32) and (33) give the detailed process for f_M(m, n) of (6) when the proposed UV surface is used: the pixel position (m, n) is mapped to the angles (ϕ_OUT, θ_OUT) at O′. Based on the geometry related to the outer sphere, the coordinates (x, y, z) of P are then derived as follows:

x = r cos(θ_OUT) sin(ϕ_OUT)
y = r sin(θ_OUT)
z = r cos(θ_OUT) cos(ϕ_OUT) − (r − 1)

Point P is projected onto point P′ on the unit sphere, where the coordinates (X, Y, Z) of P′ are calculated by using (17)-(19). When the proposed UV surface is used, f_p(u, v) of (7) is implemented by applying (34)-(38) together with (17)-(19).
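Under the geometry above, the proposed surface-to-sphere mapping can be sketched as follows; the x expression is our reconstruction, since only the y and z terms of (34)-(38) survive legibly in the extracted text:

```python
import math

def curved_surface_to_sphere(phi_out, theta_out, r):
    """Point P on the proposed curved UV surface: the outer sphere has
    radius r and center O' = (0, 0, -(r - 1)), touching the unit sphere
    at (0, 0, 1). P is then projected onto the unit sphere through O."""
    x = r * math.cos(theta_out) * math.sin(phi_out)   # reconstructed term
    y = r * math.sin(theta_out)
    z = r * math.cos(theta_out) * math.cos(phi_out) - (r - 1.0)
    # Projection onto the unit sphere, as in eqs. (17)-(19).
    norm = math.sqrt(x * x + y * y + z * z)
    return x / norm, y / norm, z / norm
```

At the surface center (ϕ_OUT = θ_OUT = 0) the point is exactly the tangent point (0, 0, 1) for any r, which matches the tangency condition of the outer sphere.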

Adjusted Sampling Intervals
In this section, we adjust the sampling intervals produced by the curved UV surface so that the sampling positions are more evenly spaced. Figure 12 shows the mapping function u′ = g(u), where the offset of the control points {S(1), S(2), S(3), S(4)} from the diagonal line is set to w, which is defined in (39). In (39), thr_FOV is a threshold for the FOV: when the FOV is larger than thr_FOV, i.e., when the FOV is wide, the sampling intervals are modified, and the parameter α is set based on empirical data. Using this mapping function, the sampling points P′ on the unit sphere are sampled more uniformly in the horizontal direction, which reduces the degradation (e.g., stretching) in the boundary regions of the viewport. The mapping function in Figure 12 can be modeled with the following equation:

u′ = g(u) = c0 + c1·u + c2·u² + c3·u³

where the coefficients c0, c1, c2, c3 are calculated by applying a polynomial regression algorithm [34] to the control points. Note that all of the techniques explained in this subsection can also be applied to v.
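The cubic fit for g(u) can be sketched with a standard least-squares routine; here np.polyfit stands in for the polynomial regression algorithm of [34], whose exact equations are not shown in the extracted text:

```python
import numpy as np

def fit_mapping(u_samples, u_prime_samples):
    """Least-squares fit of the cubic u' = g(u) = c0 + c1*u + c2*u^2 + c3*u^3
    to control-point samples; np.polyfit returns highest degree first."""
    c3, c2, c1, c0 = np.polyfit(u_samples, u_prime_samples, 3)
    return c0, c1, c2, c3

def apply_mapping(u, coeffs):
    """Evaluate g(u) with coefficients (c0, c1, c2, c3)."""
    c0, c1, c2, c3 = coeffs
    return c0 + c1 * u + c2 * u ** 2 + c3 * u ** 3
```

With at least four control points the cubic is determined, and samples drawn from an exact cubic are recovered to numerical precision.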
In Figure 8a,b, if the sampling positions were ideal, the normalized l_h(m, n*) would be equal to 1 for all m values. Thus, the degradations in these figures can be evaluated with the ratio of the largest to the smallest sampling interval:

L_{n*}^h = max_m l_h(m, n*) / min_m l_h(m, n*)   (44)

To analyze the effect of the parameter r of Equation (29) on the proposed algorithm, the values of L_{n*}^h of (44) for various r are shown in Figure 13. In this figure, as the value of L_{n*}^h decreases, the quality of the viewport increases, because a smaller value means that the data on the sphere are sampled more uniformly. Based on Figure 13, the value of r that minimizes L_{n*}^h is selected as the optimized value. To demonstrate the effect of the proposed UV surface, Figure 14 displays the normalized l_h(m, n*) when (u, v) of the UV plane is modified by applying the mapping technique of Figure 12 to u and v independently after the proposed UV surface is employed with the optimized value of r. Based on (44), the degradations of the various methods in Figures 8 and 14 are summarized in Table 1. As observed in Table 1, the degradation of the proposed algorithm is much smaller than those of the conventional algorithms. The degradation values in the r = 0.3 column are larger than those in the other columns because r = 0.3 is an invalid setting (the curved UV surface requires r > 1). The analysis for l_v(m*, n) is similar to that in Figures 8 and 14, although it is not provided in this manuscript.

Simulation Results
To demonstrate the performance of the proposed algorithm, we compared it with a variety of conventional techniques, including the perspective projection [33], stereographic projection [33], Pannini projection [25], automatic content-aware (ACA) projection [27], and adaptive cylindrical (AC) projection [32].

Quantitative Evaluation
In this section, an objective evaluation is performed for the various conventional techniques and the proposed algorithm, where straightness and conformality are used as criteria, as demonstrated by Kim et al. [27]. Figure 15 explains how to calculate the straightness and conformality as criteria in the objective evaluation. The evaluation equations for these are as follows:

Figure 15. Straightness and conformality as the criteria in the quantitative evaluation.

The straightness of (45) is computed from the deviation of a rendered line from the ideal straight line between its end points, and the conformality is computed from the four angles β1, β2, β3, β4 measured around the center point in Figure 15:

Conformality = min(β1, β2, β3, β4) / max(β1, β2, β3, β4)   (46)

The maximum values of the straightness and conformality are 1; as these values get closer to 1, the degradation in the straightness and conformality decreases. Figure 16 displays the patterns produced by the various algorithms when nested squares are used as a test pattern to evaluate the straightness and conformality. It can be observed that some lines are curved and some red circles are deformed. Bent lines imply that the applied method performs poorly with respect to the straightness, and a badly deformed circle indicates that the conformality of the algorithm is poor. Based on (45) and (46), the evaluation results for the straightness and conformality are summarized in Table 2, where the best results and the second-best results are represented with red and blue numbers, respectively.

Table 2. Straightness and conformality of the various projection methods.

Projection Method                     Straightness    Conformality
Perspective [33]                      0.999999        0.719107
Stereographic [33]                    0.973522        0.878603
Pannini [25]                          0.977370        0.788848
ACA [27]                              0.978032        0.753724
AC [32]                               0.978640        0.616209
Proposed algorithm (optimized r)      0.980597        0.826667

As observed in Figure 16 and Table 2, the perspective method shows the best performance in straightness, whereas it provides one of the worst results for conformality. In contrast, the stereographic method shows the smallest value for straightness and the largest value for conformality. This implies that the stereographic algorithm outperforms the other techniques in maintaining the shape of the circles in the viewport, but it bends lines more than the other schemes. Although the proposed algorithm does not provide the best performance in either straightness or conformality, its overall performance is in the upper ranks. In particular, Figure 16 shows that its overall subjective deformation is much smaller than that of the other techniques.
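The conformality measure of (46) used in Table 2 is simple enough to state directly (a sketch; the inputs are assumed to be the four β angles of Figure 15):

```python
def conformality(beta_angles):
    """Eq. (46): ratio of the smallest to the largest of the four angles
    measured around the center point; 1.0 means no angular deformation."""
    return min(beta_angles) / max(beta_angles)
```

For an undeformed circle all four angles are equal and the measure is exactly 1; any squeezing of the circle lowers it.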

Qualitative Evaluation
In this section, we evaluate the subjective performance of the various algorithms used to render a viewport picture. Figure 17 presents the six test images used for the subjective evaluation; they are equirectangular images whose horizontal and vertical FOVs are 360° and 180°, respectively. Figure 18 displays the viewport images rendered from the equirectangular pictures by the various projection methods, in which the horizontal and vertical FOVs of the viewport are set to 150° and 120°, respectively. As demonstrated in Figure 18, the subjective quality of the proposed method is much higher than that of the others: the objects are deformed and the lines are bent in the pictures produced by the perspective [33], stereographic [33], Pannini [25], ACA [27], and AC [32] methods.

Perceptual Evaluation
To provide simulation results for the perceptual evaluation, the single-stimulus method of BT.500 [36] was used for the images in Figure 19. The mean opinion score (MOS) is a simple measurement of the viewers' opinions: it provides a numerical indication of the quality perceived by the viewers, expressed as a single number in the range of 1-5, where 1 means bad and 5 means excellent. In this test, viewport images with a wide FOV were evaluated and scored by 20 viewers, all graduate students. Figure 19 shows the MOS values. The MOS values of the proposed method are always much higher than those of the conventional methods [25,27,31,33].
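For reference, a MOS with a 95% confidence interval, as commonly reported for BT.500-style tests, can be computed as follows (a sketch only; BT.500 also specifies observer screening, which is omitted here):

```python
import math
import statistics

def mos(scores):
    """Mean opinion score of 1-5 ratings plus a normal-approximation
    95% confidence interval (1.96 * s / sqrt(N))."""
    mean = statistics.fmean(scores)
    ci = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, ci
```

With 20 viewers per image, as in this test, the interval gives a rough sense of whether the gap between two methods is meaningful.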

Complexity Comparison
To compare the computational complexities of the algorithms, we evaluated the central processing unit (CPU) times required by the various algorithms on a personal computer with an AMD Ryzen 5 2600 six-core processor @ 3.40 GHz and 32 GB of DDR4 memory. The CPU times are summarized in Table 3, which shows that the proposed algorithm is much simpler than ACA [27]. This is because ACA [27] requires additional processes to detect line components and to optimize the related parameters, whereas the proposed algorithm does not. The complexities of the other conventional algorithms, such as the perspective [33], stereographic [33], Pannini [25], and AC [32] algorithms, are approximately equal to that of the proposed method.

Table 3. Central processing unit (CPU) time consumed by the various algorithms.
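A minimal timing harness of the kind used for Table 3 can be sketched as follows; `render_fn` is a placeholder for any of the compared rendering methods, not an API from the paper:

```python
import time

def average_time(render_fn, repeats=10):
    """Average wall-clock time of one call to render_fn over several runs.
    Repeating and averaging smooths out scheduler noise."""
    start = time.perf_counter()
    for _ in range(repeats):
        render_fn()
    return (time.perf_counter() - start) / repeats
```

time.perf_counter is preferred over time.time here because it is monotonic and has the highest available resolution for interval measurement.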

Conclusions
We discussed the limitations of the conventional algorithms used to render the VP and proposed an algorithm to overcome these problems. A curved surface is used to efficiently project the pixel data on the unit sphere of an omnidirectional visual system onto the VP. Whereas conventional techniques use a flat square plane, which projects the pixel data with non-uniform sampling intervals, the proposed curved surface reduces the non-uniformity, resulting in a perceptually enhanced VP.
Rendering engines can be considered for implementing the proposed algorithm in real-time applications. First, the rasterization-based pipeline in the GPU is a unit designed to increase the speed of filling the pixel values of triangle meshes on the screen, whereas the proposed algorithm projects a value on the sphere to each pixel in the viewport.

Consequently, the proposed algorithm is not suitable for implementation using the rasterization-based pipeline in the GPU. Second, the ray tracing-based pipeline in the GPU can be considered to enhance the algorithm's performance. In this scenario, we need to access the data pixel by pixel in the ray-tracing pipeline; however, most commercial GPU products do not provide such accessibility. Third, we can consider Intel Embree, which is a collection of high-performance ray tracing kernels. Since Embree does not have the limitations of GPU-based systems, it best matches the proposed algorithm.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.