HEVC based Frame Interleaved Coding technique for Stereo and MultiView Videos

Received Jun 12, 201x Revised Aug 20, 201x Accepted Aug 26, 201x
The latest standard, MV-HEVC, has been shown to deliver about 50% bitrate saving compared to its predecessors; however, its multi-layer coding architecture can be a bottleneck for fast frame streaming across different views. This paper presents a frame interleaved stereo/multi-view video coding technique, based on the HEVC standard video codec, for encoding stereo and multi-view video sequences. The frames of stereo and multi-view video sequences are interleaved so as to maximize the exploitation of temporal, inter-view and cross-frame correlations, generating a monoscopic video stream. The single-layer coding structure of the MV-HEVC codec is then used to encode the resulting video sequence. The encoded bitstream of the interleaved stereo/multi-view video sequences allows fast frame streaming across views. The coding performance of the proposed codec is compared with that of the anchor standard MV-HEVC codec using three standard multi-view video sequences, namely “Poznan_Street”, “Kendo” and “Newspaper1”. Experimental results show that the proposed codec achieves substantial coding gains over the anchor codec for coding both stereo and multi-view video sequences.


INTRODUCTION
In recent years the 3D video entertainment market has grown enormously; however, the application of 3D video is not limited to multimedia alone. 3D videos are employed in immersive video conferencing, e-learning, cloud-based multimedia services, real-time surveillance, automation, robotics, and machine vision [1]. Multi-view videos are generated using multiple geometrically aligned and synchronized cameras, which capture the same scene simultaneously. The vast amount of visual information contained in multi-view videos is the main reason for their demand for large storage space, high transmission bandwidth over the communication channel and substantial computational power for coding [2]. The bitrate required to encode the views of a multi-view video individually, using monoscopic codecs, increases approximately linearly with the number of views; hence, efficient compression techniques are necessary for such applications [3]. Unlike single-view video codecs, texture-based multi-view video codecs employ scene geometry implicitly, through disparity prediction and compensation across views, to efficiently compress stereo/multi-view videos. Stereo/multi-view video codecs make extensive use of disparity prediction/compensation (DPC) and motion prediction/compensation (MPC) techniques, which are designed to exploit the inter-view and temporal correlations, respectively [3], [4]. In addition to DPC and MPC, stereo and multi-view video codecs are provided with advanced coding tools, such as the hierarchical B picture (HBP) prediction structure and variable block-size motion estimation (ME) and disparity estimation (DE), to improve the coding efficiency. Standard 3D video codecs combine temporal and inter-view prediction techniques to improve the coding performance. Numerous coding standards, such as H.264/AVC, MPEG 3DAV, H.264/MVC and MV-HEVC, have been developed over the years to efficiently compress multi-view videos [5], [6], [7], [8]. Although the standard multi-view video codecs are designed to efficiently code various 3D video applications, they still fail to fulfil the requirement for multi-view video transmission at low bandwidth and computation cost. Meanwhile, today's wireless and low-cost digital data transmission channels often operate at much lower bitrates, while most applications use high definition (HD) and ultra-high definition (UHD) videos. The objective of the research reported in this paper is to develop a less complex texture-based stereo/multi-view video codec by analysing the coding process and prediction structures of the standard HEVC codec.
Various stereo/multi-view video coding techniques have been developed to address the shortcomings of the standard multi-view video codecs. Merkle et al. [9] analysed different combinations of temporal and inter-view prediction techniques for multi-view video compression based on the standard H.264/AVC video codec. Their results revealed that the efficiency of inter-view/temporal prediction combinations strongly depends on the properties of the multi-view video sequences, and that adding inter-view reference pictures for disparity prediction/compensation could increase the achieved coding gain. Over the last decade, the introduction of H.264/AVC and its multi-view extension, H.264/MVC, has attracted the interest of many researchers towards further developing advanced stereo/multi-view video codecs; however, the challenge for these techniques has been to deal with the inherent computational complexity and high bandwidth requirement that stem from the nature of multi-view videos. Many coding techniques have been proposed, based on motion vector quantisation, flexible group of pictures (GoP) structures that adapt to different characteristics of multi-view videos, motion homogeneity estimation from the difference between horizontal and vertical motion vectors for complex motions, and an adaptive search window range algorithm based on the differences between the prediction vectors [10], [11]. The results from MVC-based stereo/multi-view video coding techniques have shown that increasing the number of inter-view predictions effectively reduces the required bitrates [11], [12]. Another way of coding stereo videos is to use asymmetric resolution coding techniques, where the video quality of the additional views is reduced by scaling down the resolution spatially or temporally. Asymmetric video coding techniques benefit from the human visual system's tolerance to suppressed high-frequency components and reduced resolution in one of the views. The coding efficiency for different scaling levels and resolutions of stereo videos was studied in [13] and [14]; the coding performance of these techniques was found to be close to that of the standard multi-view video coding technique, while they were able to deliver higher subjective quality. A subjective study on the visual quality of the decoded video frames of asymmetric and symmetric stereo videos was conducted in [15] using the H.264/MVC codec. The results showed that asymmetric video frames exhibit superior visual quality to those of symmetric videos at high bitrates, with compression efficiency close to that of the H.264/MVC codec.
An inter-view motion vector prediction method was proposed in [16] to improve the coding efficiency of the dependent views by reusing previously encoded motion information of the reference views through temporal motion vector prediction. This method calculates a global disparity vector by accessing a look-up table generated from the disparity vectors of the previously encoded frames. The global disparity vector is then used to adjust the motion field of the inter-view reference pictures. A multi-view video compression scheme using the HEVC monoscopic codec was proposed in [17]; the prediction structure of this technique closely matches that of the H.264/AVC-based multi-view video codec, with minimized prediction signaling. A less complex multi-view video codec with improved motion and inter-view prediction, based on HEVC, was proposed in [18]; it uses vector scaling for the targeted prediction units. In addition, it uses decision choices for selecting a prediction candidate from co-located units in the reference frame to track the neighboring unit's motion/disparity vector and to identify the unit vector that can be used as the source of prediction (nested prediction). Although this method adds complexity for finding the best match in multi-view motion/disparity prediction, it does not produce a significant coding gain over the standard MV-HEVC.
From the literature it is evident that researchers have mainly investigated techniques that adapt the standard monoscopic and multi-view video codecs to encode stereo/multi-view videos. So far, the modifications mostly involve changing the resolution of the video frames and improving the motion/disparity estimation/compensation unit of the codec. However, techniques that exploit cross-frame (also called lateral frame) correlations along with inter-view and temporal correlations for stereo and multi-view video coding have rarely been reported in the literature. In this paper an HEVC-based interleaved stereo/multi-view video codec is presented. The proposed codec applies a novel frame interleaving technique to the stereo/multi-view video frames to increase the exploitation of cross-frame, temporal and inter-view correlations. The coding performance of the proposed codec is assessed and compared with that of the standard MV-HEVC using three standard multi-view video datasets. Experimental results show the merit of the proposed technique. The rest of this paper is organized as follows: Section 2 outlines the framework of the proposed stereo/multi-view video codec, Section 3 presents the experimental results and, finally, the paper is concluded in Section 4.

HEVC BASED FRAME INTERLEAVED VIDEO CODING TECHNIQUE
The extension of HEVC for coding multi-view videos, known as MV-HEVC, uses a framework similar to the multi-view extension of H.264/MPEG-4 AVC, namely H.264/MVC. In contrast to H.264/MVC, which uses a single-loop coding approach, MV-HEVC uses a multi-loop decoding technique. Hence, MV-HEVC requires all the encoded reference layers to be decoded prior to decoding a new layer. It also uses a layered representation for encoding multi-view videos; this layer encoding dependency significantly increases the decoding complexity of the MV-HEVC codec [19], [20]. In this section, an HEVC-based Frame Interleaved Stereo/Multi-view video codec (HEVC-FISMVC), which uses a single-layer encoding approach, is presented. The proposed frame interleaving algorithm reorders the video frames in such a fashion that the resulting monoscopic video sequence has two consecutive frames of each view next to each other, as shown in Figure 1. This potentially improves the exploitation of temporal and inter-view correlations, resulting in higher coding performance. The resulting monoscopic video sequence is then encoded, as a single-layer video, using the HEVC codec. From Figure 1(a)-(b), it can be seen that the interleaving method starts and completes the frame reordering at the left view for stereo video frames and at the center view for multi-view video frames. This ensures that the I-frames are always taken from the left view for stereo videos and from the center view for multi-view videos.
It has been shown in [11] and [12] that exploiting inter-view correlations through DPC delivers higher coding efficiency than exploiting temporal correlations; hence the standard codecs use intelligent combinations of disparity and motion compensation to increase the coding efficiency, which adds to the computation cost of the codec [9]. To further improve the coding efficiency, codecs exploit inter-view and temporal redundancies by adopting a group of advanced coding tools, such as the hierarchical B picture (HBP) prediction structure and variable block-size motion estimation (ME) and disparity estimation (DE). The reference frame architecture in MV-HEVC is combined with signaling for the prediction dependencies between different views. MV-HEVC uses a multi-loop design and a layered representation to encode frames from other views; this increases the decoding complexity of the codec, since the reference layers must be decoded for prediction before a new layer can be processed. As a consequence, the video sequences of individual views cannot be processed independently, since they share reference pictures with other views. The proposed HEVC-FISMVC codec benefits from coding the reordered multi-view frames as a monoscopic video. The reference frame architectures for the frame interleaved stereo and multi-view video codecs are shown in Figures 2 and 3, respectively. From these figures, it can be seen that, in addition to adjacent inter-view and temporal frame referencing, the proposed reference frame architectures allow cross-frame referencing (also called lateral frame referencing), which would otherwise be computationally complex to implement within the AVC framework (using the standard HEVC).
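The frame reordering described above can be illustrated with a short sketch. The exact order is defined by Figure 1, which is not reproduced here, so the serpentine scan below is only one ordering that is consistent with the text: it starts at the base view (left for stereo, center for multi-view) and places two consecutive frames of a view next to each other. The function name `interleave_views` and the view labels are hypothetical and are not taken from the HTM software.

```python
from typing import List, Tuple

Frame = Tuple[str, int]  # (view label, capture time index)


def interleave_views(view_order: List[str], num_instants: int) -> List[Frame]:
    """Assumed serpentine interleaving of synchronized views.

    Even time instants scan the views in the given order, odd time
    instants scan them in reverse, so the last view of one instant is
    followed by the same view at the next instant.
    """
    sequence: List[Frame] = []
    for t in range(num_instants):
        scan = view_order if t % 2 == 0 else list(reversed(view_order))
        sequence.extend((view, t) for view in scan)
    return sequence


# Stereo case: the base (left) view comes first.
stereo = interleave_views(["L", "R"], 5)
print(stereo[:8])

# Three-view case: the base (center) view comes first.
three_view = interleave_views(["C", "L", "R"], 5)
print(three_view[:12])
```

Under this assumed ordering, positions 0 and 8 of the stereo stream and positions 0 and 12 of the three-view stream all fall on the base view, which agrees with the requirement that I-frames come from the left or center view.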
The goal of motion/disparity estimation/compensation is to reduce the energy of the difference block [21]. This is achieved by finding the same scene content in either a neighboring view or the previous frame [22]. In the case of neighboring views, the scene location is a function of the distance of the cameras from the scene and of the inter-camera angles. A study on the impact of camera separation on the performance of multi-view video codecs has shown that, as the angle between the optical reference lines of the cameras increases, the inter-view motion correlation decreases [23]. For the standard test videos used in this study, the motion vector search range was set to 96 pixels to mitigate the effect of the inter-camera angles and the camera distances from the scene. The motion/disparity search area of the proposed codec is likewise set to 96 pixels, so that HEVC's advanced and temporal motion vector prediction toolsets work efficiently for frame interleaved multi-view monoscopic videos. The software of the standard MV-HEVC video codec is configured to provide a single-layer HEVC codec operating with AVC capabilities and reduced transmission overhead. The HTM-16.0-MV-Draft 5 software version [25] was used to implement the proposed HEVC-FISMVC codec. The values assigned to the parameters of the standard MV-HEVC codec in order to implement the proposed codec are tabulated in Table 1. The parameter "NumberOfLayer" is set to 1 to configure the single-layer mode of operation of the standard MV-HEVC codec. The parameters "NumberOfViewId", "OutputLayerSetIdx" and "LayerIdsInAddOutputLayerSet_0" were assigned their minimum values to run the standard codec with the least number of signaling bits in the transmission overhead. The intra period is set to 24 frames, as required by the common test conditions document JCT3V-G1100 [24], while the minimum GoP sizes of the proposed codec for the 2-view and 3-view scenarios are set to 8 and 12 frames, respectively.
Table 1. MV-HEVC design parameters for the proposed codec.
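As a quick consistency check of the settings quoted above, the sketch below works out how many capture instants one GoP and one intra period cover in the interleaved stream. It assumes that the intra period and the GoP sizes are counted in frames of the interleaved monoscopic sequence and that every capture instant contributes one frame per view. The class `InterleavedSetup` and its field names are hypothetical and are not HTM configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterleavedSetup:
    """Hypothetical container for the settings quoted in the text."""
    num_views: int          # 2 for stereo, 3 for multi-view
    gop_size: int           # 8 (2-view) or 12 (3-view) interleaved frames
    intra_period: int = 24  # interleaved frames between I-frames
    search_range: int = 96  # motion/disparity search range in pixels

    def instants_per_gop(self) -> int:
        """Capture instants covered by one GoP of the interleaved stream."""
        if self.gop_size % self.num_views != 0:
            raise ValueError("GoP size must be a multiple of the view count")
        return self.gop_size // self.num_views

    def gops_per_intra_period(self) -> int:
        """Number of whole GoPs between two consecutive I-frames."""
        if self.intra_period % self.gop_size != 0:
            raise ValueError("intra period must be a multiple of the GoP size")
        return self.intra_period // self.gop_size

    def instants_per_intra_period(self) -> int:
        """Capture instants between two consecutive I-frames."""
        return self.instants_per_gop() * self.gops_per_intra_period()


stereo_setup = InterleavedSetup(num_views=2, gop_size=8)
three_view_setup = InterleavedSetup(num_views=3, gop_size=12)
for setup in (stereo_setup, three_view_setup):
    print(setup.num_views, setup.instants_per_gop(),
          setup.gops_per_intra_period(), setup.instants_per_intra_period())
```

With these assumptions both GoP sizes span four capture instants, and the 24-frame intra period holds three GoPs for two views and two GoPs for three views.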

RESULTS AND ANALYSIS
The performance of the proposed HEVC-based frame interleaved coding technique is compared with that of the anchor MV-HEVC codec, as presented in the JCT3V-G1100 document [24], for 2-view and 3-view scenarios. To evaluate the performance of the proposed HEVC-FISMVC codec in the 2-view scenario, views 4-3, 1-3 and 2-4 of the "Poznan_Street", "Kendo" and "Newspaper1" standard multi-view video sequences, respectively, were chosen and coded using the proposed codec. In the 3-view scenario, views 5-4-3, 1-3-5 and 2-4-6 of the "Poznan_Street", "Kendo" and "Newspaper1" standard multi-view video sequences, respectively, were chosen and coded using the proposed HEVC-FISMVC codec. The test video sequences represent different entertainment and interactive applications, with varying scene characteristics, levels of illumination and camera distances. The PSNR of the Y frames (Y-PSNR) achieved by the proposed and anchor codecs for the "Poznan_Street", "Kendo" and "Newspaper1" test videos at different bitrates is shown in Figures 4-6. From these figures, it is clear that the proposed codec achieves significantly higher coding performance than the anchor MV-HEVC at all bitrates (up to 1.2 dB). The "Poznan_Street" dataset is an outdoor video sequence captured under natural lighting conditions. Its videos contain multiple moving objects against a stationary background and were recorded by stationary cameras. From Figure 4(a), it can be noted that, for the 2-view "Poznan_Street" scenario, the proposed codec's video frames exhibit on average 0.7 dB higher Y-PSNR than the anchor codec's video frames between 550 kbps and 3100 kbps. From Figure 4(b), which shows the results for coding the 3-view multi-view video sequences, it can be seen that the proposed codec outperforms the standard codec by an average of 0.5 dB between 300 kbps and 3900 kbps. The "Kendo" dataset consists of indoor multi-view videos captured under multiple controlled lighting sources. These videos contain progressive background changes with a number of fast-moving objects in the foreground. From Figure 5(a), it is clear that the proposed codec gives higher coding performance than the anchor codec for coding 2-view "Kendo", with an average Y-PSNR gain of 1.4 dB. For coding the 3-view "Kendo" multi-view test sequences, the Y component of the proposed codec's video frames has on average 1.2 dB higher PSNR than that of the anchor MV-HEVC codec's video frames. From the decoded frames shown in Figure 7, it is evident that the proposed codec's frame exhibits generally higher visual quality than the MV-HEVC video frame.
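For readers who wish to reproduce the quality metric used in these comparisons, the following is a minimal sketch of the standard Y-PSNR calculation for 8-bit luma frames, PSNR = 10 log10(255^2 / MSE). It assumes that frames are given as nested lists of integer luma samples; the function names are hypothetical, and the sketch is not taken from the HTM software or from the authors' evaluation scripts.

```python
import math
from typing import Sequence

LumaFrame = Sequence[Sequence[int]]  # rows of 8-bit luma (Y) samples


def y_psnr(reference: LumaFrame, decoded: LumaFrame, peak: int = 255) -> float:
    """Return the PSNR in dB between two luma frames of equal size."""
    if len(reference) != len(decoded):
        raise ValueError("frames must have the same number of rows")
    squared_error = 0
    samples = 0
    for ref_row, dec_row in zip(reference, decoded):
        if len(ref_row) != len(dec_row):
            raise ValueError("rows must have the same length")
        for ref_px, dec_px in zip(ref_row, dec_row):
            squared_error += (ref_px - dec_px) ** 2
            samples += 1
    if samples == 0:
        raise ValueError("frames must not be empty")
    mse = squared_error / samples
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


def average_y_psnr(references: Sequence[LumaFrame],
                   decoded: Sequence[LumaFrame]) -> float:
    """Average the per-frame Y-PSNR over a sequence (one common convention)."""
    if not references or len(references) != len(decoded):
        raise ValueError("sequences must be non-empty and of equal length")
    values = [y_psnr(r, d) for r, d in zip(references, decoded)]
    return sum(values) / len(values)


ref = [[100, 100], [100, 100]]
dec = [[101, 99], [100, 102]]
print(round(y_psnr(ref, dec), 2))
```

Averaging per-frame values is only one possible convention; the text does not state how the reported averages were computed.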

CONCLUSION
In this paper an HEVC-based frame interleaved video coding technique for coding stereo/multi-view videos was presented. The proposed codec uses a single-layer approach to encode frame interleaved multi-view videos, while exploiting correlations across views more efficiently than the anchor MV-HEVC codec, which results in higher coding gain. The coding performance of the proposed codec was compared with that of the standard MV-HEVC video codec using three standard multi-view video sequences. Experimental results show that the proposed codec outperforms the anchor MV-HEVC codec by up to 1.2 dB. In addition, the encoded single-layer video bitstream mitigates the drawback of accessing multi-layer video frames, enabling fast video streaming across different views.


ISSN: 2088-8708 IJECE Vol.x, No. x, September 201x : xxxx 32 Title of manuscript is short and clear, implies research results (First Author) 33

Figure 1: Stereo and multi-view frame interleaving block diagram: (a) Algorithms contour to interleave stereo frames and (b) Interleave representation of the stereo frames.

Figure 2: Reference frame architecture of the proposed HEVC-FISMVC for stereo videos.

Figure 3: Reference frame architecture of the proposed HEVC-FISMVC for multi-view videos.

Figure 4. Average Y-PSNR vs bitrate of the anchor MV-HEVC and the proposed HEVC-FISMVC codecs for coding: (a) 2-view "Poznan_Street" and (b) 3-view "Poznan_Street" videos.

Figure 6.
Figure 7: Decoded frame number 16 from the Kendo videos of (a) the proposed codec and (b) the MV-HEVC standard codec.
To enable the reader to compare the visual quality achieved by the two codecs, frame number 16 of the Kendo videos decoded by the proposed codec and by the MV-HEVC anchor codec is shown in Figure 7(a) and (b), respectively.