Efficient Software HEVC to AVS2 Transcoding

The second generation of the Audio and Video coding Standard (AVS) is developed by the IEEE 1857 Working Group under project 1857.4 and was standardized in 2016 by the AVS Working Group of China as the new broadcasting standard AVS2. High Efficiency Video Coding (HEVC) is the newest global video coding standard, announced in 2013. More and more video content is migrating from H.264/AVC to HEVC because of its higher compression performance. In this paper, we propose an efficient HEVC to AVS2 transcoding algorithm, which applies a multi-stage decoding information utilization framework to maximize the usage of the decoding information in the transcoding process. The proposed algorithm achieves 11×–17× speed gains over the AVS2 reference software RD-14.0 with a modest BD-rate loss of 9.6%–16.6%.


Introduction
Aiming to establish generic technical standards for the compression, decoding, processing, and representation of digital audio-video content, the Audio and Video coding Standard (AVS) Working Group was formed in China in March 2002 [1]. By 2012, the AVS Working Group had published a series of video coding standards, which formed the first version of the AVS standards. AVS1 defines six profiles to satisfy the requirements of different applications: Main, Surveillance Baseline, Enhanced, Portable, Surveillance and Broadcasting. The AVS standards are also gaining international recognition. Formed by members of the IEEE Standards Association from the AVS Working Group, the IEEE 1857 Working Group was founded in 2012 to globalize the AVS standards as the IEEE 1857 standards [2]. So far, the IEEE 1857 Working Group has finished three parts of the IEEE 1857 standards.
The second generation of AVS, AVS2, has been under development since 2012 [3][4][5]. At the end of 2015, the final draft of AVS2 was released, and the standardization process is now under way. Targeting next-generation Ultra High Definition TV (UHDTV), AVS2 adopts larger partition sizes and other coding tools to improve coding performance, especially on high-resolution videos. Compared to its predecessor AVS1, AVS2 achieves up to 50% bitrate savings and provides more features for surveillance video coding [6].
High Efficiency Video Coding (HEVC) is the latest global video coding standard. It was developed by the Joint Collaborative Team on Video Coding (JCT-VC) and was standardized in 2013 [7]. HEVC was designed to double the compression ratio of its predecessor H.264/AVC at the cost of higher computational complexity. After several years of development, mature encoding and decoding solutions are emerging, accelerating the migration of video content from legacy standards such as H.264/AVC. With the increasing demand for ultra-high-resolution videos, it can be foreseen that HEVC will become the most important video coding standard in the near future.
AVS2 shows coding performance comparable to that of HEVC on normal videos, while for surveillance videos AVS2 outperforms HEVC by 32.1% due to its additional background coding tools [6]. Until now, there have been few well-optimized AVS2 software encoders, so transcoding from HEVC to AVS2 is a fast way to generate AVS2 video content. In addition, HEVC and AVS2 share the same quad-tree partitioning structure and have similar block processing flows, making it easier and more efficient to utilize the HEVC decoding information in the AVS2 encoding process.
In this paper, we propose a fast HEVC to AVS2 transcoding algorithm that utilizes the partition sizes, prediction modes, motion vectors (MVs), reference pictures, intra-prediction modes and other information extracted from the input HEVC bitstream to expedite the AVS2 encoding process and achieve efficient HEVC to AVS2 transcoding. Inspired by the fast H.264/AVC to HEVC transcoding in [8,9], we adopt the multi-stage decoding information utilization framework in [9] and apply it to the HEVC to AVS2 transcoding scenario to maximize the usage of the decoding information. Experiments show that our proposed algorithm can achieve up to 17× speed gains over the RD-14.0 AVS2 reference software with an acceptable coding performance loss.
The remainder of this paper is organized as follows. Section 2 reviews some related work. Section 3 compares the similarities and differences between the HEVC and AVS2 standards. Section 4 introduces the fast HEVC to AVS2 transcoding algorithm based on the multi-stage decoding information utilization framework. Section 5 shows the experimental results on HEVC test sequences, which are discussed in Section 6. Section 7 presents the materials used and Section 8 concludes this paper.

Related Work
There are quite a few research works on video transcoding between different coding standards, which is defined as heterogeneous transcoding in [10]. In [11], the authors discuss some key issues in the generic video transcoding process. The authors of [12][13][14] provide some thoughts on transcoding among MPEG-2, MPEG-4 and H.264.
After the HEVC standard was finalized, more work explored transcoding from H.264/AVC to HEVC. Zhang et al. [15] proposed a solution in which the number of candidates for the coding unit (CU) and prediction unit (PU) partition sizes was reduced for intra-pictures, while for inter-pictures a power-spectrum-based rate-distortion optimization model was used to estimate the best CU split tree from a reduced set of PU partition candidates according to the MV information in the input H.264/AVC bitstream. Peixoto et al. [16] proposed a transcoding architecture based on their previous work in [17]. They proposed two algorithms for mapping modes from H.264/AVC to HEVC, namely, dynamic thresholding of a single H.264/AVC coding parameter and context modeling using linear discriminant functions to determine the outgoing HEVC partitions. The model parameters for the two algorithms were computed using the beginning frames of a sequence. They achieved a 2.5×-3.0× speed gain with a BD-rate [18] loss between 2.95% and 4.42% with the proposed PT-LDF method. Diaz-Honrubia et al. [19] proposed an H.264/AVC to HEVC transcoder based on a statistical NB classifier, which decides on the most appropriate quadtree level in the HEVC encoding process. Their algorithms achieved a speedup of 2.31× on average with a BD-rate penalty of 3.4%. Mora et al. [20] also proposed an H.264/AVC to HEVC transcoder based on quadtree limitation, where the fusion map generated from the motion similarity of decoded H.264/AVC blocks is used to limit the quadtree of HEVC coded frames. They achieved 63% time saving with only a 1.4% bitrate increase.
Because the standardization process of AVS2 is not finished yet, there is little research work on this fairly new standard. Given that HEVC inherits from H.264/AVC and AVS2 inherits from AVS1, works on H.264/AVC to AVS transcoding can serve as a reference. Wang et al. [21] proposed a fast transcoding algorithm from H.264/AVC bitstreams to AVS bitstreams. They used a QP mapping method and a reciprocal SAD weighted method for intra mode selection and inter MV estimation. Their algorithms achieved 50% time saving in intra prediction with negligible coding performance loss and 40% time saving in inter prediction with minor coding performance loss.
The algorithm we propose in this paper is inspired by the multi-stage decoding information utilization in our previous work [9]. In [9], we presented a four-level transcoder framework, namely GOP-level Task Distribution, High-level Parallel Processing, Mid-level H.264/AVC Information Utilization, and Low-level SIMD and Assembly Acceleration, and implemented it on a distributed multi-core processor system, achieving 720p at 30 Hz H.264/AVC to HEVC transcoding in real time. The Mid-level H.264/AVC Information Utilization is based on the multi-stage decoding information utilization framework. In this framework, the decoding information is grouped into different layers and the utilization process is divided into different stages, in which the decoding information in a particular layer is utilized only after the corresponding information in the upper layer has been used in the previous stage. Thus, the decoding information can be used as much as possible to maximize the efficiency of the transcoding. In this paper, we apply this framework to HEVC to AVS2 transcoding, and, as the experiments show, the framework helps the transcoder achieve high speedup efficiency.

Comparison between HEVC and AVS2 Standards
Though they come from different families of coding standards, HEVC and AVS2 are quite similar in many of their coding tools, which makes it convenient to transcode an HEVC bitstream into an AVS2 bitstream. One important difference is that AVS2 introduces a new frame type called the F frame (Forward-predicted frame), together with several new prediction techniques for this frame type.

Partition Sizes
To improve compression performance on high-resolution videos, both HEVC and AVS2 replace the fixed-size macroblock partitioning method with a new quad-tree partitioning structure ranging from 4 × 4 to 64 × 64.
For intra predicted blocks, both HEVC and AVS2 support square 2N × 2N and N × N partitioning from 4 × 4 to 64 × 64, while AVS2 adds a new Short Distance Intra Prediction (SDIP) technique that allows Coding Units (CUs) to be partitioned into four horizontal or vertical strips, that is, 2N × 0.5N and 0.5N × 2N. SDIP adapts better to the image content, especially in edge areas, but evaluating the SDIP modes increases the computational complexity, so it is not applied to 64 × 64 CUs.
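The strip geometry described above can be made concrete with a short sketch. The function name `sdip_strips` and the `(x, y, width, height)` tuple layout are illustrative assumptions, not part of the AVS2 reference software; the only rules taken from the text are the four-strip split and the exclusion of 64 × 64 CUs. Any lower size limit is left to the caller.

```python
def sdip_strips(cu_size, horizontal):
    """Return the four SDIP strips of a square CU as (x, y, width, height).

    cu_size is the CU side length (2N). Horizontal strips are 2N x 0.5N,
    vertical strips are 0.5N x 2N. Hypothetical helper for illustration.
    """
    if cu_size >= 64:
        return []  # per the text, SDIP is not applied to 64x64 CUs
    strip = cu_size // 4  # 0.5N when the CU side is 2N
    if horizontal:
        return [(0, i * strip, cu_size, strip) for i in range(4)]
    return [(i * strip, 0, strip, cu_size) for i in range(4)]


# A 16x16 CU split into four 16x4 horizontal strips.
print(sdip_strips(16, horizontal=True))
```

For a 32 × 32 CU with `horizontal=False`, the same helper yields four 8 × 32 vertical strips.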

Intra Prediction
Besides the introduction of SDIP modes, the intra prediction modes in AVS2 also differ from those in HEVC. There are 35 intra modes in HEVC, including DC mode, Planar mode, and 33 angular modes, while there are 33 modes in AVS2, including DC mode, Plane mode, Bilinear mode and 30 angular modes. In terms of the angular modes, the difference between adjacent angular modes is 5.625 degrees in HEVC and 7.5 degrees in AVS2, which means HEVC has a finer angular granularity for intra modes.

Inter Prediction
Both HEVC and AVS2 improve on the P frames of previous standards, because P frames are frequently used as references and strongly affect coding performance. In HEVC, reference lists are extended to include both forward and backward reference pictures, making it possible for B frames to use two forward reference pictures, two backward reference pictures, or one from each list. Thus, P frames can be replaced by B frames using two forward reference pictures to obtain compression gains. In AVS2, F frames are introduced and a temporal Dual Hypothesis Prediction (DHP) technique is designed for these frames, with a signaled MV for a main reference picture and an implicit MV for another reference picture calculated according to the temporal distance between the two reference pictures. In addition, a spatial Directional Multi-Hypothesis (DMH) technique is introduced to average two nearby blocks located on opposite sides of the best MV point in the Motion Compensation (MC) process.
For conventional B frames, AVS2 adds a new Symmetric prediction mode along with the Forward, Backward, and Bidirectional modes. The Symmetric mode is similar to the Dual mode of DHP: the MV for the forward reference picture is signaled, while the MV for the backward reference picture is calculated from the temporal distance between the two reference pictures.
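Deriving one MV from another by temporal distance can be illustrated with a minimal sketch. It assumes plain linear scaling with rounding to the nearest integer, and signed picture-order distances that are negative for past pictures and positive for future ones. The exact fixed-point derivation in AVS2 is not reproduced here, and `scale_mv` is a hypothetical name.

```python
def scale_mv(mv, dist_signaled, dist_derived):
    """Scale a signaled MV to another reference picture by temporal distance.

    mv: (x, y) of the signaled MV in sub-pixel units.
    dist_signaled: signed distance from the current picture to the
        reference picture the signaled MV points to.
    dist_derived: signed distance to the reference picture whose MV is derived.
    Assumption: simple linear scaling, not the standard's exact arithmetic.
    """
    if dist_signaled == 0:
        raise ValueError("signaled reference must differ from the current picture")
    return tuple(round(c * dist_derived / dist_signaled) for c in mv)


# Symmetric-style case: forward reference one picture back, backward one ahead.
print(scale_mv((8, -4), dist_signaled=-1, dist_derived=1))

# DHP-style case: two forward references, one and two pictures back.
print(scale_mv((6, 2), dist_signaled=-1, dist_derived=-2))
```

With equal distances on both sides, the derived backward MV is simply the mirrored forward MV, which is why only one MV needs to be signaled.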
For Direct/Skip modes, HEVC uses the Merge mechanism, which chooses a candidate from those derived from the neighboring blocks and uses a flag to indicate whether it is Skip or not. In AVS2, corresponding to the new prediction modes, Weighted, Complementary and Multi-Hypothesis Direct/Skip modes are added besides the conventional ones. Since the MV derivation mechanisms are not very helpful during transcoding, the details of both Direct/Skip modes are not discussed here, and we only take their MVs and reference pictures into consideration during the transcoding process.

Other Coding Tools
Apart from the coding tools mentioned above, there are some other differences in coding tools between HEVC and AVS2. For in-loop filtering, AVS2 adds an Adaptive Loop Filter (ALF) after the Deblocking Filter (DBF) and Sample Adaptive Offset (SAO). HEVC uses Discrete Cosine Transform (DCT) along with 4 × 4 Discrete Sine Transform (DST), while AVS2 uses Integer Transform (IT) along with 64 × 64 Logical Transform (LOT). For entropy coding, HEVC uses Context-Adaptive Binary Arithmetic Coding (CABAC) while AVS2 uses Adaptive Entropy Coding (AEC). In addition, AVS2 adds special coding tools for background frame coding. During the transcoding process, we leave these coding tools working as they do in the normal encoding process.
Table 1 summarizes the differences in coding tools between HEVC and AVS2.

The Proposed Algorithms
We proposed a fast H.264/AVC to HEVC transcoding algorithm based on the multi-stage decoding information utilization framework in [9]. In this framework, H.264/AVC decoding information, including partition size, prediction mode, reference pictures and MVs, is grouped into different layers and the utilization process is divided into several stages. Only after the information in the upper layer has been processed in the previous stage can the corresponding information in the lower layer be processed in the next stage.
Instead of the three stages used in H.264/AVC to HEVC transcoding, two stages are used in HEVC to AVS2 transcoding, as shown in Figure 1: (1) partition size and mode decisions, followed by (2a) reference picture decisions and MV estimation and (2b) intra mode decisions.

Fast Partition Mode Decision
Although all partition modes from HEVC can be found in AVS2, we cannot simply map the same partition modes from HEVC to AVS2. For example, if the target bitrate of transcoding is much lower than the original bitrate of the input HEVC bitstream, the partition sizes should be larger than those in the HEVC bitstream. To simplify the discussion, we assume that the bitrates of the transcoded bitstreams are lower than the original bitrate of the input HEVC bitstream, since a transcoded bitrate higher than that of the original input is hard to justify when transcoding always incurs some quality loss. We divide the Fast Partition Mode Decision stage into three sub-stages according to the comparison between the partition depth of the CU in AVS2 and in HEVC.
We first consider the case where the CU in AVS2 would be smaller than that in HEVC, that is, where the AVS2 partition depth would exceed the HEVC partition depth. Under the assumption that the bitrates of the transcoded bitstreams are lower, it is unlikely that AVS2 uses smaller partitions than HEVC. Thus, we limit the maximum partition depth of each CU in AVS2 to the partition depth of the corresponding CU in HEVC.
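The depth limit above can be sketched as follows, assuming the HEVC CU depths of one 64 × 64 largest coding unit are stored in a grid with one entry per 8 × 8 block. The storage layout and the names `hevc_depth_limit` and `may_split` are illustrative assumptions, not taken from RD-14.0.

```python
def hevc_depth_limit(depth_map, x, y, size, unit=8):
    """Largest HEVC CU depth inside the square region at (x, y) with side `size`.

    depth_map[row][col] holds the HEVC CU depth covering each unit x unit block.
    """
    rows = range(y // unit, (y + size) // unit)
    cols = range(x // unit, (x + size) // unit)
    return max(depth_map[r][c] for r in rows for c in cols)


def may_split(depth_map, x, y, size, avs2_depth, unit=8):
    """An AVS2 CU is split further only while it is shallower than HEVC there."""
    return avs2_depth < hevc_depth_limit(depth_map, x, y, size, unit)


# Example: the top-left 32x32 quadrant was coded as 16x16 CUs (depth 2) in HEVC,
# the other three quadrants as 32x32 CUs (depth 1).
depth_map = [[2 if r < 4 and c < 4 else 1 for c in range(8)] for r in range(8)]

print(may_split(depth_map, 0, 0, 64, avs2_depth=0))   # the 64x64 CU may be split
print(may_split(depth_map, 32, 0, 32, avs2_depth=1))  # the top-right 32x32 CU may not
```

Taking the maximum over the covered region is one reasonable reading of "the corresponding CUs"; a stricter variant could use the minimum instead.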
Then we consider the case where the CU in AVS2 is larger than that in HEVC, that is, where the AVS2 partition depth is smaller than the HEVC partition depth. Under the same assumption, it is likely that AVS2 uses larger partitions than HEVC. In the AVS2 encoding process, the recursive mode examination of CUs is conducted bottom-up: the smallest CUs are checked first to find their best modes, and the larger ones are checked afterwards. We therefore always have the mode decision results of the next depth when checking the modes at the current depth, unless we are already at the maximum depth. Thus, we can filter the candidate modes further based on the modes used at the next depth. We apply algorithms similar to the Fast Partition Mode Decision in H.264/AVC to HEVC transcoding described in [9], as follows. Examples for each condition are given in Figure 2.
Finally, we consider the case where the partition depths of the CU in AVS2 and in HEVC are equal. Since the CUs are of the same size, we check in AVS2 the same mode as that used in HEVC. If an intra mode is used in the CU in HEVC, we check two more SDIP modes in AVS2. Because intra blocks are less likely to be used in B frames, we do not check SDIP modes in B frames.
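The equal-depth rule can be sketched as a small candidate-list builder. The mode labels `"SDIP_HOR"` and `"SDIP_VER"` and the function name are hypothetical; the sketch assumes the two extra SDIP modes are the horizontal and vertical strip partitions.

```python
def same_depth_candidates(hevc_mode, hevc_is_intra, frame_type):
    """AVS2 modes to check when the AVS2 CU and the HEVC CU have the same depth.

    hevc_mode: label of the mode used by the co-located HEVC CU.
    hevc_is_intra: True if that HEVC CU is intra coded.
    frame_type: AVS2 frame type of the current picture, e.g. "I", "F" or "B".
    """
    candidates = [hevc_mode]  # always check the mode HEVC chose
    if hevc_is_intra and frame_type != "B":
        # Two additional SDIP modes, skipped in B frames as described above.
        candidates += ["SDIP_HOR", "SDIP_VER"]
    return candidates


print(same_depth_candidates("INTRA_2Nx2N", True, "F"))
print(same_depth_candidates("INTRA_2Nx2N", True, "B"))
print(same_depth_candidates("INTER_2NxN", False, "F"))
```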

Fast Reference Picture Decision and Motion Vector Estimation
For B frames, there is only one forward and one backward reference picture in the default settings, so there is no need to make the reference picture decision. For F frames, corresponding to Generalized P and B (GPB) frames in HEVC, there are at most two reference pictures from HEVC. We check all the reference pictures of the HEVC PUs covered by the current block.
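Collecting the reference pictures of the covered HEVC PUs can be sketched as a rectangle-overlap test. The dictionary layout for a PU (`"rect"` and `"refs"`) and the function name are assumptions made for illustration only.

```python
def candidate_ref_pics(block, hevc_pus):
    """Reference pictures used by HEVC PUs that overlap the current AVS2 block.

    block: (x, y, width, height) of the current block.
    hevc_pus: list of dicts with "rect" (x, y, width, height) and "refs",
        the reference picture indices that PU used in the HEVC bitstream.
    Returns the distinct reference indices in first-seen order.
    """
    bx, by, bw, bh = block
    refs = []
    for pu in hevc_pus:
        px, py, pw, ph = pu["rect"]
        overlaps = px < bx + bw and bx < px + pw and py < by + bh and by < py + ph
        if not overlaps:
            continue
        for ref in pu["refs"]:
            if ref not in refs:
                refs.append(ref)
    return refs


pus = [
    {"rect": (0, 0, 16, 16), "refs": [0]},
    {"rect": (16, 0, 16, 16), "refs": [0, 1]},
    {"rect": (32, 0, 32, 32), "refs": [2]},  # lies outside the block below
]
print(candidate_ref_pics((0, 0, 32, 32), pus))
```

Only the reference pictures returned here would then be examined by the AVS2 encoder for that block.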
In the Motion Estimation (ME) process of AVS2 encoding, the uni-referred forward and backward predictions are processed first to obtain the best integer MVs for each reference picture from both lists, which are then refined to sub-pixel level. For the multi-referred dual, symmetric and bidirectional predictions, the MV search starts from the best integer MVs acquired in the forward or backward prediction and then searches around them for sub-pixel refinement. Therefore, we first convert the corresponding HEVC MVs to integer precision and then substitute the converted HEVC MVs for the best integer MVs of each reference picture in the AVS2 ME process, skipping the original integer-pixel search. The sub-pixel refinements of multi-referred predictions are kept unchanged.
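The conversion to integer precision can be sketched as below. HEVC luma MVs are stored in quarter-sample units; the rounding rule used here (nearest integer position, ties toward positive infinity) is an assumption, since the text does not specify one, and `to_integer_pel` is a hypothetical name.

```python
def to_integer_pel(mv_qpel):
    """Round a quarter-pel MV (x, y) to the nearest integer-pel position.

    The result stays in quarter-pel units, so every component is a multiple
    of 4 and can seed the sub-pixel refinement directly.
    """
    # Python's >> is an arithmetic shift, so negative components floor correctly.
    return tuple(((c + 2) >> 2) << 2 for c in mv_qpel)


print(to_integer_pel((5, -7)))
print(to_integer_pel((0, 3)))
```

The returned vector replaces the result of the integer-pixel search for the matching reference picture, and the encoder's own sub-pixel refinement then runs around it.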

Fast Intra Mode Decision
There are a total of 33 intra modes in AVS2, including DC mode, Plane mode, Bilinear mode and 30 angular modes. Because they require no pixel interpolation, the non-angular modes do not take much time to calculate, so we check all three of them. For angular modes, we check four to five AVS2 modes whose angles lie around the angle of the HEVC mode. The mapping from HEVC angular modes to AVS2 angular modes can be expressed as follows: where x is the index of the HEVC angular mode and Y is the index set of the AVS2 angular modes to be checked.
If the HEVC mode is non-angular, we check all the AVS2 intra modes.
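The idea of checking a few AVS2 angular modes around the HEVC angle can be sketched with a nearest-angle selection. This is not the mapping formula of the paper: the angle table below is a toy example with uniform 7.5-degree steps, the mode indices are placeholders, and `nearest_angular_modes` is a hypothetical helper.

```python
def nearest_angular_modes(hevc_angle, avs2_angles, count=5):
    """Pick the `count` AVS2 angular modes closest in angle to the HEVC mode.

    hevc_angle: prediction angle of the decoded HEVC mode, in degrees.
    avs2_angles: dict mapping AVS2 mode index -> prediction angle in degrees.
    Ties are broken by the lower mode index; the result is sorted by index.
    """
    ranked = sorted(avs2_angles, key=lambda m: (abs(avs2_angles[m] - hevc_angle), m))
    return sorted(ranked[:count])


# Toy table only: seven placeholder modes spaced 7.5 degrees apart.
toy_avs2_angles = {3: 0.0, 4: 7.5, 5: 15.0, 6: 22.5, 7: 30.0, 8: 37.5, 9: 45.0}

print(nearest_angular_modes(22.0, toy_avs2_angles, count=5))
print(nearest_angular_modes(22.0, toy_avs2_angles, count=4))
```

In a real transcoder the table would hold the actual AVS2 prediction angles, and the three non-angular modes would be added to the returned set.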

Experimental Results
Experiments were conducted with standard HEVC test clips to evaluate the speed and Rate-Distortion (R-D) performance of our proposed transcoding algorithm. Tests were run on a server with two Intel Xeon 2.4 GHz CPUs and 64 GB RAM running the Linux operating system.
The RD-14.0 AVS2 reference software is used in the full decoding-and-encoding transcoding method, which is the baseline against which our proposed fast transcoding algorithms are compared in the experiments. We use the encoder_ra.cfg default configuration file to configure RD-14.0 with a Random Access frame structure of 7 B frames and 1 F frame and an intra period of 48 frames. Our proposed algorithms are implemented based on RD-14.0 and the same configuration file is used. QPs {35, 40, 45, 50} are used for each test clip.
The input HEVC bitstreams of each test clip are generated using the HM-16.0 HEVC reference software with the default Random Access configuration and a QP of 27. Since both methods share the same HEVC decoding procedure, we only compare the performance on the AVS2 encoding side, where the AVS2 encoder encodes the decoded frames from the HEVC decoder into AVS2 bitstreams. To observe the performance changes between the stages of our proposed algorithms, we record the per-stage changes and the cumulative changes of all the stages and sub-stages, as shown in Table 2. Figure 3 shows the R-D performance of the proposed transcoder against the full decoding-and-encoding AVS2 transcoder using RD-14.0.

Discussion
Table 2 shows the performance changes of each stage, where I-1, I-2 and I-3 represent the three sub-stages of the Fast Partition Size and Mode Decision described in the Fast Partition Mode Decision section, II-A represents the Fast Reference Picture Decision and MV Estimation, and II-B represents the Fast Intra Mode Decision.
For stage I-1, limiting the maximum partition depth of AVS2 CUs brings 2.7×-4.2× speed gains with negligible R-D performance loss. For stage I-2, filtering the larger partition modes achieves a speed gain of around 1.3× with about a 4% BD-Rate loss. For stage I-3, mapping the partition modes between CUs of the same depth brings around 2× speed gains with an average 4.4% BD-Rate loss. For the whole Fast Partition and Mode Decision stage, the algorithms achieve total speed gains of 7×-10× with an average 9.6% R-D performance loss. Although these speed gains are not large relative to the corresponding BD-Rate loss, this stage makes it possible for the subsequent stages of the transcoding process to use the remaining decoding information more efficiently, leading to a good overall speed gain for the entire transcoding process and an acceptable R-D performance for many applications.
For stage II-A, the Fast Reference Picture Decision and MV Estimation brings a speed gain of around 1.5× with an average 1% BD-Rate loss. For stage II-B, the Fast Intra Mode Decision achieves about 1.1× speed gains and the average BD-Rate loss is within 1%. This stage achieves good speedup efficiency because the decoding information it uses is more reliable once the previous stage has been applied.
The overall transcoding algorithm achieves an 11×-17× speed gain with a 9.6%-16.6% R-D performance loss. By the calculation of Speedup Efficiency presented in [9], the efficiencies of the proposed algorithms for the test clips are 117, 110, 70 and 141, which are higher than those of the H.264/AVC to HEVC transcoder in [9].
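As a rough consistency check on the figures above, the per-stage speed gains can be multiplied to approximate the cumulative gains. This sketch assumes the stages compose multiplicatively and uses the rounded per-stage values quoted in this section, so the products only approximate the reported totals of 7×-10× and 11×-17×.

```python
from math import prod

# (low, high) speed gain per stage, as quoted in the discussion above.
stage_speedups = {
    "I-1": (2.7, 4.2),
    "I-2": (1.3, 1.3),
    "I-3": (2.0, 2.0),
    "II-A": (1.5, 1.5),
    "II-B": (1.1, 1.1),
}


def cumulative(stages):
    """Multiply the low and high speed gains of the given stages."""
    low = prod(stage_speedups[s][0] for s in stages)
    high = prod(stage_speedups[s][1] for s in stages)
    return low, high


stage_one = cumulative(["I-1", "I-2", "I-3"])  # compare with the reported 7x-10x
overall = cumulative(list(stage_speedups))     # compare with the reported 11x-17x

print("Stage I:  %.1fx-%.1fx" % stage_one)
print("Overall:  %.1fx-%.1fx" % overall)
```

The small gap between these products and the reported ranges is expected, because the per-stage figures are rounded averages across test clips.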

Materials and Methods
The RD-14.0 AVS2 reference software was retrieved from the AVS Working Group [22]. The test clips used in the experiments were selected from the HEVC standard test sequences.

Conclusions
In this paper, we proposed a fast HEVC to AVS2 transcoding algorithm based on the multi-stage decoding information utilization framework. By utilizing HEVC decoding information in a multi-stage approach, the proposed algorithms achieve a speed gain of 11×-17× over the RD-14.0 AVS2 reference software, with an R-D performance loss of 9.6%-16.6%. This provides an efficient way to transcode existing HEVC video content into AVS2 video content at a high processing speed with an acceptable compression performance.

Figure 1. Fast HEVC to AVS2 Transcoding Algorithms Based on Multi-stage Decoding Information Utilization Framework.

Figure 2. (a-d) Examples for when conditions 2-5 are applied; (e-h) Examples for when conditions of Asymmetric Motion Prediction (AMP) modes are applied; (i,j) Examples for when conditions of Short Distance Intra Prediction (SDIP) modes are applied.

Figure 3. R-D Performance of Proposed Transcoder on HEVC Test Clips.

Table 1. Comparisons on Coding Tools between High Efficiency Video Coding (HEVC) and the second-generation Audio and Video coding Standard (AVS2).

Table 2. Performance in stages of proposed transcoding algorithms.