Entropy-Based Fast Largest Coding Unit Partition Algorithm in High-Efficiency Video Coding

High-efficiency video coding (HEVC) is a new video coding standard being developed by the Joint Collaborative Team on Video Coding. HEVC adopts numerous new tools, such as more flexible data structure representations, which include the coding unit (CU), prediction unit, and transform unit. Rate distortion optimization (RDO) is applied when partitioning the largest coding unit (LCU) into CUs. However, the computational complexity of RDO is too high for real-time application scenarios. Based on a study of the relationship between CUs and their entropy, this paper proposes a fast entropy-based algorithm to partition the LCU as a substitute for RDO in HEVC. Experimental results show that the proposed entropy-based LCU partition algorithm can reduce coding time by 62.3% on average, with an acceptable loss of 3.82% in terms of Bjøntegaard delta rate.


Introduction
As the next generation of video coding standards, high-efficiency video coding (HEVC) [1] aims to halve the bit rate at the same reconstructed video quality as H.264/AVC, which is the latest-generation video coding standard. Many useful tools are adopted in HEVC, such as Sample Adaptive Offset, Motion Vector Merging, Merge Skip, and Residual Quadtree Transform [2].
HEVC provides a larger coding unit (CU) than H.264, in which the block size is fixed at 16 × 16, and a more flexible quadtree structure. The CU sizes of HEVC are 128 × 128, 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4 × 4. One largest coding unit (LCU) can be split into four equal-sized CUs, and each CU can in turn be either encoded or split into four equal-sized CUs [3]. The splitting ends only when a CU reaches the smallest CU size. To find the optimal combination of CUs, the encoder has to search all possible CUs exhaustively. Figure 1 shows an LCU partitioned into CUs. Whether a CU larger than the smallest CU is encoded as is or split into four equal-sized CUs is decided by rate distortion optimization (RDO). This exhaustive process searches and encodes all possible CUs and chooses those with the smallest rate distortion (RD) cost. Using a more flexible quadtree structure results in a more efficient encoder [5] and brings coding gain by adapting effectively to the diversity of picture content. According to [6], a new coding tree structure with a 64 × 64 LCU brings nearly 12% bitrate reduction on average compared with a 16 × 16 LCU. Figure 2 shows an image partitioned by H.264 and HEVC. In Figure 2b, the red lines represent the edges of the LCUs (64 × 64), and the larger CUs result in a more focused encoder [7]. Areas with less information content are partitioned into larger CUs, whereas areas with more information content are partitioned into smaller CUs.
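The exhaustive quadtree search described above can be sketched as a short recursion. This is a minimal illustration, not the HM reference encoder: the function name `rdo_partition` and the `rd_cost` callback are hypothetical, and a real encoder derives the RD cost from actual prediction, transform, and entropy coding.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a CU inside the LCU


def rdo_partition(
    x: int,
    y: int,
    size: int,
    rd_cost: Callable[[int, int, int], float],  # hypothetical cost callback
    min_size: int = 8,
) -> Tuple[float, List[Block]]:
    """Return (best RD cost, chosen CUs) for the block at (x, y)."""
    # Cost of encoding this block as a single CU.
    cost_whole = rd_cost(x, y, size)
    if size <= min_size:
        return cost_whole, [(x, y, size)]

    # Cost of splitting into four equal-sized CUs, each searched recursively.
    half = size // 2
    cost_split = 0.0
    blocks_split: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            cost, blocks = rdo_partition(x + dx, y + dy, half, rd_cost, min_size)
            cost_split += cost
            blocks_split += blocks

    # Keep whichever alternative has the smaller RD cost.
    if cost_whole <= cost_split:
        return cost_whole, [(x, y, size)]
    return cost_split, blocks_split
```

Every candidate CU is visited once regardless of which partition wins, which is why the search cost does not depend on the picture content.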
Although a larger CU can bring significant bitrate reduction, the HEVC encoder has to search all possible CUs to obtain the optimal partition, which results in extremely high computational complexity [8]. To find the optimal CUs, the computational burden is equivalent to encoding an LCU four times, because the encoder has to encode the CUs of sizes 64 × 64, 32 × 32, 16 × 16, and 8 × 8. Given that only one partition plan is finally used, 75% of this computation is wasted. Such a large computational burden is unsuitable for many video coding applications, such as real-time scenarios. We therefore propose a new algorithm that avoids this computational redundancy in encoding.
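The "four times" and "75%" figures follow from simple counting, as the sketch below shows. It assumes a 64 × 64 LCU and the four CU sizes named in the text, and it treats encoding cost as proportional to the number of pixels encoded, which is a simplification; the helper names are illustrative only.

```python
LCU_SIZE = 64
CU_SIZES = (64, 32, 16, 8)


def cus_per_depth(lcu_size=LCU_SIZE, cu_sizes=CU_SIZES):
    """Number of candidate CUs of each size that tile the LCU."""
    return {s: (lcu_size // s) ** 2 for s in cu_sizes}


def encoded_pixels(lcu_size=LCU_SIZE, cu_sizes=CU_SIZES):
    """Total pixels encoded when every candidate CU is tried once."""
    return sum(n * s * s for s, n in cus_per_depth(lcu_size, cu_sizes).items())


counts = cus_per_depth()                            # {64: 1, 32: 4, 16: 16, 8: 64}
total_cus = sum(counts.values())                    # 85 candidate CUs
passes = encoded_pixels() / (LCU_SIZE * LCU_SIZE)   # 4.0 full passes over the LCU
wasted = 1 - 1 / passes                             # 0.75: only one pass is kept
print(counts, total_cus, passes, wasted)
```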
Several related proposals address complexity reduction for intra coding in HEVC. Cho [9] proposed a fast splitting and pruning method performed in two complementary steps: (i) an early CU split decision and (ii) an early CU pruning decision. Piao et al. [10] presented a rough mode decision (RMD) method that prescreens the most probable candidate modes for intra prediction coding in HEVC by computing a low-complexity RD cost. Shen [11] proposed a CU size decision algorithm that collects relevant and computationally friendly features to assist decisions on CU splitting. These related works achieve a coding time reduction of at most 50%, and none of them takes the information content of the CU into account in LCU partitioning. In this paper, we propose an entropy-based fast CU-sized decision algorithm to replace the RDO used in the quadtree structure. This paper is organized as follows: Section 2 briefly introduces the principle of the proposed algorithm. In Section 3, we elaborate on the entropy-based fast CU-sized decision algorithm. Experimental results are shown in Section 4, and Section 5 concludes our study.

Principle of the Proposed Algorithm
This study shows that the CUs chosen by the RDO process relate closely to the information content of each CU. Figure 2b shows that the partition of LCUs is related to the information content. Given that Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content, this paper proposes a Shannon entropy [12] technique to replace the RDO in LCU partitioning.

Proposed CU-Sized Decision Algorithm
The proposed algorithm is introduced in this section. Figure 3 shows the flowchart of our proposed algorithm.
As introduced in the previous section, the key point of the proposed algorithm is to find the relationship between the selected CUs and the entropy of these CUs. On this basis, the CUs partitioned by the proposed algorithm can achieve maximum similarity to the optimized CUs, which is the aim of this study.

Entropy of Each CU
This section shows the calculation used in the CU-sized decision algorithm. The entropy is expressed as follows:

$$H(x) = -\sum_{i=1}^{j} p_i \log_2 p_i \qquad (1)$$

In this equation, $H(x)$ is the entropy, $p_i$ represents the probability of the factor $i$, and $j$ is the number of factors. To obtain the information content of CUs, we calculated the entropy of all the possible CUs in an LCU. However, before the calculation, the background noise of the LCU was first removed. The background noise is the pixel value difference that should not exist among neighboring pixels. Figure 4 shows an area that is very smooth, where the pixel values appear to be the same. However, because of the background noise, the pixel values differ slightly from one another, which makes entropy an inaccurate description of the information content. To remove the background noise without high computational complexity, we used an anti-ground noise filter: we adopted 8 as the step size to quantize all 256 pixel values, which left 32 pixel values (0-31) and provided a good basis for the subsequent work. We then calculated the entropy of all possible CUs in the LCU. A total of 85 possible CUs are available in the LCU: one 64 × 64 CU, four 32 × 32 CUs, sixteen 16 × 16 CUs, and sixty-four 8 × 8 CUs. We then counted the probability of the appearance of each pixel value in a CU. This probability was used in the calculation of the entropy and is expressed as:

$$p_i = \frac{n_i}{N} \qquad (2)$$

where $N$ is the number of pixels in a CU and $n_i$ is the number of pixels whose value is $i$.
We then calculated the entropy of each CU using the following equation:

$$H(x) = -\sum_{i=0}^{31} p_i \log_2 p_i \qquad (3)$$

where values with $p_i = 0$ contribute nothing to the sum. A total of 85 entropy values were calculated, and the results were used as the basis for the CU partition.
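The quantization and entropy steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the LCU is taken to be a 64 × 64 list of 8-bit luma rows, the logarithm is base 2 (entropy in bits), and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a sequence of quantized pixel values."""
    n = len(values)
    counts = Counter(values)
    # sum of p * log2(1/p), which equals -sum of p * log2(p); absent values add 0
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def cu_entropies(lcu, sizes=(64, 32, 16, 8), step=8):
    """Entropy of every candidate CU, keyed by (x, y, size)."""
    n = len(lcu)
    # Noise filter from the text: quantize 256 levels down to 32 with step 8.
    quantized = [[p // step for p in row] for row in lcu]
    result = {}
    for size in sizes:
        for y in range(0, n, size):
            for x in range(0, n, size):
                block = [
                    quantized[yy][xx]
                    for yy in range(y, y + size)
                    for xx in range(x, x + size)
                ]
                result[(x, y, size)] = entropy(block)
    return result


# A smooth area with slight noise: values 100..103 all quantize to level 12,
# so every candidate CU has zero entropy after filtering.
flat = [[100 + ((x + y) % 4) for x in range(64)] for y in range(64)]
entropies = cu_entropies(flat)
print(len(entropies), max(entropies.values()))
```

Without the quantization step, the same smooth block would have an entropy of 2 bits, since four pixel values occur equally often, which is the distortion the noise filter is meant to avoid.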

Threshold and Judgment of the Proposed Algorithm
Based on the theory in Section 2 and the entropy of all the possible CUs, we conducted several experiments to find the relation between the entropy of each CU and the CUs partitioned by RDO. To achieve maximum similarity between the proposed CUs and the optimized CUs, we established some rules and thresholds. Through our search, we found the following principles for partitioning an LCU:
a. If the entropy of the CU is extremely small, the CU is likely to terminate its partition.
b. If the entropy of the CU is extremely large, the CU is likely to be partitioned.
c. The CUs whose entropy is approximately the average of all the possible CUs typically appear in the final partitioning map.
Based on these rules, we searched for several thresholds. Taking every video sequence into consideration, we recorded the entropy values of the optimal CUs, that is, the CUs retained in the final partition chosen by HEVC. We likewise recorded the entropy values of the CUs that HEVC split. Having obtained these two sets of entropy values, we sorted each set separately. We then used formulas (4)-(6) to calculate the thresholds for CU partitioning.
$T_a$, $T_b$, and $T_c$ represent the thresholds for principles a, b, and c. $E_{\mathrm{optimal}\,xx\%}$ denotes the top xx% entropy value in the ascending sort of the optimal CU entropy values. $E_{\mathrm{split}\,xx\%}$ denotes the bottom xx% entropy value in the ascending sort of the split CU entropy values. For example, if there are 24 optimal CUs and 36 split CUs in an LCU, $E_{\mathrm{optimal}\,60\%}$ is the 14th entropy value in the ascending sort of the optimal CU entropy values, and $E_{\mathrm{split}\,60\%}$ is the 21st entropy value in the ascending sort of the split CU entropy values. $E_{\mathrm{Average}}$ denotes the average of all the entropy values in an LCU. The percentage values, 60%, 10%, and 90%, were chosen with the loss of BD-rate taken into consideration.
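The rank lookup in the worked example can be reproduced as follows. The rounding rule is an assumption inferred from the example itself (24 × 60% = 14.4 gives the 14th value and 36 × 60% = 21.6 gives the 21st, so the rank is rounded down); the function names are hypothetical, and the sketch covers only the ascending-rank lookup, not the threshold formulas.

```python
def percentile_rank(count: int, percent: int) -> int:
    """1-based rank of the xx% entry among `count` sorted values (rounded down)."""
    return max(1, (count * percent) // 100)


def percentile_entropy(entropies, percent: int) -> float:
    """Entropy value at the xx% rank of the ascending sort."""
    ordered = sorted(entropies)
    return ordered[percentile_rank(len(ordered), percent) - 1]


print(percentile_rank(24, 60))  # 14, as in the example with 24 optimal CUs
print(percentile_rank(36, 60))  # 21, as in the example with 36 split CUs
```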
During our research, we found that the thresholds that provide the maximum similarity between the proposed CUs and the optimized CUs are almost the same for most video sequences. Therefore, taking every video sequence into consideration, we obtained the following thresholds:
a. CUs whose entropy is smaller than 1.2 are not partitioned.
b. CUs whose entropy exceeds 3.5 are partitioned.
c. CUs whose entropy is within 0.15 of the average entropy are not partitioned.
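The three threshold rules translate directly into a decision function, sketched below. Several points are assumptions not stated in this text: the order in which the rules are tested, the treatment of the 0.15 band as inclusive, and what happens when no rule applies (exposed here as the `split_when_undecided` parameter). The `entropies` dictionary is assumed to map (x, y, size) to the entropy of each of the 85 candidate CUs.

```python
def decide(entropy, average, t_low=1.2, t_high=3.5, band=0.15):
    """Apply rules a, b, c in that (assumed) order to one CU."""
    if entropy < t_low:
        return "stop"       # rule a: very little information, keep the CU
    if entropy > t_high:
        return "split"      # rule b: a lot of information, partition the CU
    if abs(entropy - average) <= band:
        return "stop"       # rule c: close to the LCU average, keep the CU
    return "undecided"      # no rule applies; the fallback is an assumption


def partition_lcu(entropies, lcu_size=64, min_size=8, split_when_undecided=True):
    """Return the list of (x, y, size) CUs chosen from the entropy values alone."""
    average = sum(entropies.values()) / len(entropies)

    def visit(x, y, size):
        if size <= min_size:
            return [(x, y, size)]
        verdict = decide(entropies[(x, y, size)], average)
        if verdict == "stop" or (verdict == "undecided" and not split_when_undecided):
            return [(x, y, size)]
        half = size // 2
        chosen = []
        for dy in (0, half):
            for dx in (0, half):
                chosen += visit(x + dx, y + dy, half)
        return chosen

    return visit(0, 0, lcu_size)
```

Because the decision uses only precomputed entropy values, no candidate CU has to be encoded before the partition is known.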
We can decide whether a CU is split by comparing its entropy with the thresholds. Thus, we can partition an LCU right after we obtain the entropy values. Figures 5(a,b) show, as an example, the difference and the similarity between the proposed CUs and the optimized CUs. The parts with red lines in Figure 5(a) are the parts that did not match the optimized CUs. Table 1 shows that the proposed algorithm obtained nearly 70% similarity to the partition of the RDO on average in HEVC. The similarity is calculated by formula (7):

$$\text{Similarity} = \frac{n_{match}}{N_{CU}} \times 100\% \qquad (7)$$

where $N_{CU}$ represents the number of CUs in an LCU and $n_{match}$ represents the number of CUs that match the optimal CUs.
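A similarity measure of this kind can be sketched by comparing two partitions as sets of (x, y, size) blocks. The choice of denominator is an assumption: this text does not say whether the CU count refers to the optimal or the proposed partition, and the sketch uses the optimal one. The function name and the sample partitions are hypothetical.

```python
def similarity(proposed, optimal):
    """Percentage of optimal CUs that also appear in the proposed partition."""
    matched = set(proposed) & set(optimal)
    # Assumption: the denominator is the CU count of the optimal partition.
    return 100.0 * len(matched) / len(optimal)


# Hypothetical 64 x 64 LCU: the proposal splits one quadrant that RDO kept whole.
optimal = [(0, 0, 32), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
proposed = [(0, 0, 32), (32, 0, 32), (0, 32, 32),
            (32, 32, 16), (48, 32, 16), (32, 48, 16), (48, 48, 16)]
print(similarity(proposed, optimal))
```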

Experimental Results
Up to 300 frames of each sequence were coded to test the performance of the proposed algorithm, and the test condition is "All Intra-Main" (AI-Main) [13]. QP values are set to 22, 27, 32, and 37.
A computer with a 2.8 GHz core was used in this experiment. To fully assess the performance of the proposed algorithm, we also used HM10.0 with a 16 × 16 LCU for comparison.
We used Equation (8) to measure the reduction in coding time:

$$T = \frac{T_{HM} - T_P}{T_{HM}} \times 100\% \qquad (8)$$

where $T_{HM}$ is the coding time of HM10.0 with RDO, $T_P$ is the coding time of HM10.0 with the proposed algorithm, and $T$ stands for the time reduction. Table 2 shows that, on average, the proposed algorithm reduced the coding time of HM10.0 by 62.0%. The Bjøntegaard delta (BD) rate [13] exhibited a 3.68% loss. A 0.10% decrease in PSNR was also observed. Figure 6 shows the curves of HM10.0 with the proposed algorithm, HM10.0, and HM10.0 with a 16 × 16 LCU. Figure 6 shows that the proposed curve was extremely close to the curve of HM10.0 and was better than the curve of HM10.0 with the small LCU (16 × 16).
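The time-reduction measure is a relative difference, as the short sketch below illustrates. The function name is hypothetical and the timings in the usage line are made-up placeholders, not measurements from this study.

```python
def time_reduction(t_hm: float, t_p: float) -> float:
    """Coding-time reduction in percent relative to the HM10.0 baseline."""
    if t_hm <= 0:
        raise ValueError("baseline coding time must be positive")
    return 100.0 * (t_hm - t_p) / t_hm


# Placeholder timings in seconds, for illustration only.
print(time_reduction(200.0, 80.0))
```

A negative result would indicate that the modified encoder was slower than the baseline.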

Conclusions
In this paper, we propose a new technique to partition the LCU in HEVC. The proposed algorithm aims to greatly reduce the computational complexity with an acceptable loss of BD rate. Based on the results shown, the proposed algorithm significantly reduced the coding time with an acceptable decrease in quality. Therefore, entropy-based fast LCU partitioning is a topic worthy of further investigation.

Figure 2. Example of the CU partitioning of HEVC: (a) Partitioned by H.264 and (b) Partitioned by HEVC.

Figure 3. Flowchart of the proposed algorithm.

Figure 4. Example of the anti-ground noise filter.

Figure 5. (a) CU presentation of the sequence RaceHorsesC optimized by HM10.0 using the proposed algorithm. (b) CU presentation of the sequence RaceHorsesC optimized by HM10.0 with RDO.

Table 1. The similarity between proposed CUs and optimal CUs.

Table 2. Results for the proposed algorithm.