# Fast Rate Estimation for RDO Mode Decision in HEVC

## Abstract

## 1. Introduction

## 2. RDO Bottlenecks in H.265/HEVC

… $CU_{ORG}$ being compressed. Residuals $CU_{RES}$ are subject to transform (T) and quantization (Q). The encoder has to compress the data B with the help of the binary arithmetic coder (BAC) in order to estimate the compression bit rate R. Meanwhile, estimating the compression distortion D involves the inverse quantization (IQ) and inverse transform (IT). Restored residuals are summed up with the block prediction P to get the reconstructed block $CU_{REC}$. The distortion metric D is based on the comparison of the initial block $CU_{ORG}$ and the reconstructed block $CU_{REC}$. The estimated values R and D are used to get the rate-distortion cost $RD_{cost}$ of a block coding decision.
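As a minimal sketch of how R and D combine into a single cost, the snippet below assumes the usual Lagrangian form J = D + λ·R from the rate-distortion optimization literature [6] and a sum-of-squared-errors distortion. The function names and the sample values are illustrative assumptions, not part of the HM encoder.

```python
def sum_squared_error(original, reconstructed):
    """Distortion D: sum of squared differences between two equally sized blocks."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))


def rd_cost(distortion, rate_bits, lagrange_multiplier):
    """Lagrangian cost J = D + lambda * R (assumed form of the RD cost)."""
    return distortion + lagrange_multiplier * rate_bits


def best_option(options, lagrange_multiplier):
    """Return the (name, D, R) tuple with the smallest RD cost."""
    return min(options, key=lambda opt: rd_cost(opt[1], opt[2], lagrange_multiplier))


# Hypothetical flattened blocks standing in for CU_ORG and CU_REC.
original = [10, 12, 14, 16]
reconstructed = [11, 12, 12, 16]
d = sum_squared_error(original, reconstructed)
print(d, rd_cost(d, rate_bits=20, lagrange_multiplier=0.5))
```

A larger multiplier penalizes rate more heavily, so the same pair of candidates can be ranked differently at different operating points.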

… $RD_{cost}$ value estimation in fact requires full block compression, it can be estimated that there are at least 341 · 35 = 11,935 extra compressions of the same CTU in the intra-coding algorithm with full RDO (35 intra-prediction modes for 341 possible sub-blocks). Needless to say, no industrial compression system uses full RDO, as it cannot deliver a reasonable compression time. The number of RDO candidates is usually reduced [11]. Even so, the complexity of the compression algorithm remains rather high.
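The figure of 341 sub-blocks can be reproduced by counting the nodes of a full quadtree. The sketch below assumes a 64 × 64 CTU split down to 4 × 4 blocks; the function name and its defaults are illustrative.

```python
def blocks_in_ctu(ctu_size=64, min_block=4):
    """Count every block of a full quadtree from ctu_size down to min_block."""
    total = 0
    size = ctu_size
    blocks_at_depth = 1
    while size >= min_block:
        total += blocks_at_depth   # 1, 4, 16, 64, 256 for sizes 64, 32, 16, 8, 4
        blocks_at_depth *= 4
        size //= 2
    return total


INTRA_MODES = 35
print(blocks_in_ctu(), blocks_in_ctu() * INTRA_MODES)
```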

… $S_{0}$ of SBAC is determined after coding the block (n − 1). In a conventional decision algorithm, we have to estimate the compression rate after each of the 35 intra-predictions of the n-th block to choose the coding option with the least $RD_{cost}$. Let $b_{i}$ correspond to the data to be arithmetically coded for a coding option with intra-prediction mode i, $i \in [0; 34]$, $i \in \mathbb{Z}$. Then, $r_{i}$ corresponds to the number of bits that SBAC produces to represent the i-th coding option. To get $RD_{cost}$ for the zeroth coding option, we need to estimate $r_{0}$ by coding each binarized symbol of $b_{0}$ and changing the state of SBAC. Let the state of SBAC after coding all symbols of $b_{0}$ be $S_{0,0}$. Once we have $r_{0}$, we are able to calculate $RD_{cost}$ for the zeroth coding option.

… $S_{0}$ of SBAC, since it is the only correct state for estimation of this coding option. The only way to restore the SBAC state is to preserve it beforehand and copy back the values of the coding interval and all context models. Preserving the state of SBAC requires copying at least 512 bytes (at least one byte for each of the 512 context models). The same rate estimation operations are performed for the remaining 33 intra-prediction modes of the same block. Then, the state of SBAC is updated, and the 35 intra-coding options for the block (n + 1) are estimated. Given that there are 341 possible blocks in a CTU partitioning, this copying consumes a considerable part of the RDO time.
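The save-and-restore pattern described above can be sketched with a toy coder state. Everything here is a simplified stand-in: `ToyCoderState`, `toy_count_bits` and `evaluate_modes` are hypothetical names, the context update is not the real SBAC adaptation rule, and the only figures taken from the text are 512 context models, 35 modes and 341 blocks.

```python
NUM_CONTEXTS = 512     # context models, one byte each (figure quoted in the text)
INTRA_MODES = 35
BLOCKS_PER_CTU = 341


class ToyCoderState:
    """Stand-in for the SBAC state: a coding interval plus context models."""

    def __init__(self):
        self.low = 0
        self.range = 510
        self.contexts = bytearray(NUM_CONTEXTS)


def toy_count_bits(state, bins):
    """Pretend to code the bins: mutate the contexts, return a fake bit count."""
    for i, b in enumerate(bins):
        idx = i % NUM_CONTEXTS
        state.contexts[idx] = (state.contexts[idx] + 1 + b) % 64
    return len(bins)


def evaluate_modes(state, candidates):
    """Estimate every candidate from the same start state, restoring it each time."""
    saved = (state.low, state.range, bytes(state.contexts))
    rates, bytes_copied = [], 0
    for bins in candidates:
        rates.append(toy_count_bits(state, bins))
        state.low, state.range = saved[0], saved[1]
        state.contexts[:] = saved[2]        # copy all context models back
        bytes_copied += len(saved[2])
    return rates, bytes_copied


state = ToyCoderState()
rates, copied = evaluate_modes(state, [[1, 0, 1]] * INTRA_MODES)
print(copied, copied * BLOCKS_PER_CTU)  # bytes copied per block, then per CTU
```

Under these assumptions the context copies alone amount to 35 · 512 bytes per block, which is the overhead a coder-free rate estimate avoids.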

… $RD_{cost}$ calculation. Using SBAC as a bit counter not only requires preserving its state, but also adds extra interval-calculation operations (at least 24 bytes per operation).

## 3. Rate Estimation Observations

#### 3.1. Splitting Data Rate Estimation

#### 3.2. Prediction Data Rate Estimation

… $2^{5}$ possible modes are left.

… **C**) and FourPeople (Class **E**), N × N partitioning produces fewer bits per context-dependent bin than 2N × 2N partitioning. At the same time, the Class **C** video sequence RaceHorses produces an almost identical number of bits for both partitioning modes. Those video sequences have more details, and 4 × 4-pixel prediction is used more often than in other sequences. Therefore, the obtained entropy estimate, when used in RDO, will increase the $RD_{cost}$ of 4 × 4 prediction units, and they will be chosen less often. This influence on $RD_{cost}$ might change the frequency of N × N partitioning selection and cause a higher bit rate overhead for the mentioned video sequences.

… **C**), RaceHorses, BQSquare and BlowingBubbles (Class **D**), have almost equal probability of MPM and LPM. Replacing the BAC-based estimation with the precalculated entropy estimation should force the encoder to select MPM more often in the RDO process for those sequences.
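One way to picture a precalculated estimate of the prediction header is a lookup table of average bin costs. The sketch below uses the "Average" row of Table 1 for the context-dependent bins and charges one bit per bypass bin. The dictionary keys, the function name and the bypass-bin counts passed by the caller are assumptions made for illustration.

```python
# Average self-information in bits, taken from the "Average" row of Table 1.
AVERAGE_BIN_BITS = {
    "part_2Nx2N": 0.65,
    "part_NxN": 2.06,
    "luma_mpm": 0.58,
    "luma_lpm": 1.86,
    "chroma_as_luma": 0.36,
    "chroma_other": 3.04,
}


def estimate_header_bits(events, bypass_bins):
    """Sum table costs of context-dependent bins plus one bit per bypass bin."""
    return sum(AVERAGE_BIN_BITS[e] for e in events) + 1.0 * bypass_bins


# Non-MPM luma mode: the five bypass bins correspond to 2^5 remaining modes.
lpm = estimate_header_bits(["part_2Nx2N", "luma_lpm", "chroma_as_luma"], bypass_bins=5)
# MPM luma mode with an assumed single bypass bin for the MPM index.
mpm = estimate_header_bits(["part_2Nx2N", "luma_mpm", "chroma_as_luma"], bypass_bins=1)
print(round(lpm, 2), round(mpm, 2))
```

Because the table is fixed, the estimate needs no arithmetic coder state, at the price of ignoring how far a particular sequence deviates from the averages.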

#### 3.3. Residual Data Rate Estimation

… $CU_{RES}$ in Figure 1) is the data left after the subtraction of predicted pixel values from the original pixel values of a coding unit. The residual pixel values are subject to discrete transform and quantization. Residual data coding in HEVC is performed on the level of a transform block (TB). A TB is a matrix of transformed and quantized residuals. Its size can be 32 × 32, 16 × 16, 8 × 8 or 4 × 4. The coefficients are scanned in diagonal, horizontal or vertical order depending on the intra-prediction mode (Figure 5). Any TB is scanned within 4 × 4 sub-blocks. The diagonal scan starts from the bottom right corner and proceeds to the top left corner, scanning each diagonal line in the direction from top right to bottom left. The scan pattern consists of a diagonal scan of the 4 × 4 sub-blocks and a diagonal scan within each of the 4 × 4 sub-blocks [16]. Horizontal and vertical scans (Figure 5) may also be applied in the intra-case for 4 × 4 and 8 × 8 TBs. The horizontal and vertical scans are defined row-by-row and column-by-column, respectively, within the 4 × 4 sub-blocks. The scan over the 4 × 4 sub-blocks is the same as within the sub-block.
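The diagonal order described above can be generated in a few lines. This is a sketch for a single 4 × 4 sub-block only, with positions written as (x, y) = (column, row); the function name is illustrative.

```python
def diagonal_scan(n=4):
    """Coding-order diagonal scan of an n x n sub-block.

    Starts at the bottom right corner, ends at the top left corner, and walks
    each anti-diagonal from its top right end to its bottom left end.
    """
    order = []
    for d in range(2 * (n - 1), -1, -1):   # anti-diagonals, x + y == d
        for y in range(n):                 # increasing row: top right to bottom left
            x = d - y
            if 0 <= x < n:
                order.append((x, y))
    return order


scan = diagonal_scan()
print(scan[:3], scan[-1])
```

Applying the same function to the grid of sub-blocks of a larger TB gives the outer scan, since the text states that both levels use the same pattern.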

- Significant coefficient flag (SC);
- Significant coefficients group flag (SG);
- Last significant coefficient position X (PX);
- Last significant coefficient position Y (PY);
- “Coefficient value greater than one” flag (G1);
- “Coefficient value greater than two” flag (G2);
- Coefficients signs (SIGNS);
- Remaining coefficient level (RL).

- Significant coefficient flag (SC);
- Significant coefficient group flag (SG);
- Last significant position X (PX);
- Last significant position Y (PY);
- Coefficient value greater than one flag (G1);
- Coefficient value greater than two flag (G2);
- Bypass mode (BP).

… $H(B_{i})$ of the symbols $B_{i}$ within the i-th context group [4,17]:

$$H(B_{i}) = -\sum_{j=0}^{1} p_{i}(j) \log_{2} p_{i}(j),$$

where $p_{i}(j)$ is the occurrence probability of the symbol $j \in \mathbb{Z}$, $0 \le j \le 1$, in the message $B_{i}$. Considering that $N_{i} = S(B_{i})$ is the total number of symbols in the message $B_{i}$, the estimated bit rate will be:

$$\widehat{R}(B) = \sum_{i=1}^{7} N_{i} H(B_{i}),$$

where $H(B_{7}) = 1.0$ for bypass-coded bins, and seven is the number of context groups.
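A minimal sketch of this entropy-based estimate is given below, assuming the rate is the sum over context groups of the bin count times the group's binary entropy, with bypass bins fixed at one bit each. The group labels follow the list above; the function names and the sample bins are illustrative.

```python
from math import log2

BYPASS_GROUP = "BP"  # bypass-coded bins are charged exactly one bit each


def binary_entropy(bins):
    """Shannon entropy in bits per bin of a sequence of 0/1 symbols."""
    if not bins:
        return 0.0
    p_one = sum(bins) / len(bins)
    h = 0.0
    for p in (p_one, 1.0 - p_one):
        if p > 0.0:
            h -= p * log2(p)
    return h


def estimate_rate(groups):
    """Estimated bits: sum over groups of N_i * H(B_i), with H = 1.0 for bypass."""
    total = 0.0
    for name, bins in groups.items():
        h = 1.0 if name == BYPASS_GROUP else binary_entropy(bins)
        total += len(bins) * h
    return total


# Hypothetical bins of one transform block, keyed by context group.
block = {"SC": [0, 1, 0, 1], "G1": [0, 0, 0, 0], "BP": [1, 1, 1]}
print(estimate_rate(block))
```

The estimate touches no coder state, so candidates can be evaluated in any order without saving or restoring anything.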

… $\sigma_{R}$ are the corresponding standard deviations. The correlation coefficient between our estimation $\widehat{R}(B)$ and the real value R(B) for all test sequences is roughly 0.999.
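For reference, a correlation coefficient of this kind can be computed as the covariance divided by the product of the two standard deviations. The sketch below assumes the standard Pearson definition; the sample values are made up and are not measurements from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Hypothetical estimated and real bit counts for a handful of blocks.
estimated = [100.0, 220.0, 310.0, 405.0]
real = [104.0, 215.0, 318.0, 400.0]
print(pearson(estimated, real))
```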

## 4. Proposed Fast Rate Estimation Algorithm

… $B_{i}$. The entropy value $H(B_{i})$ of bins in the $B_{i}$ syntax group is used to estimate the bit rate of a message. We also count the bit size of the prediction header as described in Subsection 3.2. The overall estimation $\widehat{R}(B)$ …

… $_{RD}$ estimation. Furthermore, eliminating the arithmetic coder reduces block interdependencies, which could be exploited to develop an RDO estimation algorithm that processes several blocks in parallel.

## 5. Results and Discussion

… $PSNR_{YUV}$) is calculated as the weighted sum of the PSNR per picture of the individual components ($PSNR_{Y}$, $PSNR_{U}$ and $PSNR_{V}$) [20]:

$$PSNR_{YUV} = \frac{6 \cdot PSNR_{Y} + PSNR_{U} + PSNR_{V}}{8},$$

… $PSNR_{Y}$, $PSNR_{U}$ and $PSNR_{V}$ are each computed as:

… $MSE_{AVG}$ is the average MSE value for the video frames:
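As a minimal sketch of how these PSNR quantities fit together, the snippet below assumes the conventional definition PSNR = 10·log10(peak² / MSE) with peak = 2^bit_depth − 1, an arithmetic mean of the per-frame MSE values, and the 6:1:1 component weighting used in [20]. The function names and sample numbers are illustrative.

```python
from math import log10


def average_mse(frame_mses):
    """Arithmetic mean of the per-frame MSE values of one color component."""
    return sum(frame_mses) / len(frame_mses)


def psnr_from_mse(mse, bit_depth=8):
    """PSNR in dB for a given MSE, assuming peak value 2^bit_depth - 1."""
    peak = (1 << bit_depth) - 1
    return 10.0 * log10(peak * peak / mse)


def psnr_yuv(psnr_y, psnr_u, psnr_v):
    """Weighted combination of the component PSNRs with weights 6:1:1."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0


# Hypothetical per-frame luma MSE values.
luma_psnr = psnr_from_mse(average_mse([4.0, 6.0]))
print(luma_psnr, psnr_yuv(40.0, 42.0, 44.0))
```

For the 10-bit sequences the same function would be called with `bit_depth=10`, which raises the peak value from 255 to 1023.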

… $PSNR_{Y}$ value of the luma color component (BD-RATE (Y)). Furthermore, we calculate Bjontegaard delta PSNR values BD-PSNR (Y) and BD-PSNR (UV) as an average delta PSNR for the luma and chroma components, respectively. The value ∆T represents the average savings of the compression time of the HM reference encoder for a certain test sequence:

… **E***, is 1.54% and 1.56% without Class **E*** test sequences. The compression efficiency loss on the 10-bit sequences, Nebuta and SteamLocomotive, is only 0.43% and 0.41%, respectively. These sequences have smooth textures that mask the mode decision errors. On the other hand, sequences BQMall, PartyScene, BQSquare, BlowingBubbles, BasketballPass and SlideEditing have a lot of details that reveal the mode decision errors. This results in a 1.96%–2.35% bit rate overhead.

## 6. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Sharabayko, M.; Ponomarev, O.; Chernyak, R. Intra-Compression Efficiency in VP9 and HEVC. Appl. Math. Sci. **2013**, 7, 6803–6824.
2. Grois, D.; Marpe, D.; Mulayoff, A.; Itzhaky, B.; Hadar, O. Performance Comparison of H.265/MPEG-HEVC, VP9, and H.264/MPEG-AVC Encoders. Proceedings of 30th Picture Coding Symposium 2013 (PCS 2013), San Jose, CA, USA, 8–11 December 2013; pp. 394–397.
3. Sharabayko, M. Next Generation Video Codecs: HEVC, VP9 and DAALA. In Youth and Contemporary Information Technologies; Tomsk Polytechnic University: Tomsk, Russia, 2013; Volume 13, pp. 35–37.
4. Berger, T. Rate Distortion Theory: Mathematical Basis for Data Compression (Prentice-Hall Series in Information and System Sciences); Prentice-Hall: Englewood Cliffs, NJ, USA, 1971; p. 352.
5. Stockhammer, T.; Kontopodis, D.; Wiegand, T. Rate-distortion optimization for JVT/H.26L video coding in packet loss environment. In International Packet Video Workshop on Packet Loss Environment; Munich University of Technology: Munich, Germany, 2002; pp. 1–12.
6. Sullivan, G.J.; Wiegand, T. Rate-distortion optimization for video compression. IEEE Signal Process. Mag. **1998**, 15, 74–90.
7. Wiegand, T.; Girod, B. Lagrange multiplier selection in hybrid video coder control. In Proceedings of International Conference on Image Processing; IEEE: Thessaloniki, Greece, 2001; Volume 3, pp. 542–545.
8. Bossen, F. CE1: Table-based bit estimation for CABAC. In Document of ITU-T Q.6/SG16 JCTVC-G763; ITU-T: Geneva, Switzerland, 2011.
9. Sheng, Z.; Zhou, D.; Sun, H.; Goto, S. Low-Complexity Rate-Distortion Optimization Algorithms for HEVC intra-Prediction. In MultiMedia Modeling; Springer International Publishing: Berlin, Germany, 2014; Lecture Notes in Computer Science; Volume 8325, pp. 541–552.
10. HEVC software repository. Available online: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/ (accessed on 17 December 2014).
11. Zhao, L.; Zhang, L.; Ma, S.; Zhao, D. Fast mode decision algorithm for intra-prediction in HEVC. Proceedings of Conference of Visual Communications and Image Processing (VCIP), Tainan, Taiwan, 6–9 November 2011; pp. 1–4.
12. Sole, J.; Joshi, R.; Nguyen, N.; Ji, T.; Karczewicz, M.; Clare, G.; Henry, F.; Duenas, A. Transform Coefficient Coding in HEVC. IEEE Trans. Circ. Syst. Video Tech. **2012**, 22, 1765–1777.
13. Recommendation ITU-T H.265: High Efficiency Video coding. 2013. Available online: http://www.itu.int/dms_pubrec/itu-t/rec/h/T-REC-H.265-201304-S!!SUM-HTM-E.htm (accessed on 19 December 2014).
14. Bossen, F. Common test conditions and software reference configurations. ITU-T SC16/Q6, 11th JCT-VC Meeting, Shanghai, China, 10–19 October 2012. Doc. JCTVC-K1100.
15. Bossen, F. Common test conditions and software reference configurations. ITU-T SC16/Q6, 2nd JCT-VC Meeting, Geneva, Switzerland, 21–28 July 2010. Doc. JCTVC-B300.
16. Sole, J.; Joshi, R.; Karczewicz, M. Non-CE11: Diagonal Sub-Block Scan for HE Residual Coding. Input Document to JCT-VC JCTVC-G323, 7th Joint Collaborative Team on Video Coding (JCT-VC) Meeting, Geneva, Switzerland, 2011.
17. Shannon, C.E. A mathematical theory of Communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
18. JCT-VC test sequences. Available online: ftp://ftp.tnt.uni-hannover.de/testsequences (accessed on 17 December 2014).
19. Bjontegaard, G. Improvements of the BD-PSNR model. ITU-T SC16/Q6, 35th VCEG Meeting, Berlin, Germany, 16–18 July 2008. Doc. VCEG-AI11.
20. Ohm, J.; Sullivan, G.; Schwarz, H.; Tan, T.K.; Wiegand, T. Comparison of the Coding Efficiency of Video Coding Standards—Including High Efficiency Video Coding (HEVC). IEEE Trans. Circ. Syst. Video Tech. **2012**, 22, 1669–1684.
21. Wang, J.; Yu, X.; He, D. On BD-Rate calculation. Presented at 6th JCT-VC Meeting, Torino, Italy, 14–22 July 2011. Doc. JCTVC-F270.
22. Sharabayko, M.; Ponomarev, O. Intra Prediction Header Bits Estimation Algorithm for RDO in H.265/HEVC. Appl. Math. Sci. **2014**, 8, 8721–8736.

**Figure 4.** Illustration of the neighbor PUs, whose intra-prediction modes are the MPM (most probable mode) for the current PU.

**Figure 8.** Average percentage of bins by context groups for the input of the SBAC on the BasketballDrill video sequence for (**A**) 4 × 4 transform unit (TU), (**B**) 8 × 8 TU, (**C**) 16 × 16 TU and (**D**) 32 × 32 TU.

**Figure 9.** Scatter plot of the rate estimation on the (**A**) BasketballDrill and (**B**) PeopleOnStreet test sequences.

**Figure 10.** Rate-distortion plots for the (**A**) BQTerrace, (**B**) Nebuta, (**C**) SlideEditing and (**D**) BasketballDrive test sequences.

**Table 1.** Average self-information (in bits) of the context-dependent bins of the prediction unit. LPM, least probable mode.

| Class | Partition mode: 2N × 2N | Partition mode: N × N | Luma prediction: 1 (MPM) | Luma prediction: 0 (LPM) | Chroma prediction: 0 (as luma) | Chroma prediction: 1 |
|---|---|---|---|---|---|---|
| A | 0.60 | 2.14 | 0.54 | 1.94 | 0.33 | 3.14 |
| B | 0.85 | 1.66 | 0.47 | 2.14 | 0.37 | 2.95 |
| C | 0.53 | 2.47 | 0.69 | 1.62 | 0.51 | 2.48 |
| D | 0.48 | 2.23 | 0.81 | 1.37 | 0.49 | 2.38 |
| E | 0.96 | 1.32 | 0.53 | 1.90 | 0.24 | 3.45 |
| E* | 0.83 | 1.81 | 0.51 | 1.97 | 0.16 | 4.01 |
| F | 0.37 | 2.66 | 0.49 | 2.09 | 0.32 | 3.28 |
| Average | 0.65 | 2.06 | 0.58 | 1.86 | 0.36 | 3.04 |

**Table 2.** Compression efficiency loss of the proposed rate estimation algorithm (HM intra-main). BD-RATE, Bjontegaard delta rate.

| Class | Sequence | Resolution | BD-RATE, % | BD-RATE (Y), % | BD-PSNR (Y), dB | BD-PSNR (UV), dB | ∆T, % |
|---|---|---|---|---|---|---|---|
| A | Traffic | 2560×1600 | 1.27 | 1.23 | −0.07 | −0.05 | −15.43 |
| A | PeopleOnStreet | 2560×1600 | 1.47 | 1.51 | −0.09 | −0.05 | −16.68 |
| A | Nebuta | 2560×1600 | 0.43 | 0.41 | −0.03 | −0.02 | −21.00 |
| A | SteamLocomotive | 2560×1600 | 0.41 | 0.46 | −0.03 | 0.00 | −17.73 |
| B | Kimono | 1920×1080 | 0.64 | 0.50 | −0.02 | −0.03 | −12.29 |
| B | ParkScene | 1920×1080 | 1.41 | 1.44 | −0.06 | −0.04 | −16.55 |
| B | Cactus | 1920×1080 | 1.61 | 1.58 | −0.06 | −0.04 | −16.98 |
| B | BQTerrace | 1920×1080 | 1.70 | 1.73 | −0.10 | −0.04 | −19.66 |
| B | BasketballDrive | 1920×1080 | 1.58 | 1.60 | −0.04 | −0.03 | −13.98 |
| C | RaceHorses (C) | 832×480 | 1.35 | 1.32 | −0.09 | −0.06 | −18.79 |
| C | BQMall | 832×480 | 2.02 | 1.89 | −0.12 | −0.10 | −19.00 |
| C | PartyScene | 832×480 | 1.96 | 1.88 | −0.15 | −0.11 | −23.75 |
| C | BasketballDrill | 832×480 | 1.76 | 1.67 | −0.08 | −0.08 | −17.20 |
| D | RaceHorses (D) | 416×240 | 1.70 | 1.56 | −0.11 | −0.11 | −19.32 |
| D | BQSquare | 416×240 | 2.24 | 2.17 | −0.19 | −0.12 | −24.21 |
| D | BlowingBubbles | 416×240 | 1.96 | 1.81 | −0.11 | −0.10 | −20.52 |
| D | BasketballPass | 416×240 | 2.02 | 1.93 | −0.11 | −0.10 | −17.36 |
| E | FourPeople | 1280×720 | 1.60 | 1.64 | −0.09 | −0.06 | −15.33 |
| E | KristenAndSara | 1280×720 | 1.67 | 1.67 | −0.08 | −0.06 | −12.85 |
| E | Johnny | 1280×720 | 1.48 | 1.46 | −0.06 | −0.04 | −13.36 |
| E* | Vidyo1 | 1280×720 | 1.45 | 1.52 | −0.07 | −0.03 | −13.16 |
| E* | Vidyo3 | 1280×720 | 1.39 | 1.50 | −0.08 | −0.02 | −13.75 |
| E* | Vidyo4 | 1280×720 | 1.29 | 1.43 | −0.06 | −0.02 | −12.84 |
| F | BasketballDrillText | 832×480 | 1.72 | 1.60 | −0.09 | −0.09 | −16.96 |
| F | ChinaSpeed | 1024×768 | 1.60 | 1.39 | −0.13 | −0.16 | −18.21 |
| F | SlideEditing | 1280×720 | 2.35 | 2.25 | −0.36 | −0.26 | −22.06 |
| F | SlideShow | 1280×720 | 1.57 | 1.48 | −0.09 | −0.07 | −11.89 |
| Average | | | 1.54 | 1.50 | −0.10 | −0.07 | −17.07 |

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sharabayko, M.P.; Ponomarev, O.G.
Fast Rate Estimation for RDO Mode Decision in HEVC. *Entropy* **2014**, *16*, 6667-6685.
https://doi.org/10.3390/e16126667