Hierarchical Reinforcement Learning-Based Adaptive Initial QP Selection and Rate Control for H.266/VVC
Abstract
1. Introduction
1.1. The Role of Rate Control in Video Encoding
1.2. The Role of Initial Quantization Parameter Selection in Video Encoding
1.3. Paper Organization
2. Related Work
2.1. Research on Rate Control in H.266/VVC
2.2. Research on Initial Quantization Parameter Selection in Rate Control
2.3. Research on Reinforcement Learning in Rate Control
2.4. Research on Hierarchical Reinforcement Learning
3. Problem Statement
3.1. Background
3.2. Challenges and Requirements
3.3. Problem Definition
4. Model Construction
4.1. Hierarchical Reinforcement Learning Architecture
4.2. Definition of States, Actions, and Rewards
4.2.1. State Space
- (1) High-level State Sh:
- (2) Low-level State Sl:
4.2.2. Action Space
4.2.3. Reward Function
4.3. Reinforcement Learning Objective
4.4. Model Advantages
5. Algorithm Design
Algorithm 1. Training the High-Level Policy Network
1:
2: Batch sampling: Randomly sample a mini-batch of data (Sh, Ah, Rh, Sh′) from the offline dataset.
3: .
4: .
5: .
6: every fixed number of steps.
Algorithm 2. Training the Low-Level Policy Network
1: .
2: Batch sampling: Randomly sample a mini-batch of data (Sl, Al, Rl, Sl′) from the offline dataset.
3: .
4: .
5: .
6: every fixed number of steps.
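Step 2 of both algorithms (uniform mini-batch sampling from a fixed offline dataset) and the "every fixed number of steps" schedule in step 6 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Transition` fields (including the reward entry), the helper names `sample_minibatch` and `is_periodic_step`, and the batch size and period values are all assumptions made for the example. The action that step 6 performs on that schedule is not recoverable from the text, so the sketch only marks where it would run.

```python
import random
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One offline sample; field names are hypothetical."""
    state: Tuple[float, ...]
    action: int
    reward: float
    next_state: Tuple[float, ...]


def sample_minibatch(dataset: List[Transition], batch_size: int,
                     rng: random.Random) -> List[Transition]:
    """Uniformly sample a mini-batch without replacement.

    If the dataset is smaller than batch_size, the whole dataset is returned
    in random order.
    """
    return rng.sample(dataset, min(batch_size, len(dataset)))


def is_periodic_step(step: int, period: int) -> bool:
    """True on every `period`-th training step, counting steps from 1."""
    return step > 0 and step % period == 0


# Toy offline dataset with one-dimensional states (illustrative values only).
rng = random.Random(0)
offline_data = [
    Transition((float(i),), i % 3, 0.0, (float(i + 1),)) for i in range(100)
]

periodic_steps = []
for step in range(1, 21):
    batch = sample_minibatch(offline_data, batch_size=8, rng=rng)
    # Steps 3-5 of the algorithm would consume `batch` here.
    if is_periodic_step(step, period=5):
        # The step-6 action would run here.
        periodic_steps.append(step)
```

The same two helpers serve both levels of the hierarchy; only the contents of the dataset differ between the high-level and low-level policies.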
6. Experimental Results and Analysis
6.1. Experimental Setup
6.2. Experimental Results Evaluation
6.3. Computational Complexity and HRL Model Convergence Analysis
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chen, Y.; Wang, S.; Ip, H.; Kwong, S. Rate distortion optimization with adaptive content modeling for random-access versatile video coding. Inf. Sci. 2023, 645, 119325.
- Wei, X.; Zhou, M.; Wang, H.; Yang, H.; Chen, L.; Kwong, S. Recent advances in rate control: From optimization to implementation and beyond. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 17–33.
- Liu, D.; Li, Y.; Lin, J.; Li, H.; Wu, F. Deep learning-based video coding: A review and a case study. ACM Comput. Surv. (CSUR) 2020, 53, 1–35.
- Li, Y.; Chen, Z. Rate Control for VVC, Document, JVET-K0390. In Proceedings of the JVET, 11th Meeting, Ljubljana, Slovenia, 10–18 July 2018.
- Bross, B.; Wang, Y.K.; Ye, Y.; Liu, S.; Chen, J.; Sullivan, G.J. Overview of the versatile video coding (VVC) standard and its applications. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3736–3764.
- Gao, W.; Kwong, S.; Jiang, Q.; Fong, C.-K.; Wong, P.H.W.; Yuen, W.Y.F. Data-driven rate control for rate-distortion optimization in HEVC based on simplified effective initial QP learning. IEEE Trans. Broadcast. 2019, 65, 94–108.
- Yang, Z.; Gao, W.; Li, G.; Yan, Y. SUR-driven video coding rate control for jointly optimizing perceptual quality and buffer control. IEEE Trans. Image Process. 2023, 32, 5451–5464.
- Guo, H.; Zhu, C.; Xu, M.; Li, S. Inter-block dependency-based CTU level rate control for HEVC. IEEE Trans. Broadcast. 2019, 66, 113–126.
- Li, Y.; Mou, X. Joint optimization for SSIM-based CTU-level bit allocation and rate distortion optimization. IEEE Trans. Broadcast. 2021, 67, 500–511.
- Li, L.; Yan, N.; Li, Z.; Liu, S.; Li, H. λ-domain perceptual rate control for 360-degree video compression. IEEE J. Sel. Top. Signal Process. 2019, 14, 130–145.
- Chen, Y.; Wang, M.; Wang, S.; Ni, Z.; Kwong, S. A CTU-level screen content rate control for low-delay versatile video coding. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 5227–5241.
- Mao, Y.; Wang, M.; Wang, S.; Kwong, S. High efficiency rate control for versatile video coding based on composite Cauchy distribution. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2371–2384.
- Lin, J.; Huang, A.; Zhao, T.; Wang, X.; Kwong, S. λ-domain VVC rate control based on Nash equilibrium. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 3477–3487.
- Mao, Y.; Wang, M.; Ni, Z.; Wang, S.; Kwong, S. Neural network based rate control for versatile video coding. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 6072–6085.
- Wang, T.; Li, F.; Cosman, P.C. Learning-based rate control for video-based point cloud compression. IEEE Trans. Image Process. 2022, 31, 2175–2189.
- Chen, Y.; Mao, Y.; Wang, S.; Zhang, X.; Kwong, S. Learning from Coding Features: High Efficiency Rate Control for AOMedia Video 1. IEEE MultiMedia 2023, 30, 16–25.
- Zhao, Z.; He, X.; Xiong, S.; He, L.; Chen, H.; Sheriff, R.E. A high-performance rate control algorithm in versatile video coding based on spatial and temporal feature complexity. IEEE Trans. Broadcast. 2023, 69, 753–766.
- Liao, J.; Li, L.; Liu, D.; Li, H. Content-adaptive Rate-Distortion Modeling for Frame-level Rate Control in Versatile Video Coding. IEEE Trans. Multimed. 2024, 26, 6864–6879.
- Liu, F.; Chen, Z. Multi-objective optimization of quality in VVC rate control for low-delay video coding. IEEE Trans. Image Process. 2021, 30, 4706–4718.
- Liu, H.; Zhu, S.; Zeng, B. Inter-frame dependency-based rate control for VVC low-delay coding. IEEE Signal Process. Lett. 2022, 29, 2727–2731.
- Gao, W.; Jiang, Q.; Wang, R.; Ma, S.; Li, G.; Kwong, S. Consistent quality oriented rate control in HEVC via balancing intra and inter frame coding. IEEE Trans. Ind. Inform. 2021, 18, 1594–1604.
- Yan, T.; Ra, I.H.; Wen, H.; Weng, M.-H.; Zhang, Q.; Che, Y. CTU layer rate control algorithm in scene change video for free-viewpoint video. IEEE Access 2020, 8, 24549–24560.
- Chen, Y.; Kwong, S.; Zhou, M.; Wang, S.; Zhu, G.; Wang, Y. Intra frame rate control for versatile video coding with quadratic rate-distortion modelling. In Proceedings of the ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4422–4426.
- Zhou, Y.; Xu, G.; Tang, K.; Tian, L.; Sun, Y. Video coding optimization in AVS2. Inf. Process. Manag. 2022, 59, 102808.
- Pan, Z.; Yi, X.; Zhang, Y.; Yuan, H.; Wang, F.L.; Kwong, S. Frame-level Bit Allocation Optimization Based on Video Content Characteristics for HEVC. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2020, 16, 1–20.
- HoangVan, X. Adaptive quantization parameter estimation for HEVC based surveillance scalable video coding. Electronics 2020, 9, 915.
- Chen, Z.; Shi, J.; Li, W. Learned fast HEVC intra coding. IEEE Trans. Image Process. 2020, 29, 5431–5446.
- Hu, J.H.; Peng, W.H.; Chung, C.H. Reinforcement learning for HEVC/H.265 intra-frame rate control. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–5.
- Smirnov, N.; Tomforde, S. Real-time rate control of WebRTC video streams in 5G networks: Improving quality of experience with deep reinforcement learning. J. Syst. Archit. 2024, 148, 103066.
- Li, N.; Zhang, Y.; Zhu, L.; Luo, W.; Kwong, S. Reinforcement learning based coding unit early termination algorithm for high efficiency video coding. J. Vis. Commun. Image Represent. 2019, 60, 276–286.
- Helle, P.; Schwarz, H.; Wiegand, T.; Müller, K.-R. Reinforcement learning for video encoder control in HEVC. In Proceedings of the 2017 International Conference on Systems, Signals and Image Processing (IWSSIP), Poznań, Poland, 22–24 May 2017; pp. 1–5.
- Chen, S.; Aramvith, S.; Miyanaga, Y. Learning-Based Rate Control for High Efficiency Video Coding. Sensors 2023, 23, 3607.
- Ren, G.; Liu, Z.; Chen, Z.; Liu, S. Reinforcement learning based ROI bit allocation for gaming video coding in VVC. In Proceedings of the 2021 International Conference on Visual Communications and Image Processing (VCIP), Munich, Germany, 5–8 December 2021; pp. 1–5.
- Zhang, H.; Li, J.; Li, B.; Lu, Y. A deep reinforcement learning approach to multiple streams’ joint bitrate allocation. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 2415–2426.
- Zhou, M.; Wei, X.; Kwong, S.; Jia, W.; Fang, B. Rate control method based on deep reinforcement learning for dynamic video sequences in HEVC. IEEE Trans. Multimed. 2020, 23, 1106–1121.
- Hutsebaut-Buysse, M.; Mets, K.; Latré, S. Hierarchical reinforcement learning: A survey and open research challenges. Mach. Learn. Knowl. Extr. 2022, 4, 172–221.
- Luo, J.; Xu, C.; Geng, X.; Feng, G.; Fang, K.; Tan, L. Multi-stage cable routing through hierarchical imitation learning. IEEE Trans. Robot. 2024, 40, 1476–1491.
- Yuan, H.; Gao, W.; Ma, S.; Yan, Y. Divide-and-conquer-based RDO-free CU partitioning for 8K video compression. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 1–20.
- Yuan, H.; Wang, Q.; Liu, Q.; Huo, J.; Li, P. Hybrid distortion-based rate-distortion optimization and rate control for H.265/HEVC. IEEE Trans. Consum. Electron. 2021, 67, 97–106.
- Sutton, R.S.; Precup, D.; Singh, S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell. 1999, 112, 181–211.
- Xie, G.; Li, X.; Lin, S.; Chen, Z.; Zhang, L.; Zhang, K. Hierarchical reinforcement learning based video semantic coding for segmentation. In Proceedings of the 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP), Suzhou, China, 24 August 2022; pp. 1–5.
- Lee, J.K.; Kim, N.; Kang, J.W. Reinforcement learning for rate-distortion optimized hierarchical prediction structure. IEEE Access 2023, 11, 20240–20253.
- Andersson, K.; Enhorn, J.; Sjöberg, R.; Ström, J.; Litwic, L. Addition of a GOP Hierarchy of 32 for Random Access Configuration for VTM, Document, JVET-S0180. In Proceedings of the JVET, 19th Meeting, Geneva, Switzerland, 22 June–1 July 2020.
- VVC Software, VTM-13.0. Available online: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tags/VTM-13.0/ (accessed on 20 February 2022).
Class | Video Sequence | Resolution | Number of Frames | Frame Rate (fps) | Bit Depth (bits)
---|---|---|---|---|---
A | Campfire | 3840 × 2160 | 300 | 60 | 10
A | ParkRunning3 | 3840 × 2160 | 300 | 50 | 10
A | Tango2 | 3840 × 2160 | 294 | 60 | 10
A | DaylightRoad2 | 3840 × 2160 | 300 | 60 | 10
B | MarketPlace | 1920 × 1080 | 600 | 60 | 8
B | RitualDance | 1920 × 1080 | 600 | 60 | 8
B | Cactus | 1920 × 1080 | 500 | 50 | 8
B | BasketballDrive | 1920 × 1080 | 500 | 50 | 8
B | BQTerrace | 1920 × 1080 | 600 | 60 | 8
C | RaceHorses | 832 × 480 | 300 | 30 | 8
C | BQMall | 832 × 480 | 600 | 60 | 8
C | PartyScene | 832 × 480 | 500 | 50 | 8
C | BasketballDrill | 832 × 480 | 500 | 50 | 8
D | RaceHorses | 416 × 240 | 300 | 30 | 8
D | BQSquare | 416 × 240 | 600 | 60 | 8
D | BlowingBubbles | 416 × 240 | 500 | 50 | 8
D | BasketballPass | 416 × 240 | 500 | 50 | 8
E | FourPeople | 1280 × 720 | 600 | 60 | 8
E | Johnny | 1280 × 720 | 600 | 60 | 8
E | KristenAndSara | 1280 × 720 | 600 | 60 | 8
F | BasketballDrillText | 832 × 480 | 500 | 50 | 8
F | ChinaSpeed | 1024 × 768 | 500 | 30 | 8
F | SlideEditing | 1280 × 720 | 300 | 30 | 8
F | SlideShow | 1280 × 720 | 500 | 20 | 8
Class | Video Sequence | Gao et al. [6] BD-Rate (%) | Gao et al. [6] BD-PSNR (dB) | Proposed HRL BD-Rate (%) | Proposed HRL BD-PSNR (dB) | Fixed QP BD-Rate (%) | Fixed QP BD-PSNR (dB)
---|---|---|---|---|---|---|---
A | Tango2 | −11.231 | 0.254 | −12.481 | 0.276 | −12.762 | 0.312
A | ParkRunning3 | −6.450 | 0.462 | −6.260 | 0.473 | −7.234 | 0.512
A | Campfire | −7.642 | 0.352 | −7.834 | 0.364 | −8.423 | 0.457
A | DaylightRoad2 | −10.534 | 0.423 | −11.725 | 0.415 | −13.542 | 0.523
A | Average | −8.964 | 0.373 | −9.575 | 0.382 | −10.490 | 0.451
B | Cactus | −6.832 | 0.261 | −7.123 | 0.265 | −10.470 | 0.324
B | BasketballDrive | −18.670 | 0.681 | −17.570 | 0.683 | −22.950 | 0.760
B | BQTerrace | −5.424 | 0.206 | −6.124 | 0.216 | −7.417 | 0.247
B | Average | −10.309 | 0.383 | −10.272 | 0.388 | −13.612 | 0.444
C | RaceHorses | −8.844 | 0.325 | −9.100 | 0.327 | −11.500 | 0.481
C | BQMall | −6.839 | 0.322 | −7.120 | 0.328 | −7.180 | 0.331
C | PartyScene | −4.131 | 0.206 | −4.250 | 0.208 | −4.871 | 0.248
C | BasketballDrill | −8.845 | 0.435 | −8.749 | 0.439 | −6.690 | 0.353
C | Average | −7.165 | 0.322 | −7.305 | 0.326 | −7.560 | 0.353
D | RaceHorses | −9.577 | 0.389 | −9.677 | 0.391 | −11.950 | 0.660
D | BQSquare | −9.770 | 0.412 | −9.888 | 0.415 | −6.638 | 0.371
D | BlowingBubbles | −5.011 | 0.292 | −5.120 | 0.291 | −5.393 | 0.294
D | BasketballPass | −5.788 | 0.357 | −5.873 | 0.361 | −6.566 | 0.378
D | Average | −7.537 | 0.363 | −7.640 | 0.365 | −7.637 | 0.426
E | FourPeople | −6.181 | 0.465 | −6.256 | 0.470 | −9.545 | 0.646
E | Johnny | −15.148 | 0.491 | −15.428 | 0.489 | −17.960 | 0.578
E | KristenAndSara | −10.522 | 0.612 | −11.233 | 0.611 | −15.140 | 0.728
E | Average | −10.617 | 0.523 | −10.972 | 0.523 | −14.215 | 0.651
F | BasketballDrillText | −4.976 | 0.256 | −5.176 | 0.257 | −6.310 | 0.304
F | ChinaSpeed | −4.280 | 0.212 | −5.190 | 0.232 | −5.493 | 0.308
F | SlideEditing | −25.430 | 3.153 | −24.890 | 3.183 | −32.690 | 3.292
F | SlideShow | −33.560 | 0.411 | −34.610 | 0.431 | −41.940 | 0.454
F | Average | −17.062 | 1.008 | −21.563 | 1.282 | −21.608 | 1.090
All | Total Average | −9.404 | 0.457 | −10.531 | 0.506 | −11.361 | 0.523
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
He, S.; Jin, B.; Tian, S.; Liu, J.; Deng, Z.; Shi, C. Hierarchical Reinforcement Learning-Based Adaptive Initial QP Selection and Rate Control for H.266/VVC. Electronics 2024, 13, 5028. https://doi.org/10.3390/electronics13245028