Search Results (40)

Search Parameters:
Keywords = 3D-HEVC

25 pages, 5088 KiB  
Article
Improved Perceptual Quality of Traffic Signs and Lights for the Teleoperation of Autonomous Vehicle Remote Driving via Multi-Category Region of Interest Video Compression
by Itai Dror and Ofer Hadar
Entropy 2025, 27(7), 674; https://doi.org/10.3390/e27070674 - 24 Jun 2025
Viewed by 586
Abstract
Autonomous vehicles are a promising solution to traffic congestion, air pollution, accidents, wasted time, and resources. However, remote driver intervention may be necessary in extreme situations to ensure safe roadside parking or complete remote takeover. In these cases, high-quality real-time video streaming is crucial for remote driving. In a preliminary study, we presented a region of interest (ROI) High-Efficiency Video Coding (HEVC) method where the image was segmented into two categories: ROI and background. This involved allocating more bandwidth to the ROI, which yielded an improvement in the visibility of classes essential for driving while transmitting the background at a lower quality. However, migrating the bandwidth to the large ROI portion of the image did not substantially improve the quality of traffic signs and lights. This study proposes a method that categorizes ROIs into three tiers: background, weak ROI, and strong ROI. To evaluate this approach, we utilized a photo-realistic driving scenario database created with the Cognata self-driving car simulation platform. We used semantic segmentation to categorize the compression quality of a Coding Tree Unit (CTU) according to its pixel classes. A background CTU contains only sky, trees, vegetation, or building classes. Essentials for remote driving include classes such as pedestrians, road marks, and cars. Difficult-to-recognize classes, such as traffic signs (especially textual ones) and traffic lights, are categorized as a strong ROI. We applied thresholds to determine whether the number of pixels in a CTU of a particular category was sufficient to classify it as a strong or weak ROI and then allocated bandwidth accordingly. Our results demonstrate that this multi-category ROI compression method significantly enhances the perceptual quality of traffic signs (especially textual ones) and traffic lights by up to 5.5 dB compared to a simpler two-category (background/foreground) partition. 
This improvement in critical areas is achieved by reducing the fidelity of less critical background elements, while the visual quality of other essential driving-related classes (weak ROI) is at least maintained. Full article
(This article belongs to the Special Issue Information Theory and Coding for Image/Video Processing)
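The three-tier CTU classification described above can be sketched as follows. This is a minimal illustration with hypothetical class names and pixel-count thresholds; the paper derives its categories from the Cognata semantic-segmentation labels, and its actual threshold values are not reproduced here:

```python
# Hypothetical class sets and thresholds; tiers map to bandwidth allocation.
BACKGROUND_CLASSES = {"sky", "tree", "vegetation", "building"}
WEAK_ROI_CLASSES = {"pedestrian", "road_mark", "car"}
STRONG_ROI_CLASSES = {"traffic_sign", "traffic_light"}

STRONG_THRESHOLD = 64   # assumed minimum strong-ROI pixels per CTU
WEAK_THRESHOLD = 256    # assumed minimum weak-ROI pixels per CTU

def classify_ctu(pixel_counts):
    """Map {class_name: pixel_count} for one CTU to a bandwidth tier."""
    strong = sum(n for c, n in pixel_counts.items() if c in STRONG_ROI_CLASSES)
    weak = sum(n for c, n in pixel_counts.items() if c in WEAK_ROI_CLASSES)
    if strong >= STRONG_THRESHOLD:
        return "strong_roi"   # lowest QP: most bandwidth
    if weak >= WEAK_THRESHOLD:
        return "weak_roi"     # intermediate QP
    return "background"       # highest QP: least bandwidth

print(classify_ctu({"sky": 3000, "traffic_light": 100}))  # strong_roi
```

A small patch of traffic-light pixels is enough to promote a CTU to the strong tier, which is how the scheme protects small, hard-to-recognize objects.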

19 pages, 3140 KiB  
Article
Fast Algorithm for Depth Map Intra-Frame Coding 3D-HEVC Based on Swin Transformer and Multi-Branch Network
by Fengqin Wang, Yangang Du and Qiuwen Zhang
Electronics 2025, 14(9), 1703; https://doi.org/10.3390/electronics14091703 - 22 Apr 2025
Cited by 1 | Viewed by 371
Abstract
Three-Dimensional High-Efficiency Video Coding (3D-HEVC) efficiently compresses 3D video by incorporating depth map encoding techniques. However, the quadtree partitioning of depth map coding units (CUs) greatly increases computational complexity, contributing to over 90% of the total encoding time. To overcome the limitations of existing methods in complex edge modeling and partitioning efficiency, this paper presents Swin-Hier Net, a hierarchical CU partitioning prediction model based on the Swin Transformer. First, a multi-branch feature fusion architecture is designed: the Swin Transformer’s shifted window attention mechanism extracts global contextual features, lightweight CNNs capture local texture details, and traditional edge/variance features enhance multi-scale representation. Second, a recursive hierarchical decision mechanism dynamically activates sub-CU prediction branches based on the partitioning probability of parent nodes, ensuring strict compliance with the HEVC standard quadtree syntax. Additionally, a hybrid pooling strategy and dilated convolutions improve edge feature retention. Experiments on 3D-HEVC standard test sequences show that, compared to exhaustive traversal methods, the proposed algorithm reduces encoding time by 72.7% on average, lowers the BD-Rate by 1.16%, improves CU partitioning accuracy to 94.5%, and maintains a synthesized view PSNR of 38.68 dB (baseline: 38.72 dB). The model seamlessly integrates into the HTM encoder, offering an efficient solution for real-time 3D video applications. Full article
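The recursive hierarchical decision mechanism can be sketched as below: a sub-CU's prediction branch is activated only when the parent node is predicted to split, so the emitted partition always obeys the HEVC quadtree syntax. The `split_prob` callable and the 0.5 threshold are stand-ins for the paper's Swin/CNN multi-branch predictor:

```python
def predict_partition(x, y, size, split_prob, min_size=8, threshold=0.5):
    """Return a nested quadtree: a leaf (x, y, size) or a list of four subtrees."""
    if size <= min_size or split_prob(x, y, size) < threshold:
        return (x, y, size)                      # terminate: no further split
    half = size // 2
    # Sub-CU branches run only because the parent was predicted to split.
    return [predict_partition(cx, cy, half, split_prob, min_size, threshold)
            for cx, cy in ((x, y), (x + half, y), (x, y + half), (x + half, y + half))]

# Toy predictor: always split 64x64 and 32x32 CUs, keep 16x16 CUs intact.
tree = predict_partition(0, 0, 64, lambda x, y, s: 1.0 if s > 16 else 0.0)
```

Because low-probability parents terminate immediately, entire subtrees of sub-CU predictions are skipped, which is where the encoding-time saving comes from.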

19 pages, 5667 KiB  
Article
Content-Symmetrical Multidimensional Transpose of Image Sequences for the High Efficiency Video Coding (HEVC) All-Intra Configuration
by Tamer Shanableh
Symmetry 2025, 17(4), 598; https://doi.org/10.3390/sym17040598 - 15 Apr 2025
Viewed by 401
Abstract
Enhancing the quality of video coding whilst maintaining compliance with the syntax of video coding standards is challenging. In the literature, many solutions have been proposed that apply mainly to two-pass encoding, bitrate control algorithms, and enhancements of locally decoded images in the motion-compensation loop. This work proposes a pre- and post-coding solution using the content-symmetrical multidimensional transpose of raw video sequences. The content-symmetrical multidimensional transpose results in images composed of slices of the temporal domain whilst preserving the video content. Such slices have higher spatial homogeneity at the expense of reducing the temporal resemblance. As such, an all-intra configuration is an excellent choice for compressing such images. Prior to displaying the decoded images, a content-symmetrical multidimensional transpose is applied again to restore the original form of the input images. Moreover, we propose a lightweight two-pass encoding solution in which we apply systematic temporal subsampling on the multidimensional transposed image sequences prior to the first-pass encoding. This noticeably reduces the complexity of the encoding process of the first pass and gives an indication as to whether or not the proposed solution is suitable for the video sequence at hand. Using the HEVC video codec, the experimental results revealed that the proposed solution results in a lower percentage of coding unit splits in comparison to regular HEVC coding without the multidimensional transpose of image sequences. This finding supports the claim of there being increasing spatial coherence as a result of the proposed solution. Additionally, using four quantization parameters, and in comparison to regular HEVC encoding, the resulting BD rate is −15.12%, which indicates a noticeable bitrate reduction. The BD-PSNR, on the other hand, was 1.62 dB, indicating an enhancement in the quality of the decoded images. 
Despite all of these benefits, the proposed solution has limitations, which are also discussed in the paper. Full article
(This article belongs to the Section Computer)
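The pre-/post-coding transpose idea can be sketched with NumPy; which spatial axis is exchanged with time is an assumption here. Because swapping two axes is an involution, applying the same transpose after decoding restores the original sequence exactly:

```python
import numpy as np

# A toy raw sequence of 60 frames at 240x320, indexed (T, H, W).
video = np.random.randint(0, 256, size=(60, 240, 320), dtype=np.uint8)

# Pre-coding: each output "frame" is now a slice built from the temporal domain.
transposed = np.swapaxes(video, 0, 1)   # (H, T, W)

# Post-decoding: the same swap restores the original display order.
restored = np.swapaxes(transposed, 0, 1)

assert transposed.shape == (240, 60, 320)
assert np.array_equal(restored, video)
```

The transposed slices trade temporal resemblance for higher spatial homogeneity, which is why an all-intra configuration compresses them well.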

22 pages, 2362 KiB  
Article
Fast Coding Unit Partitioning Method for Video-Based Point Cloud Compression: Combining Convolutional Neural Networks and Bayesian Optimization
by Wenjun Song, Xinqi Liu and Qiuwen Zhang
Electronics 2025, 14(7), 1295; https://doi.org/10.3390/electronics14071295 - 25 Mar 2025
Viewed by 470
Abstract
As 5G technology and 3D capture techniques develop rapidly, the demand for effective compression of dynamic 3D point cloud data has increased remarkably. Video-based point cloud compression (V-PCC), an innovative method for 3D point cloud compression, uses High-Efficiency Video Coding (HEVC) to compress 3D point clouds by projecting them onto two-dimensional video frames. However, V-PCC faces significant coding complexity, particularly for dynamic 3D point clouds, which can be up to four times more complex to process than conventional video. To address this challenge, we propose an adaptive coding unit (CU) partitioning method that integrates occupancy maps, convolutional neural networks (CNNs), and Bayesian optimization. In this approach, CUs are first divided into dense regions, sparse regions, and complex composite regions by calculating the occupancy rate R of each CU, and an initial classification decision is then made with a CNN. For regions where the CNN outputs low-confidence classifications, Bayesian optimization is employed to refine the partitioning and enhance accuracy. The experimental results show that the proposed method efficiently decreases the coding complexity of V-PCC while maintaining high coding quality. Specifically, the average coding time of the geometry video is reduced by 57.37%, that of the attribute video by 54.43%, and the overall coding time by 54.75%. Although the BD rate increases slightly compared with the baseline V-PCC method, the impact on video quality is negligible. Additionally, the proposed algorithm outperforms existing methods in terms of geometric compression efficiency and computational time savings. 
This study’s innovation lies in combining deep learning with Bayesian optimization to deliver an efficient CU partitioning strategy for V-PCC, improving coding speed and reducing computational resource consumption, thereby advancing the practical application of V-PCC. Full article
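The first stage above, bucketing CUs by their occupancy rate R, can be sketched as follows; the cut-off values are illustrative assumptions, not the paper's:

```python
import numpy as np

def classify_cu(occupancy_block, sparse_max=0.2, dense_min=0.8):
    """occupancy_block: 2D 0/1 array for one CU, cut from the V-PCC occupancy map."""
    r = float(np.mean(occupancy_block))          # occupancy rate R in [0, 1]
    if r >= dense_min:
        return "dense"
    if r <= sparse_max:
        return "sparse"
    return "composite"   # mixed content: handed to the CNN / Bayesian stage

print(classify_cu(np.ones((16, 16))))   # dense
print(classify_cu(np.zeros((16, 16))))  # sparse
```

Dense and sparse CUs get cheap fixed decisions; only the composite ones pay for CNN inference, and only low-confidence CNN outputs pay for Bayesian refinement.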

16 pages, 10696 KiB  
Article
A Framework for Symmetric-Quality S3D Video Streaming Services
by Juhyeon Lee, Seungjun Lee, Sunghoon Kim and Dongwook Kang
Appl. Sci. 2024, 14(23), 11011; https://doi.org/10.3390/app142311011 - 27 Nov 2024
Viewed by 766
Abstract
This paper proposes an efficient encoding framework based on Scalable High Efficiency Video Coding (SHVC) technology, which supports both low- and high-resolution 2D videos as well as stereo 3D (S3D) video simultaneously. Previous studies have introduced Cross-View SHVC, which encodes two videos with different viewpoints and resolutions using a Cross-View SHVC encoder, where the low-resolution video is encoded as the base layer and the other video as the enhancement layer. This encoder provides resolution diversity and allows the decoder to combine the two videos, enabling 3D video services. Even when 3D videos are composed of left and right videos with different resolutions, the viewer tends to perceive the quality based on the higher-resolution video due to the binocular suppression effect, where the brain prioritizes the high-quality image and suppresses the lower-quality one. However, recent experiments have shown that when the disparity between resolutions exceeds a certain threshold, it can lead to a subjective degradation of the perceived 3D video quality. To address this issue, a conditional replenishment algorithm has been studied, which replaces some blocks of the video using a disparity-compensated left-view image based on rate–distortion cost. This conditional replenishment algorithm (also known as VEI technology) effectively reduces the quality difference between the base layer and enhancement layer videos. However, the algorithm alone cannot fully compensate for the quality difference between the left and right videos. In this paper, we propose a novel encoding framework to solve the asymmetry issue between the left and right videos in 3D video services and achieve symmetrical video quality. The proposed framework focuses on improving the quality of the right-view video by combining the conditional replenishment algorithm with Cross-View SHVC. 
Specifically, the framework leverages the non-HEVC option of the SHVC encoder, using a VEI (Video Enhancement Information) restored image as the base layer to provide higher-quality prediction signals and reduce encoding complexity. Experimental results using animation and live-action UHD sequences show that the proposed method achieves BD-RATE reductions of 57.78% and 45.10% compared with HEVC and SHVC codecs, respectively. Full article

16 pages, 10945 KiB  
Article
Impact of Video Motion Content on HEVC Coding Efficiency
by Khalid A. M. Salih, Ismail Amin Ali and Ramadhan J. Mstafa
Computers 2024, 13(8), 204; https://doi.org/10.3390/computers13080204 - 18 Aug 2024
Cited by 1 | Viewed by 1939
Abstract
Digital video coding aims to reduce the bitrate and keep the integrity of visual presentation. High-Efficiency Video Coding (HEVC) can effectively compress video content to be suitable for delivery over various networks and platforms. Finding the optimal coding configuration is challenging as the compression performance highly depends on the complexity of the encoded video sequence. This paper evaluates the effects of motion content on coding performance and suggests an adaptive encoding scheme based on the motion content of encoded video. To evaluate the effects of motion content on the compression performance of HEVC, we tested three coding configurations with different Group of Pictures (GOP) structures and intra refresh mechanisms. Namely, open GOP IPPP, open GOP Periodic-I, and closed GOP periodic-IDR coding structures were tested using several test sequences with a range of resolutions and motion activity. All sequences were first tested to check their motion activity. The rate–distortion curves were produced for all the test sequences and coding configurations. Our results show that the performance of IPPP coding configuration is significantly better (up to 4 dB) than periodic-I and periodic-IDR configurations for sequences with low motion activity. For test sequences with intermediate motion activity, IPPP configuration can still achieve a reasonable quality improvement over periodic-I and periodic-IDR configurations. However, for test sequences with high motion activity, IPPP configuration has a very small performance advantage over periodic-I and periodic-IDR configurations. Our results indicate the importance of selecting the appropriate coding structure according to the motion activity of the video being encoded. Full article

19 pages, 746 KiB  
Article
Fast Depth Map Coding Algorithm for 3D-HEVC Based on Gradient Boosting Machine
by Xiaoke Su, Yaqiong Liu and Qiuwen Zhang
Electronics 2024, 13(13), 2586; https://doi.org/10.3390/electronics13132586 - 1 Jul 2024
Viewed by 1427
Abstract
Three-Dimensional High-Efficiency Video Coding (3D-HEVC) has been extensively researched due to its efficient compression and deep image representation, but encoding complexity continues to pose a difficulty. This is mainly attributed to redundancy in the coding unit (CU) recursive partitioning process and rate–distortion (RD) cost calculation, resulting in a complex encoding process. Therefore, enhancing encoding efficiency and reducing redundant computations are key objectives for optimizing 3D-HEVC. This paper introduces a fast-encoding method for 3D-HEVC, comprising an adaptive CU partitioning algorithm and a rapid rate–distortion-optimization (RDO) algorithm. Based on the ALV features extracted from each coding unit, a Gradient Boosting Machine (GBM) model is constructed to obtain the corresponding CU thresholds. These thresholds are compared with the ALV to further decide whether to continue dividing the coding unit. The RDO algorithm is used to optimize the RD cost calculation process, selecting the optimal prediction mode as much as possible. The simulation results show that this method saves 52.49% of complexity while ensuring good video quality. Full article
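The early-termination rule can be sketched as follows: the per-CU ALV feature is compared against the threshold predicted by the GBM to decide whether recursive splitting continues. The names and numbers below are illustrative; the real thresholds come from the trained model:

```python
def should_split(alv, gbm_threshold):
    """Continue quadtree partitioning only when the CU's ALV exceeds the
    model-predicted threshold; otherwise terminate and skip the RD search."""
    return alv > gbm_threshold

# Illustrative values: a textured CU keeps splitting, a smooth one stops early.
assert should_split(alv=12.5, gbm_threshold=8.0)
assert not should_split(alv=3.1, gbm_threshold=8.0)
```

Every CU that terminates early removes an entire subtree of rate-distortion evaluations, which is the source of the reported 52.49% complexity saving.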

16 pages, 1739 KiB  
Article
Light-Field Image Compression Based on a Two-Dimensional Prediction Coding Structure
by Jianrui Shao, Enjian Bai, Xueqin Jiang and Yun Wu
Information 2024, 15(6), 339; https://doi.org/10.3390/info15060339 - 7 Jun 2024
Cited by 1 | Viewed by 1564
Abstract
Light-field images (LFIs) are gaining increased attention within the field of 3D imaging, virtual reality, and digital refocusing, owing to their wealth of spatial and angular information. The escalating volume of LFI data poses challenges in terms of storage and transmission. To address this problem, this paper introduces an MSHPE (most-similar hierarchical prediction encoding) structure based on light-field multi-view images. By systematically exploring the similarities among sub-views, our structure obtains residual views through the subtraction of the encoded view from its corresponding reference view. Regarding the encoding process, this paper implements a new encoding scheme to process all residual views, achieving lossless compression. High-efficiency video coding (HEVC) is applied to encode select residual views, thereby achieving lossy compression. Furthermore, the introduced structure is conceptualized as a layered coding scheme, enabling progressive transmission and showing good random access performance. Experimental results demonstrate the superior compression performance attained by encoding residual views according to the proposed structure, outperforming alternative structures. Notably, when HEVC is employed for encoding residual views, significant bit savings are observed compared to the direct encoding of original views. The final restored view presents better detail quality, reinforcing the effectiveness of this approach. Full article
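The residual-view idea can be sketched with NumPy: each sub-view is predicted from its most similar reference view, only the difference is stored, and adding the reference back restores the view losslessly. Using signed 16-bit residuals to avoid uint8 wrap-around is an assumption here; the paper's actual residual packing may differ:

```python
import numpy as np

# Toy reference view and a similar encoded view (8-bit luma).
ref_view = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cur_view = np.clip(ref_view.astype(np.int16) + 5, 0, 255).astype(np.uint8)

# Residual = view minus reference, in a signed type so negatives survive.
residual = cur_view.astype(np.int16) - ref_view.astype(np.int16)

# Decoder side: adding the reference back reconstructs the view exactly.
restored = (residual + ref_view.astype(np.int16)).astype(np.uint8)

assert np.array_equal(restored, cur_view)   # lossless reconstruction
```

Because similar views leave small residuals, the residual images carry far less energy than the originals, which is where the bit savings come from.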

17 pages, 2928 KiB  
Article
Research on Quantization Parameter Decision Scheme for High Efficiency Video Coding
by Xuesong Jin and Yansong Chai
Appl. Sci. 2023, 13(23), 12758; https://doi.org/10.3390/app132312758 - 28 Nov 2023
Cited by 2 | Viewed by 1802
Abstract
High-Efficiency Video Coding (HEVC) is one of the most widely studied coding standards. It retains the block-based hybrid coding framework of Advanced Video Coding (AVC) and, compared to AVC, can double the compression ratio while maintaining the same quality of reconstructed video. Quantization is an important module in video coding: the quantization parameter largely determines the bitrate, especially when channel bandwidth is limited, so selecting a reasonable quantization parameter that brings the bitrate as close as possible to the target bitrate is particularly important. To address the unreasonable selection of quantization parameters in codecs, this paper proposes using a differential evolution algorithm to assign quantization parameter values to the coding tree units (CTUs) in each frame of an HEVC-coded 360-degree panoramic video so as to strike a balance between bitrate and distortion. First, the number of CTU rows in a 360-degree panoramic video frame is taken as the dimension of the optimization problem. Then, a trial vector is obtained by randomly selecting vectors in the population for mutation and crossover: in the mutation step, the algorithm generates a new parameter vector by adding the weighted difference between two population vectors to a third vector, and elements of the new parameter vector are selected according to the crossover rate. Finally, each trial vector is used as the quantization parameter of every CTU in its CTU row for encoding, and the vector with the lowest rate–distortion cost is kept, yielding the optimal quantization parameter combination for the current video. The experimental results show that, compared to the benchmark algorithm of the HEVC reference software HM-16.20, the proposed algorithm saves 1.86% of the bitrate while improving the peak signal-to-noise ratio (PSNR) by 0.07 dB. Full article
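One generation of the differential evolution search described above can be sketched as follows, with each dimension holding the QP of one CTU row. The control parameters (F, CR) and the quadratic stand-in for the rate-distortion cost are illustrative assumptions; in the paper the cost of a trial vector comes from an actual encoder pass:

```python
import random

def de_step(pop, cost, f=0.5, cr=0.9, qp_min=0, qp_max=51):
    """One differential-evolution generation over QP vectors (one QP per CTU row)."""
    new_pop = []
    for i, target in enumerate(pop):
        a, b, c = random.sample([v for j, v in enumerate(pop) if j != i], 3)
        # Mutation: weighted difference of two vectors added to a third, clipped to QP range.
        mutant = [min(qp_max, max(qp_min, round(a[d] + f * (b[d] - c[d]))))
                  for d in range(len(target))]
        j_rand = random.randrange(len(target))  # ensure at least one mutant gene survives
        trial = [mutant[d] if (random.random() < cr or d == j_rand) else target[d]
                 for d in range(len(target))]
        # Selection: keep the vector with the lower (stand-in) rate-distortion cost.
        new_pop.append(trial if cost(trial) <= cost(target) else target)
    return new_pop

rd_cost = lambda v: sum((q - 32) ** 2 for q in v)  # toy cost: prefer QPs near 32
population = [[random.randint(20, 45) for _ in range(4)] for _ in range(6)]
population = de_step(population, rd_cost)
```

Iterating `de_step` until the cost stops improving yields the per-row QP combination the abstract describes.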

20 pages, 6779 KiB  
Article
Fast CU Partition Algorithm for Intra Frame Coding Based on Joint Texture Classification and CNN
by Ting Wang, Geng Wei, Huayu Li, ThiOanh Bui, Qian Zeng and Ruliang Wang
Sensors 2023, 23(18), 7923; https://doi.org/10.3390/s23187923 - 15 Sep 2023
Cited by 2 | Viewed by 1611
Abstract
High-efficiency video coding (HEVC/H.265) is one of the most widely used video coding standards. HEVC introduces a quad-tree coding unit (CU) partition structure to improve video compression efficiency. The determination of the optimal CU partition is achieved through the brute-force search rate-distortion optimization method, which may result in high encoding complexity and hardware implementation challenges. To address this problem, this paper proposes a method that combines convolutional neural networks (CNN) with joint texture recognition to reduce encoding complexity. First, a classification decision method based on the global and local texture features of the CU is proposed, efficiently dividing the CU into smooth and complex texture regions. Second, for the CUs in smooth texture regions, the partition is determined by terminating early. For the CUs in complex texture regions, a proposed CNN is used for predictive partitioning, thus avoiding the traditional recursive approach. Finally, combined with texture classification, the proposed CNN achieves a good balance between the coding complexity and the coding performance. The experimental results demonstrate that the proposed algorithm reduces computational complexity by 61.23%, while only increasing BD-BR by 1.86% and decreasing BD-PSNR by just 0.09 dB. Full article
(This article belongs to the Section Sensor Networks)
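A variance-based version of the smooth/complex pre-classification can be sketched as below. Using global and local variance as the texture features, the 2x2 sub-block split, and the threshold values are all assumptions for illustration; the paper's exact features may differ:

```python
import numpy as np

def texture_class(cu, global_thr=25.0, local_thr=25.0):
    """Classify one CU (2D array) as a smooth or complex texture region."""
    h, w = cu.shape
    # Local texture: variance of each of the four quadrant sub-blocks.
    local_vars = [np.var(cu[r:r + h // 2, c:c + w // 2])
                  for r in (0, h // 2) for c in (0, w // 2)]
    if np.var(cu) < global_thr and max(local_vars) < local_thr:
        return "smooth"    # early termination: no further split
    return "complex"       # forwarded to the CNN partition predictor

flat = np.full((32, 32), 128.0)
print(texture_class(flat))  # smooth
```

Only the complex CUs pay for CNN inference; smooth ones terminate immediately, which is how the joint scheme balances complexity and coding performance.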

22 pages, 4694 KiB  
Article
Reducing Video Coding Complexity Based on CNN-CBAM in HEVC
by Huayu Li, Geng Wei, Ting Wang, ThiOanh Bui, Qian Zeng and Ruliang Wang
Appl. Sci. 2023, 13(18), 10135; https://doi.org/10.3390/app131810135 - 8 Sep 2023
Cited by 4 | Viewed by 1726
Abstract
High-efficiency video coding (HEVC) outperforms H.264 in coding efficiency. However, the rate–distortion optimization (RDO) process in coding tree unit (CTU) partitioning requires an exhaustive exploration of all possible quad-tree partitions, resulting in high encoding complexity. To simplify this process, this paper proposes a convolutional neural network (CNN)-based optimization algorithm combined with a hybrid attention mechanism module. First, we designed a CNN compatible with the current coding unit (CU) size to accurately predict CU partitions, together with a convolution block that enhances information interaction between CU blocks. Then, we introduced the convolutional block attention module (CBAM) into the CNN, yielding CNN-CBAM. This module concentrates on important regions of the image and attends to the target object correctly. Finally, we integrated CNN-CBAM into the HEVC coding framework to predict CU partitions in advance. The proposed network was trained, validated, and tested on a large-scale dataset covering various scenes and objects, which provides extensive samples for intra-frame CU partition prediction in HEVC. The experimental findings demonstrate that our scheme reduces coding time by 64.05% on average compared with a traditional HM16.5 encoder, with only a 0.09 dB degradation in BD-PSNR and a 1.94% increase in BD-BR. Full article

20 pages, 6695 KiB  
Article
Compression Performance Analysis of Experimental Holographic Data Coding Systems
by Tianyu Dong, Kwan-Jung Oh, Joongki Park and Euee S. Jang
Sensors 2023, 23(18), 7684; https://doi.org/10.3390/s23187684 - 6 Sep 2023
Cited by 3 | Viewed by 2016
Abstract
It is challenging to find a proper way to compress computer-generated holography (CGH) data owing to their huge data requirements and characteristics. This study proposes CGH data coding systems with high-efficiency video coding (HEVC), three-dimensional extensions of HEVC (3D-HEVC), and video-based point cloud compression (V-PCC) codecs. In the proposed system, we implemented a procedure for codec usage and format conversion and evaluated the objective and subjective results to analyze the performance of the three coding systems. We discuss the relative advantages and disadvantages of the three coding systems with respect to their coding efficiency and reconstruction results. Our analysis concluded that 3D-HEVC and V-PCC are potential solutions for compressing red, green, blue, and depth (RGBD)-sourced CGH data. Full article
(This article belongs to the Section Communications)

17 pages, 5960 KiB  
Article
Improving Compressed Video Using Single Lightweight Model with Temporal Fusion Module
by Tien-Ying Kuo, Yu-Jen Wei, Po-Chyi Su and Chang-Hao Chao
Sensors 2023, 23(9), 4511; https://doi.org/10.3390/s23094511 - 5 May 2023
Viewed by 1724
Abstract
Video compression algorithms are commonly used to reduce the number of bits required to represent a video with a high compression ratio. However, this can result in the loss of content details and visual artifacts that affect the overall quality of the video. We propose a learning-based restoration method to address this issue, which can handle varying degrees of compression artifacts with a single model by predicting the difference between the original and compressed video frames to restore video quality. To achieve this, we adopted a recursive neural network model with dilated convolution, which increases the receptive field of the model while keeping the number of parameters low, making it suitable for deployment on a variety of hardware devices. We also designed a temporal fusion module and integrated the color channels into the objective function. This enables the model to analyze temporal correlation and repair chromaticity artifacts. Despite handling color channels, and unlike other methods that have to train a different model for each quantization parameter (QP), the number of parameters in our lightweight model is kept to only about 269 k, requiring only about one-twelfth of the parameters used by other methods. Our model applied to the HEVC test model (HM) improves the compressed video quality by an average of 0.18 dB of BD-PSNR and −5.06% of BD-BR. Full article
(This article belongs to the Section Sensing and Imaging)
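The dilated-convolution design point can be illustrated with the standard receptive-field recurrence for stride-1 layers, rf += (kernel - 1) * dilation: stacking 3x3 convolutions with growing dilation enlarges the receptive field while the parameter count stays that of plain 3x3 layers. The dilation schedule below is an assumption, not the paper's architecture:

```python
def receptive_field(layers):
    """layers: list of (kernel, dilation) pairs, all stride 1.
    Each layer adds (kernel - 1) * dilation pixels to the receptive field."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

plain = receptive_field([(3, 1)] * 4)                      # four ordinary 3x3 convs
dilated = receptive_field([(3, 1), (3, 2), (3, 4), (3, 8)])  # same parameter count
print(plain, dilated)  # 9 31
```

Same four layers, more than triple the receptive field, which is why dilation lets the restoration model stay lightweight while still seeing large artifact structures.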

16 pages, 5003 KiB  
Article
A Method to Reduce the Intra-Frame Prediction Complexity of HEVC Based on D-CNN
by Ting Wang, Geng Wei, Huayu Li, ThiOanh Bui, Qian Zeng and Ruliang Wang
Electronics 2023, 12(9), 2091; https://doi.org/10.3390/electronics12092091 - 4 May 2023
Cited by 7 | Viewed by 2034
Abstract
Among a series of video coding standards jointly developed by ITU-T, VCEG, and MPEG, high-efficiency video coding (HEVC) is one of the most widely used video coding standards today. Therefore, it is still necessary to further reduce the coding complexity of HEVC. In the HEVC standard, a flexible partitioning procedure entitled “quad-tree partition” is proposed to significantly improve the coding efficiency, which, however, leads to high coding complexity. To reduce the coding complexity of the intra-frame prediction, this paper proposes a scheme based on a densely connected convolution neural network (D-CNN) to predict the partition of coding units (CUs). Firstly, a densely connected block was designed to improve the efficiency of the CU partition by fully extracting the pixel features of CTU. Then, efficient channel attention (ECA) and adaptive convolution kernel size were applied to a fast CU partition for the first time to capture the information of the D-CNN convolution channels. Finally, a threshold optimization strategy was formulated to select the best threshold for each depth to further balance the computation complexity of video coding and the performance of RD. The experimental results show that the proposed method reduces the encoding time of HEVC by 60.14%, with a negligible reduction in RD performance, which is better than the existing fast partitioning methods. Full article

20 pages, 8135 KiB  
Article
Learning-Based Rate Control for High Efficiency Video Coding
by Sovann Chen, Supavadee Aramvith and Yoshikazu Miyanaga
Sensors 2023, 23(7), 3607; https://doi.org/10.3390/s23073607 - 30 Mar 2023
Cited by 4 | Viewed by 2360
Abstract
High efficiency video coding (HEVC) has dramatically enhanced coding efficiency compared to the previous video coding standard, H.264/AVC. However, the existing rate control updates its parameters according to a fixed initialization, which can cause errors in the prediction of bit allocation to each coding tree unit (CTU) in frames. This paper proposes a learning-based mapping method between rate control parameters and video contents to achieve an accurate target bit rate and good video quality. The proposed framework contains two main structural codings, including spatial and temporal coding. We initiate an effective learning-based particle swarm optimization for spatial and temporal coding to determine the optimal parameters at the CTU level. For temporal coding at the picture level, we introduce semantic residual information into the parameter updating process to regulate the bit correctly on the actual picture. Experimental results indicate that the proposed algorithm is effective for HEVC and outperforms the state-of-the-art rate control in the HEVC reference software (HM-16.10) by 0.19 dB on average and up to 0.41 dB for low-delay P coding structure. Full article
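The particle-swarm step underlying the parameter search described above can be sketched as follows: each particle is a candidate rate-control parameter vector, pulled toward its own best position and the swarm's global best. The coefficients and the quadratic stand-in cost are illustrative assumptions, not the paper's learning-based variant:

```python
import random

def cost(x):
    return sum((xi - 1.0) ** 2 for xi in x)  # stand-in for a rate-distortion cost

def pso_step(particles, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration: update velocities, positions, and personal bests."""
    gbest = min((p["best"] for p in particles), key=cost)
    for p in particles:
        r1, r2 = random.random(), random.random()
        p["v"] = [w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
                  for v, x, pb, gb in zip(p["v"], p["x"], p["best"], gbest)]
        p["x"] = [x + v for x, v in zip(p["x"], p["v"])]
        if cost(p["x"]) < cost(p["best"]):
            p["best"] = list(p["x"])

particles = [{"x": [random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)],
              "v": [0.0, 0.0]} for _ in range(8)]
for p in particles:
    p["best"] = list(p["x"])
initial_best = min(cost(p["best"]) for p in particles)
for _ in range(50):
    pso_step(particles)   # personal/global bests only ever improve
```

In the paper's setting the cost of a candidate would be measured by encoding with those CTU-level parameters and comparing the produced bits against the target.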
