Search Results (116)

Search Parameters:
Keywords = high efficiency video coding (HEVC)

19 pages, 2731 KB  
Article
DCMFF-Net: A Low-Complexity Intra-Frame Encoding Method with Double Convolution and Multi-Scale Feature Fusion
by Xiao Shi, Geng Wei, Linqiang Li, Guihua Luo, Yu Zhou and Shan Zhu
Electronics 2025, 14(24), 4863; https://doi.org/10.3390/electronics14244863 - 10 Dec 2025
Viewed by 131
Abstract
H.265/HEVC still dominates the video encoding application market with its mature industrial ecosystem and excellent hardware support. However, its high computational complexity remains a major barrier to its wider application. To tackle this problem, we introduce an efficient intra-encoding approach that leverages double convolution and multi-scale feature fusion. Firstly, a three-branch architecture captures multi-scale information from the source image, and each branch then uses non-overlapping and overlapping parallel convolution structures to fuse its features. Secondly, combined with an attention mechanism, the output features of the branches are fused to highlight important features, reduce detail loss, and effectively balance encoding quality and complexity. Finally, by combining multi-scale feature fusion and double convolutional feature fusion, a feature hybrid network is formed that accurately predicts whether a coding unit (CU) should be divided, enabling fast encoding. Experimental results on multiple datasets show that, against the traditional HM16.5 benchmark, our method reduces the average encoding time by 63.52% with a marginal BD-BR rise of 1.95% and a BD-PSNR drop of 0.09 dB, demonstrating its superiority. Full article
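
For intuition, here is a minimal PyTorch sketch of a three-branch, attention-fused CU-split predictor in the spirit of this abstract; the channel counts, strides, pooling sizes, and SE-style attention are our assumptions, not the authors' DCMFF-Net.

```python
# Minimal sketch of a multi-branch CU-split predictor (assumed sizes).
import torch
import torch.nn as nn

class BranchFusionNet(nn.Module):
    """Three conv branches at different scales, fused with channel attention;
    outputs P(split) for a 64x64 luma CU."""
    def __init__(self, ch: int = 16):
        super().__init__()
        # One branch per scale; the stride emulates multi-scale sampling.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, ch, 3, stride=s, padding=1), nn.ReLU(),
                          nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(8))
            for s in (1, 2, 4)])
        # Channel attention over the concatenated branch features.
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(3 * ch, 3 * ch), nn.Sigmoid())
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(3 * ch * 8 * 8, 1))

    def forward(self, x):                                     # x: (N, 1, 64, 64)
        f = torch.cat([b(x) for b in self.branches], dim=1)   # (N, 3*ch, 8, 8)
        w = self.attn(f).unsqueeze(-1).unsqueeze(-1)          # per-channel weights
        return torch.sigmoid(self.head(f * w))                # P(split)

p_split = BranchFusionNet()(torch.randn(2, 1, 64, 64))  # threshold (e.g. 0.5) to decide
```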

27 pages, 12490 KB  
Article
Fast CU Division Algorithm for Different Occupancy Types of CUs in Geometric Videos
by Nana Li, Tiantian Zhang, Jinchao Zhao and Qiuwen Zhang
Electronics 2025, 14(20), 4124; https://doi.org/10.3390/electronics14204124 - 21 Oct 2025
Viewed by 368
Abstract
Video-based point cloud compression (V-PCC) is a 3D point cloud compression standard that first projects the point cloud from 3D space onto 2D space, thereby generating geometric and attribute videos, and then encodes the geometric and attribute videos using high-efficiency video coding (HEVC). In the whole coding process, the coding of geometric videos is extremely time-consuming, mainly because the division of geometric video coding units has high computational complexity. In order to effectively reduce the coding complexity of geometric videos in video-based point cloud compression, we propose a fast segmentation algorithm based on the occupancy type of coding units. First, the CUs are divided into three categories—unoccupied, partially occupied, and fully occupied—based on the occupancy map. For unoccupied CUs, segmentation is terminated immediately; for partially occupied CUs, a geometric visual perception factor is designed based on their spatial depth variation characteristics, realizing early depth-range skipping based on visual sensitivity; and for fully occupied CUs, a lightweight fully connected network makes the fast segmentation decision. The experimental results show that, under the all-intra configuration, the algorithm significantly reduces coding time complexity while almost fully maintaining coding quality: the D1 and D2 BD-rates increase by an average of only 0.11% and 0.28%, respectively, at the total coding rate, while the geometric video coding time saving reaches up to 58.71% and the overall V-PCC coding time saving reaches up to 53.96%. Full article
(This article belongs to the Section Computer Science & Engineering)
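
A compact Python sketch of the occupancy-driven decision flow described above; the cut-offs and the standard-deviation stand-in for the geometric visual perception factor are illustrative assumptions, not the paper's definitions.

```python
# Sketch of occupancy-type-based CU handling (illustrative thresholds).
import numpy as np

def cu_partition_action(occ_block: np.ndarray, split_net=None) -> str:
    """occ_block: binary occupancy-map region covering the CU (1 = occupied).
    split_net: optional callable standing in for the paper's lightweight
    fully connected network for fully occupied CUs."""
    ratio = occ_block.mean()
    if ratio == 0.0:                          # unoccupied: stop splitting now
        return "terminate"
    if ratio < 1.0:                           # partially occupied CU
        # Crude stand-in for the geometric visual perception factor
        # (spatial depth variation / visual sensitivity).
        perception = occ_block.astype(float).std()
        return "skip_depth_range" if perception < 0.3 else "full_search"
    # Fully occupied: defer to the learned fast split decision.
    if split_net is not None:
        return "split" if split_net(occ_block) else "no_split"
    return "full_search"
```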

25 pages, 5088 KB  
Article
Improved Perceptual Quality of Traffic Signs and Lights for the Teleoperation of Autonomous Vehicle Remote Driving via Multi-Category Region of Interest Video Compression
by Itai Dror and Ofer Hadar
Entropy 2025, 27(7), 674; https://doi.org/10.3390/e27070674 - 24 Jun 2025
Viewed by 1274
Abstract
Autonomous vehicles are a promising solution to traffic congestion, air pollution, accidents, wasted time, and resources. However, remote driver intervention may be necessary in extreme situations to ensure safe roadside parking or complete remote takeover. In these cases, high-quality real-time video streaming is crucial for remote driving. In a preliminary study, we presented a region of interest (ROI) High-Efficiency Video Coding (HEVC) method where the image was segmented into two categories: ROI and background. This involved allocating more bandwidth to the ROI, which improved the visibility of classes essential for driving while transmitting the background at a lower quality. However, migrating the bandwidth to the large ROI portion of the image did not substantially improve the quality of traffic signs and lights. This study proposes a method that categorizes ROIs into three tiers: background, weak ROI, and strong ROI. To evaluate this approach, we utilized a photo-realistic driving scenario database created with the Cognata self-driving car simulation platform. We used semantic segmentation to categorize the compression quality of a Coding Tree Unit (CTU) according to its pixel classes. A background CTU contains only sky, trees, vegetation, or building classes. Classes essential for remote driving, such as pedestrians, road marks, and cars, form the weak ROI. Difficult-to-recognize classes, such as traffic signs (especially textual ones) and traffic lights, are categorized as a strong ROI. We applied thresholds to determine whether the number of pixels in a CTU of a particular category was sufficient to classify it as a strong or weak ROI and then allocated bandwidth accordingly. Our results demonstrate that this multi-category ROI compression method significantly enhances the perceptual quality of traffic signs (especially textual ones) and traffic lights by up to 5.5 dB compared to a simpler two-category (background/foreground) partition. This improvement in critical areas is achieved by reducing the fidelity of less critical background elements, while the visual quality of other essential driving-related classes (weak ROI) is at least maintained. Full article
(This article belongs to the Special Issue Information Theory and Coding for Image/Video Processing)
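
A rough sketch of the three-tier CTU classification: count the pixels of each class group inside a CTU and assign a tier. The class lists and pixel-count thresholds below are placeholders, not the paper's values.

```python
# Three-tier CTU classification from semantic-segmentation labels (sketch).
import numpy as np

BACKGROUND = {"sky", "tree", "vegetation", "building"}
WEAK_ROI = {"pedestrian", "road_mark", "car"}        # essential driving classes
STRONG_ROI = {"traffic_sign", "traffic_light"}       # hardest to recognize

def ctu_tier(seg_names: np.ndarray, strong_thr: int = 64,
             weak_thr: int = 256) -> str:
    """seg_names: array of per-pixel class names for one 64x64 CTU.
    Thresholds are illustrative pixel counts."""
    if np.isin(seg_names, sorted(STRONG_ROI)).sum() >= strong_thr:
        return "strong_roi"      # lowest QP, most bandwidth
    if np.isin(seg_names, sorted(WEAK_ROI)).sum() >= weak_thr:
        return "weak_roi"
    return "background"          # highest QP, least bandwidth
```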

32 pages, 4311 KB  
Article
DRGNet: Enhanced VVC Reconstructed Frames Using Dual-Path Residual Gating for High-Resolution Video
by Zezhen Gai, Tanni Das and Kiho Choi
Sensors 2025, 25(12), 3744; https://doi.org/10.3390/s25123744 - 15 Jun 2025
Cited by 1 | Viewed by 1040
Abstract
In recent years, with the rapid development of the Internet and mobile devices, the high-resolution video industry has ushered in a booming golden era, making video content the primary driver of Internet traffic. This trend has spurred continuous innovation in efficient video coding technologies, such as Advanced Video Coding/H.264 (AVC), High Efficiency Video Coding/H.265 (HEVC), and Versatile Video Coding/H.266 (VVC), which significantly improve compression efficiency while maintaining high video quality. However, during the encoding process, compression artifacts and the loss of visual details remain unavoidable challenges, particularly in high-resolution video processing, where the massive amount of image data tends to introduce more artifacts and noise, ultimately affecting the user’s viewing experience. Therefore, effectively reducing artifacts, removing noise, and minimizing detail loss have become critical issues in enhancing video quality. To address these challenges, this paper proposes a post-processing method based on a Convolutional Neural Network (CNN) that improves the quality of VVC-reconstructed frames through deep feature extraction and fusion. The proposed method is built upon a high-resolution dual-path residual gating system, which integrates deep features from different convolutional layers and introduces convolutional blocks equipped with gating mechanisms. By ingeniously combining gating operations with residual connections, the proposed approach ensures smooth gradient flow while enhancing feature selection capabilities. It selectively preserves critical information while effectively removing artifacts. Furthermore, the introduction of residual connections reinforces the retention of original details, achieving high-quality image restoration. Under the same bitrate conditions, the proposed method significantly improves the Peak Signal-to-Noise Ratio (PSNR) value, thereby optimizing video coding quality and providing users with a clearer and more detailed visual experience. Extensive experimental results demonstrate that the proposed method achieves outstanding performance across Random Access (RA), Low Delay B-frame (LDB), and All Intra (AI) configurations, achieving BD-Rate improvements of 6.1%, 7.36%, and 7.1% for the luma component, respectively, due to the remarkable PSNR enhancement. Full article
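
The gating-plus-residual idea can be sketched in a few lines of PyTorch; this is a generic gated residual block with assumed channel sizes, not DRGNet's exact dual-path design.

```python
# Gated residual block: gate selects features, residual preserves detail.
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.feat = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))
        self.gate = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        # The sigmoid gate (0..1 per position) decides which restored details
        # to keep; the identity path keeps original detail and gradient flow.
        return x + self.feat(x) * self.gate(x)

y = GatedResidualBlock()(torch.randn(1, 64, 128, 128))
```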

18 pages, 1845 KB  
Article
Fast Intra-Prediction Mode Decision Algorithm for Versatile Video Coding Based on Gradient and Convolutional Neural Network
by Nana Li, Zhenyi Wang, Qiuwen Zhang, Lei He and Weizheng Zhang
Electronics 2025, 14(10), 2031; https://doi.org/10.3390/electronics14102031 - 16 May 2025
Cited by 1 | Viewed by 1489
Abstract
The latest Versatile Video Coding (H.266/VVC) standard introduces the QTMT structure, enabling more flexible block partitioning and significantly enhancing coding efficiency compared to its predecessor, High-Efficiency Video Coding (H.265/HEVC). However, this new structure results in changes to the size of Coding Units (CUs). To accommodate this, VVC increases the number of intra-prediction modes from 35 to 67, leading to a substantial rise in computational demands. This study presents a fast intra-prediction mode selection algorithm that combines gradient analysis and a CNN. First, the Laplace operator is employed to estimate the texture direction of the current CU block, identifying the most probable prediction direction and skipping more than half of the redundant candidate modes, thereby significantly reducing the number of mode searches. Second, to further minimize computational complexity, two efficient neural network models, MIP-NET and ISP-NET, are developed to determine whether to terminate the prediction process for Matrix Intra Prediction (MIP) and Intra Sub-Partitioning (ISP) modes early, avoiding unnecessary calculations. This approach maintains coding performance while significantly lowering the time complexity of intra-prediction mode selection. Experimental results demonstrate that the algorithm achieves a 35.04% reduction in encoding time with only a 0.69% increase in BD-BR, striking a balance between video quality and coding efficiency. Full article
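
A simplified sketch of the gradient-driven mode pruning: estimate the dominant texture orientation, then keep only nearby angular modes. It uses a structure-tensor estimate in place of the paper's Laplace-operator analysis, and the linear angle-to-mode mapping is an assumption.

```python
# Prune VVC angular intra modes (2..66) around the dominant texture direction.
import numpy as np

def candidate_angular_modes(cu: np.ndarray, keep: int = 16) -> list:
    gy, gx = np.gradient(cu.astype(float))
    # Structure-tensor estimate of the dominant orientation (0..180 degrees).
    theta = 0.5 * np.arctan2(2.0 * (gx * gy).sum(), (gx ** 2 - gy ** 2).sum())
    angle = (np.degrees(theta) + 180.0) % 180.0
    # Map 0..180 degrees linearly onto angular modes 2..66 (simplification).
    center = 2 + int(round(angle / 180.0 * 64))
    half = keep // 2
    return [m for m in range(center - half, center + half + 1) if 2 <= m <= 66]

modes = candidate_angular_modes(np.random.rand(32, 32))  # ~16 of 65 angular modes
```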

19 pages, 3140 KB  
Article
Fast Algorithm for Depth Map Intra-Frame Coding 3D-HEVC Based on Swin Transformer and Multi-Branch Network
by Fengqin Wang, Yangang Du and Qiuwen Zhang
Electronics 2025, 14(9), 1703; https://doi.org/10.3390/electronics14091703 - 22 Apr 2025
Cited by 1 | Viewed by 847
Abstract
Three-Dimensional High-Efficiency Video Coding (3D-HEVC) efficiently compresses 3D video by incorporating depth map encoding techniques. However, the quadtree partitioning of depth map coding units (CUs) greatly increases computational complexity, contributing to over 90% of the total encoding time. To overcome the limitations of existing methods in complex edge modeling and partitioning efficiency, this paper presents Swin-Hier Net, a hierarchical CU partitioning prediction model based on the Swin Transformer. First, a multi-branch feature fusion architecture is designed: the Swin Transformer’s shifted window attention mechanism extracts global contextual features, lightweight CNNs capture local texture details, and traditional edge/variance features enhance multi-scale representation. Second, a recursive hierarchical decision mechanism dynamically activates sub-CU prediction branches based on the partitioning probability of parent nodes, ensuring strict compliance with the HEVC standard quadtree syntax. Additionally, a hybrid pooling strategy and dilated convolutions improve edge feature retention. Experiments on 3D-HEVC standard test sequences show that, compared to exhaustive traversal methods, the proposed algorithm reduces encoding time by 72.7% on average, lowers the BD-Rate by 1.16%, improves CU partitioning accuracy to 94.5%, and maintains a synthesized view PSNR of 38.68 dB (baseline: 38.72 dB). The model seamlessly integrates into the HTM encoder, offering an efficient solution for real-time 3D video applications. Full article
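
The recursive hierarchical decision mechanism can be pictured as below: a child branch is evaluated only when the parent's split probability clears a threshold, so the output always obeys quadtree syntax. Here `model` is a stand-in callable returning a per-level split probability, not the actual Swin-Hier Net.

```python
# Recursive quadtree split prediction gated by the parent's probability.
def predict_quadtree(cu, model, depth=0, max_depth=3, thr=0.5):
    p_split = model(cu, depth)            # stand-in for the network's output
    if depth == max_depth or p_split < thr:
        return {"split": False}
    h, w = cu.shape[0] // 2, cu.shape[1] // 2
    quads = [cu[:h, :w], cu[:h, w:], cu[h:, :w], cu[h:, w:]]
    return {"split": True,
            "children": [predict_quadtree(q, model, depth + 1, max_depth, thr)
                         for q in quads]}
```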

19 pages, 5667 KB  
Article
Content-Symmetrical Multidimensional Transpose of Image Sequences for the High Efficiency Video Coding (HEVC) All-Intra Configuration
by Tamer Shanableh
Symmetry 2025, 17(4), 598; https://doi.org/10.3390/sym17040598 - 15 Apr 2025
Viewed by 586
Abstract
Enhancing the quality of video coding whilst maintaining compliance with the syntax of video coding standards is challenging. In the literature, many solutions have been proposed that apply mainly to two-pass encoding, bitrate control algorithms, and enhancements of locally decoded images in the motion-compensation loop. This work proposes a pre- and post-coding solution using the content-symmetrical multidimensional transpose of raw video sequences. The content-symmetrical multidimensional transpose results in images composed of slices of the temporal domain whilst preserving the video content. Such slices have higher spatial homogeneity at the expense of reduced temporal resemblance. As such, an all-intra configuration is an excellent choice for compressing such images. Prior to displaying the decoded images, a content-symmetrical multidimensional transpose is applied again to restore the original form of the input images. Moreover, we propose a lightweight two-pass encoding solution in which we apply systematic temporal subsampling to the multidimensionally transposed image sequences prior to the first-pass encoding. This noticeably reduces the complexity of the first encoding pass and indicates whether the proposed solution is suitable for the video sequence at hand. Using the HEVC video codec, the experimental results show that the proposed solution yields a lower percentage of coding unit splits in comparison to regular HEVC coding without the multidimensional transpose of image sequences. This finding supports the claim that the proposed solution increases spatial coherence. Additionally, using four quantization parameters, and in comparison to regular HEVC encoding, the resulting BD-rate is −15.12%, which indicates a noticeable bitrate reduction. The BD-PSNR, on the other hand, was 1.62 dB, indicating an enhancement in the quality of the decoded images. Despite all of these benefits, the proposed solution has limitations, which are also discussed in the paper. Full article
(This article belongs to the Section Computer)
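
In NumPy terms, the pre- and post-coding transposes might look like this; the exact axis permutation the paper uses is an assumption based on its description of temporal slices.

```python
# Content-symmetrical transpose of a video tensor (sketch).
import numpy as np

video = np.zeros((300, 240, 416), dtype=np.uint8)   # (time, height, width) luma

# Pre-coding: each new "frame" is a temporal slice (one image row across all
# frames), which tends to be spatially smoother, suiting all-intra coding.
transposed = video.transpose(1, 0, 2)               # (height, time, width)

# ... all-intra HEVC encode/decode of `transposed` would happen here ...

# Post-decoding: the same permutation restores the original viewing order.
restored = transposed.transpose(1, 0, 2)
assert restored.shape == video.shape
```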

22 pages, 2362 KB  
Article
Fast Coding Unit Partitioning Method for Video-Based Point Cloud Compression: Combining Convolutional Neural Networks and Bayesian Optimization
by Wenjun Song, Xinqi Liu and Qiuwen Zhang
Electronics 2025, 14(7), 1295; https://doi.org/10.3390/electronics14071295 - 25 Mar 2025
Cited by 1 | Viewed by 1224
Abstract
As 5G technology and 3D capture techniques have been rapidly developing, there has been a remarkable increase in the demand for effectively compressing dynamic 3D point cloud data. Video-based point cloud compression (V-PCC), an innovative method for 3D point cloud compression, makes use of High-Efficiency Video Coding (HEVC) to compress 3D point clouds by projecting them onto two-dimensional video frames. However, V-PCC faces significant coding complexity, particularly for dynamic 3D point clouds, which can be up to four times more complex to process than a conventional video. To address this challenge, we propose an adaptive coding unit (CU) partitioning method that integrates occupancy maps, convolutional neural networks (CNNs), and Bayesian optimization. In this approach, the CUs are first divided into dense regions, sparse regions, and complex composite regions by calculating the occupancy rate R of the CUs, and an initial classification decision is then made using a CNN framework. For regions where the CNN outputs low-confidence classifications, Bayesian optimization is employed to refine the partitioning and enhance accuracy. Experimental results show that the proposed method efficiently decreases the coding complexity of V-PCC while maintaining high coding quality. Specifically, the average coding time of the geometry video is reduced by 57.37%, that of the attribute video by 54.43%, and the overall coding time by 54.75%. Although the BD-rate slightly increases compared with that of the baseline V-PCC method, the impact on video quality is negligible. Additionally, the proposed algorithm outperforms existing methods in terms of geometric compression efficiency and computational time savings. This study’s innovation lies in combining deep learning with Bayesian optimization to deliver an efficient CU partitioning strategy for V-PCC, improving coding speed and reducing computational resource consumption, thereby advancing the practical application of V-PCC. Full article
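
A schematic sketch of the confidence-gated two-stage decision: classify by occupancy rate, trust the CNN when it is confident, and fall back to refinement otherwise. The cut-offs and the `refine` callable (standing in for the Bayesian-optimization step) are assumptions.

```python
# Two-stage CU decision: CNN first, refinement only on low confidence.
def classify_cu(occ_block, cnn, refine, conf_thr=0.8):
    r = occ_block.mean()                             # occupancy rate R
    region = ("sparse" if r < 0.3 else
              "dense" if r > 0.7 else
              "composite")                           # illustrative cut-offs
    label, conf = cnn(occ_block, region)             # initial CNN decision
    if conf >= conf_thr:
        return label
    return refine(occ_block, region)                 # Bayesian-style refinement
```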

16 pages, 433 KB  
Article
A Fast Coding Unit Partitioning Decision Algorithm for Versatile Video Coding Based on Gradient Feedback Hierarchical Convolutional Neural Network and Light Gradient Boosting Machine Decision Tree
by Fangmei Liu, Jiyuan Wang and Qiuwen Zhang
Electronics 2024, 13(24), 4908; https://doi.org/10.3390/electronics13244908 - 12 Dec 2024
Viewed by 1273
Abstract
Video encoding technology is a foundational component in the advancement of modern technological applications. The latest standard in universal video coding, H.266/VVC, features a quad-tree with nested multi-type tree (QTMT) partitioning structure, which represents an improvement over its predecessor, High-Efficiency Video Coding (H.265/HEVC). This configuration facilitates adaptable block segmentation, albeit at the cost of heightened encoding complexity. In view of the aforementioned considerations, this paper puts forth a deep learning-based approach to CU partitioning, with the aim of supplanting the intricate CU partitioning process observed in the Versatile Video Coding Test Model (VTM). We begin by presenting the Gradient Feedback Hierarchical CNN (GFH-CNN) model, an advanced convolutional neural network derived from the ResNet architecture, which extracts features from 64 × 64 coding unit (CU) blocks. Following this, a hierarchical network diagram (HND) is crafted to depict the delineation of partition boundaries corresponding to the various levels of the CU block’s layered structure. This diagram maps the features extracted by the GFH-CNN model to the partitioning at each level and boundary. Finally, a LightGBM-based decision tree classification model (L-DT) is constructed to predict the corresponding partition structure based on the prediction vector output by the GFH-CNN model. Any errors in the partitioning results are then corrected in accordance with the encoding constraints specified by the VTM, which ultimately determines the final CU block partitioning. The experimental results demonstrate that, in comparison with VTM-10.0, the proposed algorithm achieves a 48.14% reduction in complexity with only a negligible 0.83% increase in bitrate under the top-three configuration. In comparison, the top-two configuration yields a higher complexity reduction of 63.78%, accompanied by a 2.08% increase in bitrate. These results demonstrate that, in comparison to existing solutions, our approach provides an optimal balance between encoding efficiency and computational complexity. Full article
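
The final stage pairs naturally with LightGBM's scikit-learn interface; a toy sketch with assumed feature and class dimensions, not the paper's L-DT configuration.

```python
# LightGBM classifier over CNN prediction vectors (toy data, assumed sizes).
import numpy as np
import lightgbm as lgb

X = np.random.rand(1000, 64)              # 64-dim prediction vectors (assumed)
y = np.random.randint(0, 6, size=1000)    # 6 hypothetical partition classes

l_dt = lgb.LGBMClassifier(n_estimators=100, num_leaves=31)
l_dt.fit(X, y)
partition = l_dt.predict(X[:1])           # partition decision for one CU
```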

16 pages, 10696 KB  
Article
A Framework for Symmetric-Quality S3D Video Streaming Services
by Juhyeon Lee, Seungjun Lee, Sunghoon Kim and Dongwook Kang
Appl. Sci. 2024, 14(23), 11011; https://doi.org/10.3390/app142311011 - 27 Nov 2024
Cited by 1 | Viewed by 1050
Abstract
This paper proposes an efficient encoding framework based on Scalable High Efficiency Video Coding (SHVC) technology, which simultaneously supports low- and high-resolution 2D videos as well as stereo 3D (S3D) video. Previous studies have introduced Cross-View SHVC, which encodes two videos with different viewpoints and resolutions using a Cross-View SHVC encoder, where the low-resolution video is encoded as the base layer and the other video as the enhancement layer. This encoder provides resolution diversity and allows the decoder to combine the two videos, enabling 3D video services. Even when 3D videos are composed of left and right videos with different resolutions, the viewer tends to perceive the quality of the higher-resolution video due to the binocular suppression effect, whereby the brain prioritizes the high-quality image and suppresses the lower-quality one. However, recent experiments have shown that when the disparity between resolutions exceeds a certain threshold, it can lead to a subjective degradation of the perceived 3D video quality. To address this issue, a conditional replenishment algorithm has been studied, which replaces some blocks of the video with a disparity-compensated left-view image based on rate–distortion cost. This conditional replenishment algorithm, also known as Video Enhancement Information (VEI) technology, effectively reduces the quality difference between the base layer and enhancement layer videos. However, the algorithm alone cannot fully compensate for the quality difference between the left and right videos. In this paper, we propose a novel encoding framework to solve the asymmetry issue between the left and right videos in 3D video services and achieve symmetrical video quality. The proposed framework focuses on improving the quality of the right-view video by combining the conditional replenishment algorithm with Cross-View SHVC. Specifically, the framework leverages the non-HEVC option of the SHVC encoder, using a VEI-restored image as the base layer to provide higher-quality prediction signals and reduce encoding complexity. Experimental results using animation and live-action UHD sequences show that the proposed method achieves BD-rate reductions of 57.78% and 45.10% compared with the HEVC and SHVC codecs, respectively. Full article
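
At its core, conditional replenishment is a per-block rate-distortion comparison; a sketch with illustrative rates and lambda, not the paper's cost model.

```python
# Per-block choice between the coded right view and the disparity-compensated
# left view, by RD cost J = D + lambda * R (illustrative values).
import numpy as np

def replenish(src, coded, disp_comp, r_coded, r_copy=1, lmbda=50.0):
    src = src.astype(float)                                   # avoid uint8 wrap
    j_coded = np.mean((src - coded) ** 2) + lmbda * r_coded   # distortion + rate
    j_copy = np.mean((src - disp_comp) ** 2) + lmbda * r_copy
    return disp_comp if j_copy < j_coded else coded
```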

24 pages, 6380 KB  
Article
Multi-Type Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec
by Woowoen Gwun, Kiho Choi and Gwang Hoon Park
Mathematics 2024, 12(18), 2874; https://doi.org/10.3390/math12182874 - 15 Sep 2024
Cited by 2 | Viewed by 2208
Abstract
Over the past few years, there has been substantial interest and research activity surrounding the application of Convolutional Neural Networks (CNNs) for post-filtering in video coding. Most current research efforts have focused on using CNNs with various kernel sizes for post-filtering, primarily concentrating on High-Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC). This narrow focus has limited the exploration and application of these techniques to other video coding standards such as AV1, developed by the Alliance for Open Media. AV1 offers excellent compression efficiency, reducing bandwidth usage and improving video quality, which makes it highly attractive for modern streaming and media applications. This paper introduces a novel approach that extends beyond traditional CNN methods by integrating three different self-attention layers into the CNN framework. Applied to the AV1 codec, the proposed method significantly improves video quality by incorporating these distinct self-attention layers. This enhancement demonstrates the potential of self-attention mechanisms to advance post-filtering techniques in video coding beyond the limitations of convolution-based methods. The experimental results show that the proposed network achieves an average BD-rate reduction of 10.40% for the Luma component and 19.22% and 16.52% for the Chroma components compared to the AV1 anchor. Visual quality assessments further validated the effectiveness of our approach, showcasing substantial artifact reduction and detail enhancement in videos. Full article
(This article belongs to the Special Issue New Advances and Applications in Image Processing and Computer Vision)
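
Integrating self-attention into a CNN post-filter can be as simple as attending over flattened spatial positions; a single-layer PyTorch sketch, not the paper's three attention variants.

```python
# Self-attention over spatial positions inside a CNN post-filter (sketch).
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, ch: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.norm = nn.LayerNorm(ch)

    def forward(self, x):                      # x: (N, C, H, W) feature map
        n, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)       # (N, H*W, C) token sequence
        q = self.norm(t)
        t = t + self.attn(q, q, q)[0]          # residual attention update
        return t.transpose(1, 2).reshape(n, c, h, w)

y = SpatialSelfAttention()(torch.randn(1, 64, 32, 32))
```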

26 pages, 7340 KB  
Article
Versatile Video Coding-Post Processing Feature Fusion: A Post-Processing Convolutional Neural Network with Progressive Feature Fusion for Efficient Video Enhancement
by Tanni Das, Xilong Liang and Kiho Choi
Appl. Sci. 2024, 14(18), 8276; https://doi.org/10.3390/app14188276 - 13 Sep 2024
Cited by 3 | Viewed by 3384
Abstract
Advanced video codecs such as High Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC) are vital for streaming high-quality online video content, as they compress and transmit data efficiently. However, these codecs can occasionally degrade video quality by adding undesirable artifacts such as blockiness, blurriness, and ringing, which can detract from the viewer’s experience. To ensure a seamless and engaging video experience, it is essential to remove these artifacts, which improves viewer comfort and engagement. In this paper, we propose a deep-feature-fusion-based convolutional neural network (CNN) architecture, VVC-PPFF, as a post-processing approach to further enhance the performance of VVC. The proposed network harnesses the power of CNNs to enhance decoded frames, significantly improving the coding efficiency of the state-of-the-art VVC video coding standard. By combining deep features from early and later convolution layers, the network learns to extract both low-level and high-level features, resulting in more generalized outputs that adapt to different quantization parameter (QP) values. The proposed VVC-PPFF network achieves outstanding performance, with Bjøntegaard Delta Rate (BD-Rate) improvements of 5.81% and 6.98% for the luma component in random access (RA) and low-delay (LD) configurations, respectively, while also boosting peak signal-to-noise ratio (PSNR). Full article
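
The early/late fusion idea in miniature: concatenate shallow and deep features before predicting an enhancement residual. Depths and channel counts are assumed, not VVC-PPFF's.

```python
# Fuse early (low-level) and late (high-level) CNN features (sketch).
import torch
import torch.nn as nn

class EarlyLateFusion(nn.Module):
    def __init__(self, ch: int = 32, depth: int = 6):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.body = nn.Sequential(*[m for _ in range(depth)
                                    for m in (nn.Conv2d(ch, ch, 3, padding=1),
                                              nn.ReLU())])
        self.tail = nn.Conv2d(2 * ch, 1, 3, padding=1)   # fuse early + late

    def forward(self, x):                    # x: decoded luma frame (N, 1, H, W)
        early = self.head(x)                 # low-level features
        late = self.body(early)              # high-level features
        return x + self.tail(torch.cat([early, late], dim=1))  # enhanced frame

y = EarlyLateFusion()(torch.randn(1, 1, 64, 64))
```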

34 pages, 2908 KB  
Article
A Hybrid Contrast and Texture Masking Model to Boost High Efficiency Video Coding Perceptual Rate-Distortion Performance
by Javier Ruiz Atencia, Otoniel López-Granado, Manuel Pérez Malumbres, Miguel Martínez-Rach, Damian Ruiz Coll, Gerardo Fernández Escribano and Glenn Van Wallendael
Electronics 2024, 13(16), 3341; https://doi.org/10.3390/electronics13163341 - 22 Aug 2024
Cited by 1 | Viewed by 1455
Abstract
As most videos are destined for human perception, many techniques have been designed to improve video coding based on how the human visual system perceives video quality. In this paper, we propose the use of two perceptual coding techniques, namely contrast masking and texture masking, jointly operating under the High Efficiency Video Coding (HEVC) standard. These techniques aim to improve the subjective quality of the reconstructed video at the same bit rate. For contrast masking, we propose the use of a dedicated weighting matrix for each block size (from 4×4 up to 32×32), unlike the HEVC standard, which only defines an 8×8 weighting matrix that is upscaled to build the 16×16 and 32×32 weighting matrices (a 4×4 weighting matrix is not supported). Our approach achieves average Bjøntegaard Delta-Rate (BD-rate) gains of between 2.5% and 4.48%, depending on the perceptual metric and coding mode used. On the other hand, we propose a novel texture masking scheme based on the classification of each coding unit, providing an over-quantization that depends on the coding unit’s texture level. Thus, for each coding unit, its mean directional variance features are computed to feed a support vector machine model that predicts the texture type (plane, edge, or texture). According to this classification, the block’s energy, the type of coding unit, and its size, an over-quantization value is computed as a QP offset (DQP) to be applied to this coding unit. By applying both techniques in the HEVC reference software, an overall average BD-rate gain of 5.79% is achieved, proving their complementarity. Full article
(This article belongs to the Special Issue Recent Advances in Image/Video Compression and Coding)
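
The texture-masking path could be prototyped with scikit-learn as below; the features, labels, and DQP table are placeholders for the paper's trained model, not its actual parameters.

```python
# SVM texture classification driving a per-CU QP offset (toy sketch).
import numpy as np
from sklearn.svm import SVC

X = np.random.rand(500, 4)      # mean directional variance features (assumed dim)
y = np.random.choice(["plane", "edge", "texture"], size=500)
svm = SVC(kernel="rbf").fit(X, y)

def qp_offset(features, cu_energy, cu_size):
    """Map the predicted texture class (plus energy and size) to a DQP value.
    The offset table is a placeholder."""
    cls = svm.predict(features.reshape(1, -1))[0]
    base = {"plane": 0, "edge": 1, "texture": 3}[cls]  # texture hides distortion best
    return base + (1 if cu_energy > 1000.0 and cu_size >= 32 else 0)
```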

16 pages, 10945 KB  
Article
Impact of Video Motion Content on HEVC Coding Efficiency
by Khalid A. M. Salih, Ismail Amin Ali and Ramadhan J. Mstafa
Computers 2024, 13(8), 204; https://doi.org/10.3390/computers13080204 - 18 Aug 2024
Cited by 2 | Viewed by 3122
Abstract
Digital video coding aims to reduce the bitrate while preserving the integrity of the visual presentation. High-Efficiency Video Coding (HEVC) can effectively compress video content to be suitable for delivery over various networks and platforms. Finding the optimal coding configuration is challenging, as compression performance highly depends on the complexity of the encoded video sequence. This paper evaluates the effects of motion content on coding performance and suggests an adaptive encoding scheme based on the motion content of the encoded video. To evaluate the effects of motion content on the compression performance of HEVC, we tested three coding configurations with different Group of Pictures (GOP) structures and intra-refresh mechanisms. Namely, open-GOP IPPP, open-GOP periodic-I, and closed-GOP periodic-IDR coding structures were tested using several test sequences with a range of resolutions and motion activity. All sequences were first analyzed to determine their motion activity. Rate–distortion curves were produced for all the test sequences and coding configurations. Our results show that the performance of the IPPP coding configuration is significantly better (by up to 4 dB) than the periodic-I and periodic-IDR configurations for sequences with low motion activity. For test sequences with intermediate motion activity, the IPPP configuration can still achieve a reasonable quality improvement over the periodic-I and periodic-IDR configurations. However, for test sequences with high motion activity, the IPPP configuration has a very small performance advantage over the periodic-I and periodic-IDR configurations. Our results indicate the importance of selecting the appropriate coding structure according to the motion activity of the video being encoded. Full article
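
A simple motion-activity probe along these lines could drive the suggested adaptive scheme; the metric and the thresholds are illustrative, not taken from the paper.

```python
# Pick a GOP structure from a crude motion-activity measure (sketch).
import numpy as np

def motion_activity(frames: np.ndarray) -> float:
    """Mean absolute luma difference between consecutive frames."""
    return float(np.mean(np.abs(np.diff(frames.astype(np.int16), axis=0))))

def pick_gop_structure(activity: float) -> str:
    if activity < 2.0:               # low motion: IPPP gains up to ~4 dB
        return "open-GOP IPPP"
    if activity < 6.0:               # intermediate motion: IPPP still helps
        return "open-GOP periodic-I"
    return "closed-GOP periodic-IDR" # high motion: structures are comparable
```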

19 pages, 746 KB  
Article
Fast Depth Map Coding Algorithm for 3D-HEVC Based on Gradient Boosting Machine
by Xiaoke Su, Yaqiong Liu and Qiuwen Zhang
Electronics 2024, 13(13), 2586; https://doi.org/10.3390/electronics13132586 - 1 Jul 2024
Cited by 1 | Viewed by 1706
Abstract
Three-Dimensional High-Efficiency Video Coding (3D-HEVC) has been extensively researched due to its efficient compression and deep image representation, but encoding complexity continues to pose a difficulty. This is mainly attributed to redundancy in the coding unit (CU) recursive partitioning process and in the rate–distortion (RD) cost calculation, resulting in a complex encoding process. Therefore, enhancing encoding efficiency and reducing redundant computations are key objectives for optimizing 3D-HEVC. This paper introduces a fast encoding method for 3D-HEVC, comprising an adaptive CU partitioning algorithm and a rapid rate–distortion optimization (RDO) algorithm. Based on the ALV features extracted from each coding unit, a Gradient Boosting Machine (GBM) model is constructed to obtain the corresponding CU thresholds. These thresholds are compared with the ALV to decide whether to continue dividing the coding unit. The rapid RDO algorithm streamlines the RD cost calculation process, selecting the optimal prediction mode wherever possible. The simulation results show that this method reduces complexity by 52.49% while ensuring good video quality. Full article
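
A toy sketch of the threshold-based early termination, with a scikit-learn gradient-boosting regressor standing in for the paper's GBM model and assumed per-CU context features.

```python
# GBM-predicted threshold vs. the CU's ALV feature (toy sketch).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.rand(800, 5)               # per-CU context features (assumed)
t = np.random.rand(800) * 50.0           # training thresholds (toy values)
gbm = GradientBoostingRegressor(n_estimators=100).fit(X, t)

def keep_splitting(alv: float, context: np.ndarray) -> bool:
    """Continue CU division only while ALV exceeds the predicted threshold."""
    return alv > float(gbm.predict(context.reshape(1, -1))[0])
```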