Search Results (64)

Search Parameters:
Keywords = H.265 (HEVC)

17 pages, 1788 KiB  
Article
Detection of Double Compression in HEVC Videos Containing B-Frames
by Yoshihisa Furushita, Daniele Baracchi, Marco Fontani, Dasara Shullani and Alessandro Piva
J. Imaging 2025, 11(7), 211; https://doi.org/10.3390/jimaging11070211 - 27 Jun 2025
Viewed by 332
Abstract
This study proposes a method to detect double compression in H.265/HEVC videos containing B-frames, a scenario underexplored in previous research. The method extracts frame-level encoding features—including frame type, coding unit (CU) size, quantization parameter (QP), and prediction modes—and represents each video as a 28-dimensional feature vector. A bidirectional Long Short-Term Memory (Bi-LSTM) classifier is then trained to model temporal inconsistencies introduced during recompression. To evaluate the method, we created a dataset of 129 HEVC-encoded YUV videos derived from 43 original sequences, covering various bitrate combinations and GOP structures. The proposed method achieved a detection accuracy of 80.06%, outperforming two existing baselines. These results demonstrate the practical applicability of the proposed approach in realistic double compression scenarios.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
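As a rough illustration of the classification stage described above, here is a minimal Bi-LSTM sketch in PyTorch. It assumes the 28-dimensional features form a per-frame sequence fed to the recurrent layer (an assumption; the paper may aggregate them differently), and the hidden size and use of the final time step are illustrative choices, not the authors' configuration.

```python
import torch
import torch.nn as nn

class BiLSTMDetector(nn.Module):
    """Minimal sketch: classify a clip as single- vs. double-compressed
    from a sequence of 28-dimensional frame-level feature vectors."""
    def __init__(self, feat_dim=28, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # hidden size is illustrative

    def forward(self, x):                # x: (batch, frames, 28)
        out, _ = self.lstm(x)            # (batch, frames, 2 * hidden)
        return self.head(out[:, -1, :])  # decision from the last time step

logits = BiLSTMDetector()(torch.randn(4, 120, 28))  # 4 clips, 120 frames each
```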

32 pages, 4311 KiB  
Article
DRGNet: Enhanced VVC Reconstructed Frames Using Dual-Path Residual Gating for High-Resolution Video
by Zezhen Gai, Tanni Das and Kiho Choi
Sensors 2025, 25(12), 3744; https://doi.org/10.3390/s25123744 - 15 Jun 2025
Viewed by 484
Abstract
In recent years, with the rapid development of the Internet and mobile devices, the high-resolution video industry has ushered in a booming golden era, making video content the primary driver of Internet traffic. This trend has spurred continuous innovation in efficient video coding technologies, such as Advanced Video Coding/H.264 (AVC), High Efficiency Video Coding/H.265 (HEVC), and Versatile Video Coding/H.266 (VVC), which significantly improve compression efficiency while maintaining high video quality. However, during the encoding process, compression artifacts and the loss of visual details remain unavoidable challenges, particularly in high-resolution video processing, where the massive amount of image data tends to introduce more artifacts and noise, ultimately affecting the user’s viewing experience. Therefore, effectively reducing artifacts, removing noise, and minimizing detail loss have become critical issues in enhancing video quality. To address these challenges, this paper proposes a post-processing method based on a Convolutional Neural Network (CNN) that improves the quality of VVC-reconstructed frames through deep feature extraction and fusion. The proposed method is built upon a high-resolution dual-path residual gating system, which integrates deep features from different convolutional layers and introduces convolutional blocks equipped with gating mechanisms. By ingeniously combining gating operations with residual connections, the proposed approach ensures smooth gradient flow while enhancing feature selection capabilities. It selectively preserves critical information while effectively removing artifacts. Furthermore, the introduction of residual connections reinforces the retention of original details, achieving high-quality image restoration. Under the same bitrate conditions, the proposed method significantly improves the Peak Signal-to-Noise Ratio (PSNR) value, thereby optimizing video coding quality and providing users with a clearer and more detailed visual experience. Extensive experimental results demonstrate that the proposed method achieves outstanding performance across Random Access (RA), Low Delay B-frame (LDB), and All Intra (AI) configurations, achieving BD-Rate improvements of 6.1%, 7.36%, and 7.1% for the luma component, respectively, due to the remarkable PSNR enhancement.
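A minimal sketch of the dual-path gating idea, assuming a PyTorch block in which a sigmoid gate weights the feature path while a residual skip preserves original detail; DRGNet's actual topology, channel counts, and fusion scheme are not reproduced here.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """One gated conv block with a residual connection: the sigmoid gate
    selects which features pass, while the skip path preserves detail."""
    def __init__(self, ch=64):
        super().__init__()
        self.feat = nn.Conv2d(ch, ch, 3, padding=1)
        self.gate = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + torch.sigmoid(self.gate(x)) * self.feat(x)

x = torch.randn(1, 64, 128, 128)   # a reconstructed-frame feature map
y = GatedResidualBlock()(x)
```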

18 pages, 1845 KiB  
Article
Fast Intra-Prediction Mode Decision Algorithm for Versatile Video Coding Based on Gradient and Convolutional Neural Network
by Nana Li, Zhenyi Wang, Qiuwen Zhang, Lei He and Weizheng Zhang
Electronics 2025, 14(10), 2031; https://doi.org/10.3390/electronics14102031 - 16 May 2025
Viewed by 566
Abstract
The latest Versatile Video Coding (H.266/VVC) standard introduces the QTMT structure, enabling more flexible block partitioning and significantly enhancing coding efficiency compared to its predecessor, High-Efficiency Video Coding (H.265/HEVC). However, this new structure results in changes to the size of Coding Units (CUs). To accommodate this, VVC increases the number of intra-prediction modes from 35 to 67, leading to a substantial rise in computational demands. This study presents a fast intra-prediction mode selection algorithm that combines gradient analysis and a CNN. First, the Laplace operator is employed to estimate the texture direction of the current CU block, identifying the most probable prediction direction and skipping over half of the redundant candidate modes, thereby significantly reducing the number of mode searches. Second, to further minimize computational complexity, two efficient neural network models, MIP-NET and ISP-NET, are developed to determine whether to terminate the prediction process for Matrix Intra Prediction (MIP) and Intra Sub-Partitioning (ISP) modes early, avoiding unnecessary calculations. This approach maintains coding performance while significantly lowering the time complexity of intra-prediction mode selection. Experimental results demonstrate that the algorithm achieves a 35.04% reduction in encoding time with only a 0.69% increase in BD-BR, striking a balance between video quality and coding efficiency.
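The gradient stage can be pictured with a short sketch: second-difference (Laplacian) energies along rows and columns indicate the dominant texture direction, and roughly half of the angular modes are dropped. The mode ranges below are a hypothetical split of VVC's angular modes, not the paper's exact candidate lists.

```python
import numpy as np

def prune_modes(cu: np.ndarray):
    """Keep only the angular intra modes near the CU's dominant texture
    direction, estimated from row/column Laplacian energy."""
    e_rows = np.abs(cu[:, :-2] - 2 * cu[:, 1:-1] + cu[:, 2:]).sum()
    e_cols = np.abs(cu[:-2, :] - 2 * cu[1:-1, :] + cu[2:, :]).sum()
    # hypothetical split of VVC's angular modes 2..66 into two halves
    return range(2, 35) if e_cols > e_rows else range(34, 67)

cu = np.random.randint(0, 256, (32, 32)).astype(np.float32)
candidates = prune_modes(cu)   # search only about half of the angular modes
```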

16 pages, 433 KiB  
Article
A Fast Coding Unit Partitioning Decision Algorithm for Versatile Video Coding Based on Gradient Feedback Hierarchical Convolutional Neural Network and Light Gradient Boosting Machine Decision Tree
by Fangmei Liu, Jiyuan Wang and Qiuwen Zhang
Electronics 2024, 13(24), 4908; https://doi.org/10.3390/electronics13244908 - 12 Dec 2024
Viewed by 887
Abstract
Video encoding technology is a foundational component in the advancement of modern technological applications. The latest universal video coding standard, H.266/VVC, features a quad-tree with nested multi-type tree (QTMT) partitioning structure, which represents an improvement over its predecessor, High-Efficiency Video Coding (H.265/HEVC). This configuration facilitates adaptable block segmentation, albeit at the cost of heightened encoding complexity. In view of the aforementioned considerations, this paper puts forth a deep learning-based approach to facilitate CU partitioning, with the aim of supplanting the intricate CU partitioning process observed in the Versatile Video Coding Test Model (VTM). We begin by presenting the Gradient Feedback Hierarchical CNN (GFH-CNN) model, an advanced convolutional neural network derived from the ResNet architecture, enabling the extraction of features from 64 × 64 coding unit (CU) blocks. Following this, a hierarchical network diagram (HND) is crafted to depict the delineation of partition boundaries corresponding to the various levels of the CU block’s layered structure. This diagram maps the features extracted by the GFH-CNN model to the partitioning at each level and boundary. Finally, a LightGBM-based decision tree classification model (L-DT) is constructed to predict the corresponding partition structure based on the prediction vector output from the GFH-CNN model. Subsequently, any errors in the partitioning results are corrected in accordance with the encoding constraints specified by the VTM, which ultimately determines the final CU block partitioning. The experimental results demonstrate that, in comparison with VTM-10.0, the proposed algorithm achieves a 48.14% reduction in complexity with only a 0.83% increase in bitrate under the top-three configuration, which is negligible. In comparison, the top-two configuration resulted in a higher complexity reduction of 63.78%, although this was accompanied by a 2.08% increase in bitrate. These results demonstrate that, in comparison to existing solutions, our approach provides an optimal balance between encoding efficiency and computational complexity.
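The final stage pairs a CNN prediction vector with a LightGBM classifier. Below is a toy sketch using lightgbm's scikit-learn interface with random stand-in data; the feature dimension and the number of partition classes are invented for illustration.

```python
import numpy as np
import lightgbm as lgb

# X: prediction vectors from the (hypothetical) GFH-CNN, one per 64x64 CU;
# y: index of the chosen partition structure. Random data stands in here.
X = np.random.rand(1000, 16)
y = np.random.randint(0, 6, 1000)   # e.g. 6 candidate partition types

clf = lgb.LGBMClassifier(n_estimators=200, num_leaves=31)
clf.fit(X, y)
partition = clf.predict(X[:1])      # predicted partition type for one CU
```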

12 pages, 1792 KiB  
Article
Information Bottleneck Driven Deep Video Compression—IBOpenDVCW
by Timor Leiderman and Yosef Ben Ezra
Entropy 2024, 26(10), 836; https://doi.org/10.3390/e26100836 - 30 Sep 2024
Viewed by 1718
Abstract
Video compression remains a challenging task despite significant advancements in end-to-end optimized deep networks for video coding. This study, inspired by information bottleneck (IB) theory, introduces a novel approach that combines IB theory with wavelet transform. We perform a comprehensive analysis of information and mutual information across various mother wavelets and decomposition levels. Additionally, we replace the conventional average pooling layers with a discrete wavelet transform, creating more advanced pooling methods, to investigate their effects on information and mutual information. Our results demonstrate that the proposed model and training technique outperform existing state-of-the-art video compression methods, delivering competitive rate-distortion performance compared to the AVC/H.264 and HEVC/H.265 codecs.
(This article belongs to the Section Information Theory, Probability and Statistics)
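The wavelet pooling substitution can be sketched in a few lines with PyWavelets: a single-level 2-D DWT halves each spatial dimension like 2×2 average pooling, but keeps the low-frequency approximation band rather than a plain mean. The wavelet choice and band handling here are illustrative, not the paper's configuration.

```python
import numpy as np
import pywt

def dwt_pool(x: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    """Replace 2x2 average pooling with a single-level DWT: keep the
    low-frequency approximation band, halving each spatial dimension."""
    approx, _ = pywt.dwt2(x, wavelet)
    return approx

fmap = np.random.rand(64, 64)
pooled = dwt_pool(fmap)        # shape (32, 32), like stride-2 pooling
```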

24 pages, 6380 KiB  
Article
Multi-Type Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec
by Woowoen Gwun, Kiho Choi and Gwang Hoon Park
Mathematics 2024, 12(18), 2874; https://doi.org/10.3390/math12182874 - 15 Sep 2024
Cited by 1 | Viewed by 1658
Abstract
Over the past few years, there has been substantial interest and research activity surrounding the application of Convolutional Neural Networks (CNNs) for post-filtering in video coding. Most current research efforts have focused on using CNNs with various kernel sizes for post-filtering, primarily concentrating on High-Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC). This narrow focus has limited the exploration and application of these techniques to other video coding standards such as AV1. Developed by the Alliance for Open Media, AV1 offers excellent compression efficiency, reducing bandwidth usage and improving video quality, which makes it highly attractive for modern streaming and media applications. This paper introduces a novel approach that extends beyond traditional CNN methods by integrating three different self-attention layers into the CNN framework. Applied to the AV1 codec, the proposed method significantly improves video quality by incorporating these distinct self-attention layers. This enhancement demonstrates the potential of self-attention mechanisms to revolutionize post-filtering techniques in video coding beyond the limitations of convolution-based methods. The experimental results show that the proposed network achieves an average BD-rate reduction of 10.40% for the Luma component and 19.22% and 16.52% for the Chroma components compared to the AV1 anchor. Visual quality assessments further validated the effectiveness of our approach, showcasing substantial artifact reduction and detail enhancement in videos.
(This article belongs to the Special Issue New Advances and Applications in Image Processing and Computer Vision)
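A minimal non-local self-attention layer of the kind that can be dropped into a post-filtering CNN is sketched below in PyTorch; the paper integrates three different self-attention layer types, which this single generic layer does not reproduce.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """A basic non-local self-attention layer over spatial positions."""
    def __init__(self, ch=64):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)     # (b, hw, c/8)
        k = self.k(x).flatten(2)                     # (b, c/8, hw)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)
        v = self.v(x).flatten(2).transpose(1, 2)     # (b, hw, c)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                               # residual connection

y = SelfAttention2d()(torch.randn(1, 64, 64, 64))
```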

26 pages, 7340 KiB  
Article
Versatile Video Coding-Post Processing Feature Fusion: A Post-Processing Convolutional Neural Network with Progressive Feature Fusion for Efficient Video Enhancement
by Tanni Das, Xilong Liang and Kiho Choi
Appl. Sci. 2024, 14(18), 8276; https://doi.org/10.3390/app14188276 - 13 Sep 2024
Cited by 2 | Viewed by 2208
Abstract
Advanced video codecs such as High Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC) are vital for streaming high-quality online video content, as they compress and transmit data efficiently. However, these codecs can occasionally degrade video quality by adding undesirable artifacts such as blockiness, blurriness, and ringing, which can detract from the viewer’s experience. To ensure a seamless and engaging video experience, it is essential to remove these artifacts, which improves viewer comfort and engagement. In this paper, we propose a deep feature fusion-based convolutional neural network (CNN) architecture (VVC-PPFF) as a post-processing approach to further enhance the performance of VVC. The proposed network, VVC-PPFF, harnesses the power of CNNs to enhance decoded frames, significantly improving the coding efficiency of the state-of-the-art VVC video coding standard. By combining deep features from early and later convolution layers, the network learns to extract both low-level and high-level features, resulting in more generalized outputs that adapt to different quantization parameter (QP) values. The proposed VVC-PPFF network achieves outstanding performance, with Bjøntegaard Delta Rate (BD-Rate) improvements of 5.81% and 6.98% for luma components in random access (RA) and low-delay (LD) configurations, respectively, while also boosting peak signal-to-noise ratio (PSNR).
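The early/late fusion idea can be sketched as follows, assuming PyTorch: shallow and deep feature maps are concatenated before a reconstruction layer that predicts an enhancement residual. Channel counts and depth are placeholders, not the VVC-PPFF configuration.

```python
import torch
import torch.nn as nn

class FusionPostFilter(nn.Module):
    """Concatenate shallow (low-level) and deep (high-level) features
    before reconstruction, as a stand-in for progressive fusion."""
    def __init__(self, ch=32):
        super().__init__()
        self.early = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.late = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(2 * ch, 1, 3, padding=1)

    def forward(self, y):                     # y: decoded luma plane
        e = self.early(y)
        d = self.late(e)
        return y + self.out(torch.cat([e, d], dim=1))  # enhancement residual

enhanced = FusionPostFilter()(torch.randn(1, 1, 128, 128))
```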

21 pages, 18910 KiB  
Article
Performance Comparison of VVC, AV1, HEVC, and AVC for High Resolutions
by Miroslav Uhrina, Lukas Sevcik, Juraj Bienik and Lenka Smatanova
Electronics 2024, 13(5), 953; https://doi.org/10.3390/electronics13050953 - 1 Mar 2024
Cited by 14 | Viewed by 19482
Abstract
Over the years, there has been growing interest in multimedia services, especially in the video domain, where firms and subscribers require higher resolutions, framerates, and sampling precision. This results in a huge amount of data that needs to be processed, stored, and transmitted. As a result, researchers face the challenge of developing new compression standards that can reduce the amount of data while maintaining the same quality. In this paper, the compression performance of the latest and most commonly used video codecs, namely H.266/VVC, AV1, H.265/HEVC, and H.264/AVC, was examined. The test set included seven sequences of various content at 8K, Ultra HD (UHD), and Full HD (FHD) resolutions, encoded to bitrates ranging from 1 to 15 Mbps for FHD and UHD resolutions and from 5 to 50 Mbps for 8K resolution. Objective quality metrics, such as peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), and video multi-method assessment fusion (VMAF) were used to measure codec performance. The results showed that H.266/VVC outperformed all other codecs, namely H.264/AVC, H.265/HEVC, and AV1, in terms of the Bjøntegaard delta (BD) model. The average bitrate savings were approximately 78% for H.266/VVC, 63% for AV1, and 53% for H.265/HEVC relative to H.264/AVC, 59% for H.266/VVC and 22% for AV1 compared to H.265/HEVC, and 46% for H.266/VVC relative to AV1 (all for 8K resolution). The results also showed that codec performance varied depending on resolution, with higher resolutions showing greater efficiency for newly developed codecs, such as H.266/VVC and AV1. This confirms the fact that the H.266/VVC and AV1 codecs were primarily developed for videos at high resolutions, such as 8K and/or UHD.
(This article belongs to the Section Computer Science & Engineering)
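The Bjøntegaard delta (BD) model referenced above computes the average bitrate difference at equal quality from two rate-distortion curves. A standard BD-rate sketch using cubic fits of log-rate over the overlapping PSNR range is shown below; the RD points in the example are invented.

```python
import numpy as np

def bd_rate(r_anchor, psnr_anchor, r_test, psnr_test):
    """Bjøntegaard delta rate: average bitrate difference (%) at equal
    PSNR, via cubic fits of log-rate over the overlapping PSNR range."""
    pa = np.polyfit(psnr_anchor, np.log(r_anchor), 3)
    pt = np.polyfit(psnr_test, np.log(r_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(pa), np.polyint(pt)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1) * 100

# four RD points per codec (kbps, dB); a negative result = bitrate saving
print(bd_rate([1000, 2000, 4000, 8000], [34.0, 36.5, 38.8, 40.6],
              [ 700, 1400, 2800, 5600], [34.1, 36.6, 38.9, 40.7]))
```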

22 pages, 2143 KiB  
Article
Optimization of the Generative Multi-Symbol Architecture of the Binary Arithmetic Coder for UHDTV Video Encoders
by Grzegorz Pastuszak
Electronics 2023, 12(22), 4643; https://doi.org/10.3390/electronics12224643 - 14 Nov 2023
Cited by 2 | Viewed by 1253
Abstract
Previous studies have shown that the application of the M-coder in the H.264/AVC and H.265/HEVC video coding standards allows for highly parallel implementations without decreasing maximal frequencies. Although the primary limitation on throughput, originating from the range register update, can be eliminated, other limitations are associated with low register processing. Their negative impact is revealed at higher degrees of parallelism, leading to a gradual throughput saturation. This paper presents optimizations introduced to the generative hardware architecture to increase throughputs and hardware efficiencies. Firstly, it can process more than one bypass-mode subseries in one clock cycle. Secondly, aggregated contributions to the codestream are buffered before the low register update. Thirdly, the number of contributions used to update the low register in one clock cycle is decreased to save resources. Fourthly, the maximal one-clock-cycle renormalization shift of the low register is increased from 32 to 64 bit positions. As a result of these optimizations, the binary arithmetic coder, configured for series lengths of 27 and 2 symbols, increases the throughput from 18.37 to 37.42 symbols per clock cycle for high-quality H.265/HEVC compression. The logic consumption increases from 205.6k to 246.1k gates when synthesized on 90 nm TSMC technology. The design can operate at 570 MHz.
(This article belongs to the Special Issue New Technology of Image & Video Processing)
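The bypass-mode property the architecture exploits is that bypass bins touch no probability state, so a whole subseries reduces to shifts and adds on the low register. A simplified software sketch of that update, with renormalization and carry handling omitted:

```python
def encode_bypass_run(low: int, rng: int, bins: list) -> tuple:
    """Simplified bypass-mode update of the arithmetic coder's low
    register: no context model is involved, so a run of bypass bins
    reduces to shifts and adds -- the property that lets hardware code
    a whole bypass subseries in one clock cycle."""
    for b in bins:
        low = (low << 1) + (rng if b else 0)
    return low, rng

low, rng = encode_bypass_run(0, 256, [1, 0, 1, 1])
```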

20 pages, 6779 KiB  
Article
Fast CU Partition Algorithm for Intra Frame Coding Based on Joint Texture Classification and CNN
by Ting Wang, Geng Wei, Huayu Li, ThiOanh Bui, Qian Zeng and Ruliang Wang
Sensors 2023, 23(18), 7923; https://doi.org/10.3390/s23187923 - 15 Sep 2023
Cited by 2 | Viewed by 1627
Abstract
High-efficiency video coding (HEVC/H.265) is one of the most widely used video coding standards. HEVC introduces a quad-tree coding unit (CU) partition structure to improve video compression efficiency. The determination of the optimal CU partition is achieved through the brute-force search rate-distortion optimization method, which may result in high encoding complexity and hardware implementation challenges. To address this problem, this paper proposes a method that combines convolutional neural networks (CNNs) with joint texture recognition to reduce encoding complexity. First, a classification decision method based on the global and local texture features of the CU is proposed, efficiently dividing the CU into smooth and complex texture regions. Second, for CUs in smooth texture regions, the partition is determined by early termination. For CUs in complex texture regions, a proposed CNN is used for predictive partitioning, thus avoiding the traditional recursive approach. Finally, combined with texture classification, the proposed CNN achieves a good balance between the coding complexity and the coding performance. The experimental results demonstrate that the proposed algorithm reduces computational complexity by 61.23%, while only increasing BD-BR by 1.86% and decreasing BD-PSNR by just 0.09 dB.
(This article belongs to the Section Sensor Networks)
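The texture classification step can be pictured with a small sketch: a CU whose global variance and quadrant-variance spread are both low is labeled smooth and its partitioning terminates early, while complex CUs go to the CNN. The thresholds and the quadrant scheme below are illustrative stand-ins for the paper's global/local features.

```python
import numpy as np

def classify_texture(cu: np.ndarray, t_global=100.0, t_local=50.0) -> str:
    """Label a CU 'smooth' (terminate splitting early) or 'complex'
    (hand it to the CNN) from its global variance plus the spread of
    the variances of its four quadrants. Thresholds are illustrative."""
    h, w = cu.shape
    quads = [cu[:h//2, :w//2], cu[:h//2, w//2:],
             cu[h//2:, :w//2], cu[h//2:, w//2:]]
    local_spread = np.var([q.var() for q in quads])
    if cu.var() < t_global and local_spread < t_local:
        return "smooth"
    return "complex"

cu = np.random.randint(0, 256, (32, 32)).astype(np.float32)
print(classify_texture(cu))
```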

22 pages, 4694 KiB  
Article
Reducing Video Coding Complexity Based on CNN-CBAM in HEVC
by Huayu Li, Geng Wei, Ting Wang, ThiOanh Bui, Qian Zeng and Ruliang Wang
Appl. Sci. 2023, 13(18), 10135; https://doi.org/10.3390/app131810135 - 8 Sep 2023
Cited by 4 | Viewed by 1758
Abstract
High-efficiency video coding (HEVC) outperforms H.264 in coding efficiency. However, the rate–distortion optimization (RDO) process in coding tree unit (CTU) partitioning requires an exhaustive exploration of all possible quad-tree partitions, resulting in high encoding complexity. To simplify this process, this paper proposes a convolutional neural network (CNN)-based optimization algorithm combined with a hybrid attention mechanism module. Firstly, we designed a CNN compatible with the current coding unit (CU) size to accurately predict the CU partitions. In addition, we also designed a convolution block to enhance the information interaction between CU blocks. Then, we introduced the convolution block attention module (CBAM) into the CNN, called CNN-CBAM. This module concentrates on important regions in the image and attends to the target object correctly. Finally, we integrated the CNN-CBAM into the HEVC coding framework for CU partition prediction in advance. The proposed network was trained, validated, and tested using a large-scale dataset covering various scenes and objects, which provides extensive samples for intra-frame CU partition prediction in HEVC. The experimental findings demonstrate that our scheme can reduce the coding time by up to 64.05% on average compared to a traditional HM16.5 encoder, with only 0.09 dB degradation in BD-PSNR and a 1.94% increase in BD-BR.
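For reference, a compact CBAM block of the kind inserted into the CNN is sketched below in PyTorch, following the usual channel-then-spatial attention formulation; the reduction ratio and kernel size are common defaults, not necessarily the paper's.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention from pooled
    descriptors, then spatial attention from a 7x7 conv over pooled maps."""
    def __init__(self, ch=64, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel attention
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))          # spatial attention

y = CBAM()(torch.randn(1, 64, 32, 32))
```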

28 pages, 22448 KiB  
Article
Subjective Quality Assessment of V-PCC-Compressed Dynamic Point Clouds Degraded by Packet Losses
by Emil Dumic and Luis A. da Silva Cruz
Sensors 2023, 23(12), 5623; https://doi.org/10.3390/s23125623 - 15 Jun 2023
Cited by 7 | Viewed by 3646
Abstract
This article describes an empirical exploration on the effect of information loss affecting compressed representations of dynamic point clouds on the subjective quality of the reconstructed point clouds. The study involved compressing a set of test dynamic point clouds using the MPEG V-PCC (Video-based Point Cloud Compression) codec at 5 different levels of compression and applying simulated packet losses with three packet loss rates (0.5%, 1% and 2%) to the V-PCC sub-bitstreams prior to decoding and reconstructing the dynamic point clouds. The qualities of the recovered dynamic point clouds were then assessed by human observers in experiments conducted at two research laboratories in Croatia and Portugal, to collect MOS (Mean Opinion Score) values. These scores were subject to a set of statistical analyses to measure the degree of correlation of the data from the two laboratories, as well as the degree of correlation between the MOS values and a selection of objective quality measures, while taking into account compression level and packet loss rates. The objective quality measures considered, all of the full-reference type, included point cloud-specific measures, as well as others adapted from image and video quality measures. In the case of image-based quality measures, FSIM (Feature Similarity index), MSE (Mean Squared Error), and SSIM (Structural Similarity index) yielded the highest correlation with subjective scores in both laboratories, while PCQM (Point Cloud Quality Metric) showed the highest correlation among all point cloud-specific objective measures. The study showed that even 0.5% packet loss rates reduce the decoded point clouds' subjective quality by more than 1 to 1.5 MOS scale units, pointing out the need to adequately protect the bitstreams against losses. The results also showed that the degradations in V-PCC occupancy and geometry sub-bitstreams have significantly higher (negative) impact on decoded point cloud subjective quality than degradations of the attribute sub-bitstream.
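The correlation analysis boils down to PLCC/SROCC computations of the kind below, using SciPy; the MOS and metric values shown are placeholders, not data from the study.

```python
from scipy.stats import pearsonr, spearmanr

# MOS values from one lab and an objective metric (e.g. PCQM) for the
# same processed point clouds; the numbers below are placeholders.
mos    = [4.2, 3.6, 2.9, 2.1, 1.5, 3.8, 3.1, 2.4]
metric = [0.004, 0.009, 0.016, 0.027, 0.041, 0.007, 0.013, 0.022]

plcc, _ = pearsonr(mos, metric)    # linear correlation
srocc, _ = spearmanr(mos, metric)  # rank-order correlation
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}")
```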

17 pages, 2447 KiB  
Article
FSVM- and DAG-SVM-Based Fast CU-Partitioning Algorithm for VVC Intra-Coding
by Fengqin Wang, Zhiying Wang and Qiuwen Zhang
Symmetry 2023, 15(5), 1078; https://doi.org/10.3390/sym15051078 - 12 May 2023
Cited by 4 | Viewed by 1963
Abstract
H.266/VVC introduces the QTMT partitioning structure, building upon the foundation laid by H.265/HEVC, which makes the partitioning more diverse and flexible but also brings huge coding complexity. To better address the problem, we propose a fast CU decision algorithm based on FSVMs and DAG-SVMs to reduce encoding time. The algorithm divides the CU-partitioning process into two stages and symmetrically extracts some of the same CU features. Firstly, the CU is input into the trained FSVM model, extracting the standard deviation, directional complexity, and content difference complexity of the CUs, and it uses these features to make a judgment on whether to terminate the partitioning early. Then, the determination of the partition type of the CU is regarded as a multi-classification problem, and a DAG-SVM classifier is used to classify it. The extracted features serve as input to the classifier, which predicts the partition type of the CU and thereby prevents unnecessary partitioning. The results of the experiment indicate that, compared with the VTM-10.0 reference software anchor, the algorithm can save 49.38%~58.04% of coding time, while BDBR only increases by 0.76%~1.37%. The video quality and encoding performance are guaranteed while the encoding complexity is effectively reduced.
(This article belongs to the Section Computer)
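A toy rendering of the second stage, assuming scikit-learn: one binary SVM per pair of partition types, evaluated in a DAG that eliminates one candidate class per node. The features, class count, and elimination order are illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

# Stand-in features: e.g. std-dev plus two complexity measures per CU;
# labels are 4 hypothetical partition types. Random data for illustration.
X = np.random.rand(600, 3)
y = np.random.randint(0, 4, 600)

pair_svms = {(a, b): SVC().fit(X[np.isin(y, (a, b))],
                               y[np.isin(y, (a, b))])
             for a, b in combinations(range(4), 2)}

def dag_predict(x):
    alive = list(range(4))
    while len(alive) > 1:          # each DAG node eliminates one class
        a, b = alive[0], alive[-1]
        winner = int(pair_svms[(a, b)].predict(x.reshape(1, -1))[0])
        alive.remove(b if winner == a else a)
    return alive[0]

print(dag_predict(X[0]))
```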

13 pages, 3064 KiB  
Communication
Visual Perception Based Intra Coding Algorithm for H.266/VVC
by Yu-Hsiang Tsai, Chen-Rung Lu, Mei-Juan Chen, Meng-Chun Hsieh, Chieh-Ming Yang and Chia-Hung Yeh
Electronics 2023, 12(9), 2079; https://doi.org/10.3390/electronics12092079 - 1 May 2023
Cited by 6 | Viewed by 3427
Abstract
The latest international video coding standard, H.266/Versatile Video Coding (VVC), supports high-definition videos, with resolutions from 4K to 8K or even larger. It offers a higher compression ratio than its predecessor, H.265/High Efficiency Video Coding (HEVC). In addition to the quadtree partition structure of H.265/HEVC, the nested multi-type tree (MTT) structure of H.266/VVC provides more diverse splits through binary and ternary trees. It also includes many new coding tools, which tremendously increases the encoding complexity. This paper proposes a fast intra coding algorithm for H.266/VVC based on visual perception analysis. The algorithm applies the factor of average background luminance for just-noticeable distortion (JND) to identify the visually distinguishable (VD) pixels within a coding unit (CU). We propose calculating the variances of the numbers of VD pixels in various MTT splits of a CU. Intra sub-partitions and matrix weighted intra prediction are turned off conditionally based on the variance of the four variances for MTT splits and a thresholding criterion. The fast horizontal/vertical splitting decisions for binary and ternary trees are proposed by utilizing random forest classifiers of machine learning techniques, which use the information of VD pixels and the quantization parameter. Experimental results show that the proposed algorithm achieves around 47.26% encoding time reduction with a Bjøntegaard Delta Bitrate (BDBR) of 1.535% on average under the All Intra configuration. Overall, this algorithm can significantly speed up H.266/VVC intra coding and outperform previous studies.
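The average-background-luminance factor is typically a luminance-adaptation JND threshold of the classic Chou-and-Li form, sketched below; the exact model and constants used in the paper may differ.

```python
import numpy as np

def luminance_jnd(bg: np.ndarray, t0=17.0, gamma=3.0 / 128.0) -> np.ndarray:
    """Classic luminance-adaptation JND threshold (Chou & Li style): the
    eye tolerates more distortion in very dark and very bright regions.
    A pixel is 'visually distinguishable' when its deviation from the
    average background luminance bg exceeds this threshold."""
    dark = t0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0
    bright = gamma * (bg - 127.0) + 3.0
    return np.where(bg <= 127, dark, bright)

bg = np.arange(0, 256, 32, dtype=float)
print(luminance_jnd(bg))   # highest thresholds at the luminance extremes
```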

19 pages, 1269 KiB  
Article
A Highly Pipelined and Highly Parallel VLSI Architecture of CABAC Encoder for UHDTV Applications
by Chen Fu, Heming Sun, Zhiqiang Zhang and Jinjia Zhou
Sensors 2023, 23(9), 4293; https://doi.org/10.3390/s23094293 - 26 Apr 2023
Cited by 2 | Viewed by 2553
Abstract
Recently, specifically designed video codecs have been preferred due to the expansion of video data in Internet of Things (IoT) devices. Context Adaptive Binary Arithmetic Coding (CABAC) is the entropy coding module widely used in recent video coding standards such as HEVC/H.265 and VVC/H.266. CABAC is a well-known throughput bottleneck due to its strong data dependencies. Because the required context model of the current bin often depends on the results of the previous bin, the context model cannot be prefetched early enough, which results in pipeline stalls. To solve this problem, we propose a prediction-based context model prefetching strategy, effectively eliminating the clock cycles the context model spends accessing data in memory. We also propose a multi-result context model update (MCMU) to reduce the critical path delay of context model updates in multi-bin/clock architecture. Furthermore, we apply pre-range-update and pre-renormalization techniques to reduce the path delay of the multiplexed binary arithmetic encoder (BAE) by exploiting the partial independence of steps in the encoding process. To further speed up processing, we propose handling four regular and several bypass bins in parallel with a variable bypass bin incorporation (VBBI) technique. Finally, a quad-loop cache is developed to improve the compatibility of data interactions between the entropy encoder and other video encoder modules. As a result, the pipeline architecture based on the context model prefetching strategy can remove up to 45.66% of the coding time lost to regular-bin stalls, and the parallel architecture can also save an average of 29.25% of the coding time spent on model updates under the condition that the Quantization Parameter (QP) is equal to 22. At the same time, the throughput of our proposed parallel architecture can reach 2191 Mbin/s, which is sufficient to meet the requirements of 8K Ultra High Definition Television (UHDTV). Additionally, the hardware efficiency (Mbins/s per k gates) of the proposed architecture is higher than that of existing advanced pipeline and parallel architectures.
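The motivation for prediction-based prefetching can be illustrated with a toy cycle model: when the next bin's context model is predicted correctly it is fetched in time, otherwise the pipeline stalls. The hit rate and stall penalty below are arbitrary, purely for illustration.

```python
import random

def cycles(n_bins: int, hit_rate: float, stall: int = 2) -> int:
    """Toy cycle count for a pipelined CABAC encoder: a correctly
    predicted context is prefetched in time (1 cycle/bin); a
    misprediction forces a pipeline stall of `stall` extra cycles."""
    total = 0
    for _ in range(n_bins):
        total += 1 if random.random() < hit_rate else 1 + stall
    return total

random.seed(0)
print(cycles(10_000, 0.0))   # no prefetching: every bin stalls
print(cycles(10_000, 0.9))   # prediction-based prefetch, 90% hit rate
```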
