MDPI - Publisher of Open Access Journals

18 pages, 2846 KB

Open Access Article

A Quantization-Adaptive Early Termination Method for Fast Coding Unit Partitioning in VVC

by Donggeon Jo and Dongsan Jun

Mathematics 2026, 14(10), 1587; https://doi.org/10.3390/math14101587 - 7 May 2026

Viewed by 291

Versatile Video Coding (VVC) achieves higher compression efficiency than the previous High Efficiency Video Coding (HEVC) standard by employing advanced coding tools, including Quad Tree (QT) and Multi-Type Tree (MTT) block partitioning, extended intra prediction modes, and affine motion compensation. Among these tools, the QT-MTT hierarchical partitioning structure significantly increases encoder complexity, since Rate-Distortion Optimization (RDO) must be performed over an exponentially growing number of partition candidates. To mitigate this complexity, a quantization-adaptive early termination method is proposed that combines neural network-based and rule-based partitioning strategies. The proposed decision mechanism significantly reduces the number of Coding Unit (CU) partition candidates, which directly lowers the number of required RDO evaluations and overall encoder complexity. Experimental results demonstrate that the proposed method achieves a 38.28% reduction in encoding time with only a 0.85% increase in Bjøntegaard Delta Bitrate (BD-BR) under the VVC common test conditions. These results indicate that the proposed method effectively balances computational complexity and rate-distortion performance. Full article
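To give a feel for how quickly the partition search space grows, the following minimal sketch counts the distinct partitionings of a single block under quadtree splitting only. This is a simplification and an assumption of this example: it ignores the binary and ternary MTT splits, so the true VVC candidate space is larger still, and the function name `count_quadtree_partitions` is hypothetical.

```python
def count_quadtree_partitions(max_depth: int) -> int:
    """Count distinct quadtree partitionings of one block.

    A block is either left unsplit (1 way) or split into four sub-blocks,
    each partitioned independently with one fewer level available.
    QT-only simplification; MTT splits are not modeled.
    """
    if max_depth == 0:
        return 1
    sub = count_quadtree_partitions(max_depth - 1)
    return 1 + sub ** 4


if __name__ == "__main__":
    for depth in range(4):
        print(depth, count_quadtree_partitions(depth))
```

Even with three quadtree levels the count already exceeds eighty thousand, which is why pruning candidates before RDO pays off.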

(This article belongs to the Special Issue Coding Theory and the Impact of AI)

17 pages, 2088 KB

Open Access Article

Perception-Driven and Object-Aware Fast MTT Partitioning for H.266/VVC: A Saliency-Guided Complexity Reduction Framework

by Chih-Ying Lin, Jia-Yi Yeh, Yu-Cheng Chen, Yi-Fan Li, Chih-Ming Lien, Mei-Juan Chen and Chia-Hung Yeh

Electronics 2026, 15(1), 133; https://doi.org/10.3390/electronics15010133 - 27 Dec 2025

Cited by 1 | Viewed by 859

The H.266/Versatile Video Coding (VVC) standard was developed to address the growing demand for compressing ultra-high-definition video content, supporting resolutions ranging from 4K to 8K and beyond. H.266/VVC improves coding efficiency by introducing a flexible quadtree with nested multi-type tree (QT-MTT) partitioning and various advanced coding tools. However, these improvements substantially increase the encoding complexity. To address this issue, we propose a perception-driven and object-aware algorithm that accelerates the MTT process in H.266/VVC intra coding. Our method integrates pixel-level saliency detection with object bounding box detection. Specifically, visually distinguishable (VD) pixels are identified using a just noticeable distortion (JND) model based on average background luminance, while detected-object regions are extracted using a YOLO object detection network. These two types of perceptual information are combined to guide adaptive encoding decisions. For each frame, a perception-driven pixel map labeled with VD pixels and a YOLO-based object map are generated. Within the MTT framework, partitioning decisions are determined jointly by standard deviation metrics derived from VD pixels and detected-object region coverage. By incorporating flexible threshold settings, the proposed method can meet different users’ requirements. In this paper, we performed experiments under three threshold settings. The experimental results demonstrate that the proposed method reduces H.266/VVC intra coding time by 27.94% to 43.11%, with BDBR increases of only 1.02% to 1.53%, thus achieving an appropriate trade-off between encoding speed and coding efficiency. Full article
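The joint decision described above (standard deviation of VD pixels plus detected-object coverage) could be sketched roughly as follows. The abstract does not give the actual rule or thresholds, so the function name `skip_mtt`, the direction of the comparison, and the values of `std_thr` and `cover_thr` are all assumptions made for illustration.

```python
from statistics import pstdev


def skip_mtt(block, vd_mask, obj_mask, std_thr=4.0, cover_thr=0.1):
    """Return True if MTT splitting may be skipped for this block.

    block    : 2D list of luma samples
    vd_mask  : 2D list, 1 where a pixel is visually distinguishable (VD)
    obj_mask : 2D list, 1 where a pixel lies inside a detected object box
    Thresholds and the decision rule are illustrative assumptions.
    """
    vd_values = [p for row, mrow in zip(block, vd_mask)
                 for p, m in zip(row, mrow) if m]
    total = sum(len(row) for row in block)
    coverage = sum(1 for row in obj_mask for m in row if m) / total
    vd_std = pstdev(vd_values) if len(vd_values) > 1 else 0.0
    # Smooth content with little object coverage: skip the MTT search.
    return vd_std < std_thr and coverage < cover_thr
```

Raising or lowering the two thresholds trades encoding time against BDBR, which mirrors the flexible threshold settings the abstract mentions.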

(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence, 2nd Edition)

29 pages, 10629 KB

Open Access Editor’s Choice Article

Content-Adaptive Reversible Data Hiding with Multi-Stage Prediction Schemes

by Hsiang-Cheh Huang, Feng-Cheng Chang and Hong-Yi Li

Sensors 2025, 25(19), 6228; https://doi.org/10.3390/s25196228 - 8 Oct 2025

Cited by 2 | Viewed by 1465

With the proliferation of image-capturing and display-enabled IoT devices, ensuring the authenticity and integrity of visual data has become increasingly critical, especially in light of emerging cybersecurity threats and powerful generative AI tools. One of the major challenges in such sensor-based systems is the ability to protect privacy while maintaining data usability. Reversible data hiding has attracted growing attention due to its reversibility and ease of implementation, making it a viable solution for secure image communication in IoT environments. In this paper, we propose reversible data hiding techniques tailored to the content characteristics of images. Our approach leverages subsampling and quadtree partitioning, combined with multi-stage prediction schemes, to generate a predicted image aligned with the original. Secret information is embedded by analyzing the difference histogram between the original and predicted images, and enhanced through multi-round rotation techniques and a multi-level embedding strategy to boost capacity. By employing both subsampling and quadtree decomposition, the embedding strategy dynamically adapts to the inherent characteristics of the input image. Furthermore, we investigate the trade-off between embedding capacity and marked image quality. Experimental results demonstrate improved embedding performance, high visual fidelity, and low implementation complexity, highlighting the method’s suitability for resource-constrained IoT applications. Full article
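Embedding via a difference histogram can be illustrated with classic single-peak histogram shifting on prediction errors. This is a generic textbook sketch, not the multi-stage, multi-round scheme of the paper; the peak bin is assumed to be 0, pixel overflow handling is omitted, and the names `embed` and `extract` are hypothetical.

```python
def embed(errors, bits):
    """Embed payload bits into prediction errors (peak bin assumed to be 0)."""
    marked, payload = [], iter(bits)
    for e in errors:
        if e > 0:
            marked.append(e + 1)                 # shift right to free bin 1
        elif e == 0:
            marked.append(e + next(payload, 0))  # carry one bit (pad with 0)
        else:
            marked.append(e)                     # negative errors untouched
    return marked


def extract(marked):
    """Recover the original errors and the embedded bits."""
    errors, bits = [], []
    for m in marked:
        if m in (0, 1):
            bits.append(m)
            errors.append(0)
        elif m > 1:
            errors.append(m - 1)
        else:
            errors.append(m)
    return errors, bits
```

The capacity of one pass equals the number of errors in the peak bin, which is why a more accurate predictor (a taller peak) yields more embedding room.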

(This article belongs to the Special Issue Data Security Approaches for Autonomous Systems, IoT, and Smart Sensing Systems)

20 pages, 23718 KB

Open Access Article

A Mamba-Based Hierarchical Partitioning Framework for Upper-Level Wind Field Reconstruction

by Wantong Chen, Yifan Zhang, Ruihua Liu, Shuguang Sun and Qing Feng

Aerospace 2025, 12(9), 842; https://doi.org/10.3390/aerospace12090842 - 18 Sep 2025

Viewed by 878

An accurate perception of upper-level wind fields is essential for improving civil aviation safety and route optimization. However, the sparsity of observational data and the structural complexity of wind fields make reconstruction highly challenging. To address this, we propose QuadMamba-WindNet (QMW-Net), a structure-enhanced deep neural network that integrates a hierarchical state-space modeling framework with a learnable quad-tree-based regional partitioning mechanism, enabling multi-scale adaptive encoding and efficient dynamic modeling. The model is trained end-to-end on ERA5 reanalysis data and validated with simulated flight trajectory observation masks, allowing the reconstruction of complete horizontal wind fields at target altitude levels. Experimental results show that QMW-Net achieves a mean absolute error (MAE) of 1.62 m/s and a mean relative error (MRE) of 6.68% for wind speed reconstruction at 300 hPa, with a mean directional error of 4.85° and an $R^{2}$ of 0.93, demonstrating high accuracy and stable error convergence. Compared with Physics-Informed Neural Networks (PINNs) and Gaussian Process Regression (GPR), QMW-Net delivers superior predictive performance and generalization across multiple test sets. The proposed model provides refined wind field support for civil aviation forecasting and trajectory planning, and shows potential for broader applications in high-dynamic flight environments and atmospheric sensing. Full article
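For readers unfamiliar with the reported metrics, the sketch below computes MAE, MRE, the coefficient of determination, and a wrap-aware mean directional error. The abstract does not state the exact definitions used, so the standard textbook forms are assumed here, and all function names are hypothetical.

```python
def mae(truth, pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def mre(truth, pred):
    """Mean relative error (truth values must be non-zero)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(truth, pred)) / len(truth)


def r_squared(truth, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def mean_direction_error(truth_deg, pred_deg):
    """Mean angular difference in degrees, taking the shorter way around."""
    diffs = []
    for t, p in zip(truth_deg, pred_deg):
        d = abs(t - p) % 360.0
        diffs.append(min(d, 360.0 - d))
    return sum(diffs) / len(diffs)
```

The wrap-around in `mean_direction_error` matters for wind direction: 350° and 10° differ by 20°, not 340°.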

(This article belongs to the Section Air Traffic and Transportation)

24 pages, 6296 KB

Open Access Article

Efficient Weather Routing Method in Coastal and Island-Rich Waters Guided by Ship Trajectory Big Data

by Yinfei Zhou, Lihua Zhang, Shuaidong Jia and Zeyuan Dai

J. Mar. Sci. Eng. 2025, 13(9), 1801; https://doi.org/10.3390/jmse13091801 - 17 Sep 2025

Cited by 3 | Viewed by 1799

Weather routing is a critical guarantee for the safe and economical navigation of ships. Existing methods for weather routing still face challenges in selecting the appropriate planning granularity. A granularity that is overly coarse may result in routes passing through coastal and island-rich waters, such as coastal zones and reefs, thus compromising navigational safety. Conversely, a granularity that is excessively fine leads to an exponential increase in computational complexity, rendering the problem intractable. To address this issue, this paper proposes an efficient method for weather routing in coastal and island-rich waters, guided by ship trajectory big data: First, an adaptive quadtree is used to partition the navigable space into an adaptive grid, based on which a route network is constructed using ship trajectory big data. Next, a ship motion model is introduced to build both static and dynamic marine environmental fields, which are used to dynamically update the time weights of the route network. Finally, using the updated route network as a guide, the method aims to minimize voyage time and employs an improved time-varying A* algorithm for weather routing. Experimental results show that the proposed method effectively adapts to coastal and island-rich waters, outperforming the baseline SIMROUTE in safety, optimization, and efficiency. Unlike SIMROUTE, which crosses restricted areas, it avoids such risks entirely. It achieves average reductions of 6.8% in route length and 4.3% in navigation time and is 5.8 times faster than SIMROUTE for fine-grained planning. This balances voyage time, safety, and efficiency, offering a practical weather routing solution. Full article
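The first step, partitioning navigable space with an adaptive quadtree, can be sketched on a small occupancy grid. The sketch assumes a square grid whose side is a power of two, with 0 for water and 1 for land; the splitting criterion (subdivide only cells that mix land and water) and the name `quadtree_cells` are assumptions, since the abstract does not give the paper's criterion.

```python
def quadtree_cells(grid, x=0, y=0, size=None, min_size=1):
    """Return leaf cells as (x, y, size, label) tuples.

    label is 'water', 'land', or 'mixed' (mixed only at min_size).
    Homogeneous regions stay coarse; mixed regions are refined.
    """
    if size is None:
        size = len(grid)
    values = [grid[r][c] for r in range(y, y + size)
              for c in range(x, x + size)]
    if all(v == 0 for v in values):
        return [(x, y, size, "water")]
    if all(v == 1 for v in values):
        return [(x, y, size, "land")]
    if size <= min_size:
        return [(x, y, size, "mixed")]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_cells(grid, x + dx, y + dy, half, min_size)
    return leaves
```

Open water ends up as a few large cells while coastlines and islands are resolved finely, which keeps the route network small without letting routes cut across land.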

(This article belongs to the Section Ocean Engineering)

19 pages, 2675 KB

Open Access Article

Fast Intra-Coding Unit Partitioning for 3D-HEVC Depth Maps via Hierarchical Feature Fusion

by Fangmei Liu, He Zhang and Qiuwen Zhang

Electronics 2025, 14(18), 3646; https://doi.org/10.3390/electronics14183646 - 15 Sep 2025

Cited by 1 | Viewed by 991

As a new-generation 3D video coding standard, 3D-HEVC offers highly efficient compression. However, its recursive quadtree partitioning mechanism and frequent rate-distortion optimization (RDO) computations lead to a significant increase in coding complexity. In particular, intra-frame coding of depth maps, which incorporates tools such as depth modeling modes (DMMs), substantially prolongs the decision-making process for coding unit (CU) partitioning and becomes a critical bottleneck in encoding time. To address this issue, this paper proposes a fast CU partitioning framework based on hierarchical feature fusion convolutional neural networks (HFF-CNNs). It aims to significantly accelerate the overall encoding process while preserving encoding quality by optimizing depth map CU partitioning decisions. The framework captures both the CU’s global structure and its local details through multi-scale feature extraction and channel attention mechanisms (SE module). It introduces two external features: the wavelet energy ratio, designed to quantify the texture complexity of depth map CUs, and the quantization parameter (QP), which reflects the encoding quality. Together they enhance the dynamic perception ability of the model from different dimensions. Finally, it outputs depth-corresponding partitioning predictions through three fully connected layers, strictly adhering to HEVC’s quad-tree recursive segmentation mechanism. Experimental results demonstrate that, across eight standard test sequences, the proposed method achieves an average encoding time reduction of 48.43%, significantly lowering intra-frame encoding complexity with a BDBR increment of only 0.35%. The model is lightweight, with minimal inference time overhead. Compared with representative existing methods, the proposed method achieves a better balance between cross-resolution adaptability and computational efficiency, providing a feasible optimization path for real-time 3D-HEVC applications. Full article
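One plausible reading of a wavelet energy ratio is the share of a block's energy that falls in the high-frequency sub-bands of a one-level 2D Haar transform. The abstract does not define the feature, so this definition, the choice of the Haar wavelet, and the name `wavelet_energy_ratio` are assumptions for illustration only.

```python
def wavelet_energy_ratio(block):
    """High-frequency energy / total energy after one-level 2D Haar.

    block: 2D list with even height and width. Returns 0.0 for an
    all-zero block. Illustrative definition, not taken from the paper.
    """
    low, high = 0.0, 0.0
    for r in range(0, len(block), 2):
        for c in range(0, len(block[0]), 2):
            a, b = block[r][c], block[r][c + 1]
            d, e = block[r + 1][c], block[r + 1][c + 1]
            ll = (a + b + d + e) / 2.0   # approximation
            lh = (a - b + d - e) / 2.0   # horizontal detail
            hl = (a + b - d - e) / 2.0   # vertical detail
            hh = (a - b - d + e) / 2.0   # diagonal detail
            low += ll * ll
            high += lh * lh + hl * hl + hh * hh
    total = low + high
    return high / total if total > 0 else 0.0
```

A flat depth region gives a ratio of 0, while sharp object edges push the ratio up, so the value can serve as a cheap texture-complexity cue alongside QP.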

19 pages, 3140 KB

Open Access Article

Fast Algorithm for Depth Map Intra-Frame Coding 3D-HEVC Based on Swin Transformer and Multi-Branch Network

by Fengqin Wang, Yangang Du and Qiuwen Zhang

Electronics 2025, 14(9), 1703; https://doi.org/10.3390/electronics14091703 - 22 Apr 2025

Cited by 2 | Viewed by 1282

Three-Dimensional High-Efficiency Video Coding (3D-HEVC) efficiently compresses 3D video by incorporating depth map encoding techniques. However, the quadtree partitioning of depth map coding units (CUs) greatly increases computational complexity, contributing to over 90% of the total encoding time. To overcome the limitations of existing methods in complex edge modeling and partitioning efficiency, this paper presents Swin-Hier Net, a hierarchical CU partitioning prediction model based on the Swin Transformer. First, a multi-branch feature fusion architecture is designed: the Swin Transformer’s shifted window attention mechanism extracts global contextual features, lightweight CNNs capture local texture details, and traditional edge/variance features enhance multi-scale representation. Second, a recursive hierarchical decision mechanism dynamically activates sub-CU prediction branches based on the partitioning probability of parent nodes, ensuring strict compliance with the HEVC standard quadtree syntax. Additionally, a hybrid pooling strategy and dilated convolutions improve edge feature retention. Experiments on 3D-HEVC standard test sequences show that, compared to exhaustive traversal methods, the proposed algorithm reduces encoding time by 72.7% on average, lowers the BD-Rate by 1.16%, improves CU partitioning accuracy to 94.5%, and maintains a synthesized view PSNR of 38.68 dB (baseline: 38.72 dB). The model seamlessly integrates into the HTM encoder, offering an efficient solution for real-time 3D video applications. Full article

18 pages, 4154 KB

Open Access Article

The T-DBSCAN Algorithm for Stopover Site Identification of Migration Birds Based on Satellite Positioning Data

by Xinwu He, Xiqun Liu, Jiajia Liu, Youwen Li, Zhenggang Xu, Ping Mo and Tian Huang

Biology 2025, 14(3), 277; https://doi.org/10.3390/biology14030277 - 7 Mar 2025

Cited by 1 | Viewed by 2724

With the acceleration of social development and urbanization, birds’ natural habitats have been greatly disturbed and threatened. Satellite tracking technology can collect large amounts of bird activity data, providing important data support for habitat protection research. However, satellite data are usually characterized by discontinuity, extensive periods, and inconsistent frequency, which challenges cluster analysis. Habitat research frequently employs clustering techniques, but conventional clustering algorithms struggle to adapt to these data features, particularly changes in the time dimension and irregular data sampling. T-DBSCAN, an enhanced clustering algorithm, is proposed to handle these data characteristics. T-DBSCAN improves on the traditional DBSCAN algorithm: it incorporates a quadtree structure to make spatial partitioning more efficient and introduces a convex hull strategy for boundary identification and clustering, thus improving the efficiency and accuracy of the algorithm. T-DBSCAN is designed to account efficiently for the uniformity of data sampling and changes in the time dimension. Tests demonstrate that the algorithm outperforms conventional techniques in habitat identification accuracy and processing efficiency. It can also manage large amounts of discontinuous satellite tracking data, making it a dependable tool for studying bird habitats. Full article

(This article belongs to the Special Issue Bird Biology and Conservation)

16 pages, 433 KB

Open Access Article

A Fast Coding Unit Partitioning Decision Algorithm for Versatile Video Coding Based on Gradient Feedback Hierarchical Convolutional Neural Network and Light Gradient Boosting Machine Decision Tree

by Fangmei Liu, Jiyuan Wang and Qiuwen Zhang

Electronics 2024, 13(24), 4908; https://doi.org/10.3390/electronics13244908 - 12 Dec 2024

Viewed by 1745

Video encoding technology is a foundational component in the advancement of modern technological applications. The latest video coding standard, H.266/Versatile Video Coding (VVC), features a quad-tree with nested multi-type tree (QTMT) partitioning structure, which represents an improvement over its predecessor, High-Efficiency Video Coding (H.265/HEVC). This structure enables flexible block segmentation, albeit at the cost of heightened encoding complexity. This paper therefore proposes a deep learning-based approach to CU partitioning, with the aim of replacing the intricate CU partitioning process in the Versatile Video Coding Test Model (VTM). We begin by presenting the Gradient Feedback Hierarchical CNN (GFH-CNN) model, an advanced convolutional neural network derived from the ResNet architecture, which extracts features from 64 × 64 coding unit (CU) blocks. Following this, a hierarchical network diagram (HND) is constructed to depict the partition boundaries corresponding to the various levels of the CU block’s layered structure. This diagram maps the features extracted by the GFH-CNN model to the partitioning at each level and boundary. Finally, a LightGBM-based decision tree classification model (L-DT) is constructed to predict the corresponding partition structure based on the prediction vector output by the GFH-CNN model. Any errors in the partitioning results are then corrected in accordance with the encoding constraints specified by the VTM, which ultimately determines the final CU block partitioning. The experimental results demonstrate that, in comparison with VTM-10.0, the proposed algorithm achieves a 48.14% reduction in complexity with only a negligible 0.83% increase in bitrate under the top-three configuration. The top-two configuration yields a higher complexity reduction of 63.78%, although this is accompanied by a 2.08% increase in bitrate. These results demonstrate that, in comparison to existing solutions, our approach provides an optimal balance between encoding efficiency and computational complexity. Full article
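The abstract does not define the top-three and top-two configurations. One natural reading, assumed here, is that only the k most probable partition modes predicted by the classifier are passed on to RDO. The sketch below illustrates that reading; the mode labels, the probabilities, and the name `top_k_modes` are hypothetical.

```python
def top_k_modes(mode_probs, k):
    """Return the k partition modes with the highest predicted probability.

    mode_probs: dict mapping a mode label to its predicted probability.
    Only the returned modes would be evaluated by RDO (assumed reading
    of the "top-k" configurations).
    """
    ranked = sorted(mode_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [mode for mode, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical classifier output for one CU: no split, quad split,
    # binary horizontal/vertical, ternary horizontal/vertical.
    probs = {"NS": 0.05, "QT": 0.50, "BH": 0.20,
             "BV": 0.15, "TH": 0.06, "TV": 0.04}
    print(top_k_modes(probs, 3))
    print(top_k_modes(probs, 2))
```

Under this reading a smaller k prunes more candidates, which is consistent with the reported pattern of larger complexity savings and a larger bitrate increase for top-two than for top-three.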

19 pages, 1012 KB

Open Access Article

Rapid CU Partitioning and Joint Intra-Frame Mode Decision Algorithm

by Wenjun Song, Congxian Li and Qiuwen Zhang

Electronics 2024, 13(17), 3465; https://doi.org/10.3390/electronics13173465 - 31 Aug 2024

Cited by 2 | Viewed by 1664

H.266/Versatile Video Coding (VVC) introduces new techniques that build upon previous standards, including a quadtree with nested multi-type tree (QTMT) partitioning structure. The introduction of this structure significantly enhances video coding efficiency; additionally, the number of directional modes in H.266 has increased by 32 compared to H.265, accommodating a greater variety of texture patterns. However, these changes have also led to a significant increase in encoding complexity. To address the issue of excessive computational complexity, this paper proposes a rapid coding unit (CU) partitioning approach combined with an intra-frame mode decision algorithm. In the first phase of the algorithm, we extract different features for CU blocks of various sizes and feed them into a decision tree classifier, which determines the CU partitioning mode so that partitioning can be terminated early, thereby reducing the encoding complexity to some extent. In the second phase of the algorithm, we put forward an intra-frame mode decision strategy grounded in gradient descent techniques with a bidirectional search mode. This brings the search as close as possible to the global optimum, thereby obtaining the optimal intra-frame mode and further reducing the encoding complexity. Experiments demonstrate that the algorithm achieves a 54.53% reduction in encoding time, while the Bjøntegaard Delta Bit Rate (BD-BR) increases by only 1.38%, striking an optimal balance between the fidelity of video and the efficacy of the encoding process. Full article

(This article belongs to the Special Issue Image and Video Processing and Retrieval Based on Machine Learning and Deep Learning)

19 pages, 7973 KB

Open Access Article

Determining Thresholds for Optimal Adaptive Discrete Cosine Transformation

by Alexander Khanov, Anastasija Shulzhenko, Anzhelika Voroshilova, Alexander Zubarev, Timur Karimov and Shakeeb Fahmi

Algorithms 2024, 17(8), 366; https://doi.org/10.3390/a17080366 - 21 Aug 2024

Cited by 3 | Viewed by 2518

The discrete cosine transform (DCT) is widely used for image and video compression. Lossy algorithms such as JPEG, WebP, BPG and many others are based on it. Multiple modifications of DCT have been developed to improve its performance. One of them is adaptive DCT (ADCT) designed to deal with heterogeneous image structure and it may be found, for example, in the HEVC video codec. Adaptivity means that the image is divided into an uneven grid of squares: smaller ones retain information about details better, while larger squares are efficient for homogeneous backgrounds. The practical use of adaptive DCT algorithms is complicated by the lack of optimal threshold search algorithms for image partitioning procedures. In this paper, we propose a novel method for optimal threshold search in ADCT using a metric based on tonal distribution. We define two thresholds: pm, the threshold defining solid mean coloring, and ps, defining the quadtree fragment splitting. In our algorithm, the values of these thresholds are calculated via polynomial functions of the tonal distribution of a particular image or fragment. The polynomial coefficients are determined using the dedicated optimization procedure on the dataset containing images from the specific domain, urban road scenes in our case. In the experimental part of the study, we show that ADCT allows a higher compression ratio compared to non-adaptive DCT at the same level of quality loss, up to 66% for acceptable quality. The proposed algorithm may be used directly for image compression, or as a core of video compression framework in traffic-demanding applications, such as urban video surveillance systems. Full article
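The two-threshold idea can be sketched as follows: both thresholds are polynomial functions of a tonal-distribution metric, and a fragment is then filled with its mean, split further, or coded with DCT. The abstract gives neither the metric nor the exact decision order, so the `spread` and `tone` inputs, the ordering of the tests, and the names `poly` and `classify_fragment` are assumptions; the coefficients would come from the optimization procedure the authors describe.

```python
def poly(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ... (Horner)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def classify_fragment(spread, tone, pm_coeffs, ps_coeffs):
    """Decide how to treat one quadtree fragment (illustrative rule).

    spread : some measure of variation inside the fragment (assumed)
    tone   : tonal-distribution metric of the image or fragment (assumed)
    Returns 'mean' (solid mean coloring), 'split', or 'dct'.
    """
    pm = poly(pm_coeffs, tone)   # threshold for solid mean coloring
    ps = poly(ps_coeffs, tone)   # threshold for quadtree splitting
    if spread <= pm:
        return "mean"
    if spread > ps:
        return "split"
    return "dct"
```

Making the thresholds functions of the tonal metric, rather than constants, is what lets the same coefficients adapt to images of different brightness distribution within one domain.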

(This article belongs to the Special Issue Algorithms for Image Processing and Machine Vision)

20 pages, 22124 KB

Open Access Article

A Reversible Data-Hiding Method for Encrypted Images Based on Adaptive Quadtree Partitioning and MSB Prediction

by Ya Yue, Minqing Zhang, Fuqiang Di and Peizheng Lai

Appl. Sci. 2024, 14(14), 6376; https://doi.org/10.3390/app14146376 - 22 Jul 2024

Cited by 1 | Viewed by 2056

To address the vulnerability of the widely used block permutation and co-XOR (BPCX) encryption algorithm in reversible data-hiding in the encrypted domain (RDH-ED), which is susceptible to known-plaintext attacks (KPAs), and to enhance embedding capacity, we propose a novel technique of reversible data-hiding in encrypted images (RDH-EI). This method incorporates adaptive quadtree partitioning and most significant bit (MSB) prediction. To counteract KPAs, we introduce pixel modulation specifically targeting pixels within blocks of the same level during the encryption phase. During data embedding, we utilize tagging bits to indicate the state of the pixel blocks, capitalizing on pixel redundancy within those blocks to augment embedding capacity. Our experimental results demonstrate that our method not only achieves reversibility and separability but also significantly boosts embedding capacity and method security. Notably, the average embedding rate across the 10,000 images tested stands at 2.4731, surpassing previous methods by 0.2106 and 0.037, respectively. Full article

(This article belongs to the Section Computing and Artificial Intelligence)

25 pages, 940 KB

Open Access Article

Fast Versatile Video Coding (VVC) Intra Coding for Power-Constrained Applications

by Lei Chen, Baoping Cheng, Haotian Zhu, Haowen Qin, Lihua Deng and Lei Luo

Electronics 2024, 13(11), 2150; https://doi.org/10.3390/electronics13112150 - 31 May 2024

Cited by 10 | Viewed by 3700

Versatile Video Coding (VVC) achieves impressive coding gain improvement (about 40%+) over the preceding High-Efficiency Video Coding (HEVC) technology at the cost of extremely high computational complexity. Such a complexity increase is a great challenge for power-constrained applications, such as the Internet of Video Things. In the case of intra coding, VVC utilizes a brute-force recursive search for both the partition structure of the coding unit (CU), which is based on the quadtree with nested multi-type tree (QTMT), and 67 intra prediction modes, compared to 35 in HEVC. We therefore offer optimization strategies for the CU partition decision and intra coding modes to lessen the computational overhead. Regarding the high complexity of the CU partition process, first, CUs are categorized as simple, fuzzy, and complex based on their texture characteristics. Then, we train two random forest classifiers to speed up the RDO-based brute-force recursive search process. One of the classifiers directly predicts the optimal partition modes for simple and complex CUs, while the other determines the early termination of the partition process for fuzzy CUs. Meanwhile, to reduce the complexity of intra mode prediction, a fast hierarchical intra mode search method is designed based on the texture features of CUs, including texture complexity, texture direction, and texture context information. Extensive experimental findings demonstrate that the proposed approach reduces complexity by up to 77% compared to the latest VVC reference software (VTM-23.1). Additionally, an average coding time saving of 70% is achieved with only a 1.65% increase in BDBR. Furthermore, when compared to state-of-the-art methods, the proposed method also achieves the largest time saving with comparable BDBR loss. These findings indicate that our method is superior to other up-to-date methods in terms of lowering VVC intra coding complexity, which provides an effective solution for power-constrained applications. Full article

(This article belongs to the Special Issue Advances in Image Processing and Computer Vision Based on Machine Learning)

17 pages, 512 KB

Open Access Article

Fast Coding Unit Partitioning Algorithm for Video Coding Standard Based on Block Segmentation and Block Connection Structure and CNN

by Nana Li, Zhenyi Wang and Qiuwen Zhang

Electronics 2024, 13(9), 1767; https://doi.org/10.3390/electronics13091767 - 2 May 2024

Cited by 4 | Viewed by 2674

The recently introduced video coding standard VVC adopts a novel Quadtree plus Nested Multi-Type Tree (QTMTT) block structure. This structure enables more flexible block partitioning and delivers better compression performance than its predecessor, HEVC. However, the new structure makes the partition search considerably more complex, resulting in a substantial increase in encoding time. The QTMTT structure also yields diverse Coding Unit (CU) block sizes, which poses challenges for CNN model inference. In this study, we propose a representation structure termed Block Segmentation and Block Connection (BSC), rooted in texture features, which ensures that partial CU blocks are uniformly represented in size. To handle CUs of different sizes, CNN models at several levels are designed for prediction. Moreover, we introduce a post-processing method and a multi-thresholding scheme to further mitigate errors introduced by the CNNs. This allows flexible and adjustable acceleration, trading coding time against coding performance. Experimental results indicate that, compared with VTM-10.0, our “Fast” scheme reduces the average complexity by 57.14% with a 1.86% increase in BDBR, while the “Moderate” scheme reduces the average complexity by 50.14% with only a 1.39% increase in BDBR.
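One common way to realize a multi-threshold scheme of the kind mentioned in this abstract is to act on a CNN's split probability only when it is confident and to defer to the regular RDO search otherwise. The sketch below shows that idea under stated assumptions: the function name, the `PRESETS` table, and its threshold values are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# Hypothetical (t_low, t_high) pairs. A narrower uncertain band means more
# CNN decisions are trusted, so more RDO checks are skipped.
PRESETS: Dict[str, Tuple[float, float]] = {
    "fast": (0.40, 0.60),
    "moderate": (0.25, 0.75),
}


def split_decision(p_split: float, t_low: float, t_high: float) -> str:
    """Map a predicted split probability to 'split', 'no_split', or 'rdo'.

    'rdo' means the prediction is not confident enough and the encoder
    should fall back to the full rate-distortion check for this CU.
    """
    if not 0.0 <= t_low <= t_high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= t_low <= t_high <= 1")
    if p_split >= t_high:
        return "split"
    if p_split <= t_low:
        return "no_split"
    return "rdo"
```

For example, a probability of 0.7 would be accepted as a split under the assumed "fast" preset but deferred to RDO under "moderate", which is how widening or narrowing the band adjusts the speed and BDBR trade-off.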

(This article belongs to the Special Issue Recent Advances in Image/Video Compression and Coding)

23 pages, 5497 KB

Open Access Article

Fast Decision-Tree-Based Series Partitioning and Mode Prediction Termination Algorithm for H.266/VVC

by Ye Li, Zhihao He and Qiuwen Zhang

Electronics 2024, 13(7), 1250; https://doi.org/10.3390/electronics13071250 - 27 Mar 2024

Cited by 7 | Viewed by 2340

Abstract

With the advancement of network technology, multimedia video has emerged as a crucial channel for individuals to access external information, owing to its realistic and intuitive effects. For high frame rate and high dynamic range videos, the coding efficiency of High Efficiency Video Coding (HEVC) falls short of the storage and transmission demands of the content. Versatile Video Coding (VVC) therefore introduces a nested quadtree plus multi-type tree (QTMT) partitioning structure on top of the HEVC standard, and expands the number of intra prediction modes from 35 to 67. While these new tools enhance compression performance, they also introduce a higher level of computational complexity. To reduce this complexity while preserving coding efficiency, this paper addresses two key aspects: coding unit (CU) partition decision-making and intra mode selection. First, to handle the flexible QTMT partitioning structure, we propose a decision-tree-based series partitioning decision algorithm. By concatenating the quadtree (QT) partition decision with the multi-type tree (MT) partition decision, the algorithm determines from texture characteristics whether the MT partition decision can be skipped. If the MT partition decision is needed, four decision tree classifiers are used to judge the different partition types. Second, for intra mode selection, this paper proposes an ensemble-learning-based algorithm for mode prediction termination. By reordering the complete candidate mode list and assessing prediction accuracy, evaluation of redundant candidate modes is terminated early. Experimental results show that, compared with the VVC test model (VTM), the proposed algorithm achieves an average time saving of 54.74%, while the BDBR increases by only 1.61%.
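The series (QT first, then MT) decision flow in this abstract can be sketched as follows. This is one possible reading only: the abstract does not specify the features, what happens after a QT split is chosen, or how the four decision trees map to partition types, so the one-classifier-per-MT-mode mapping and all function names here are assumptions, with plain callables standing in for trained decision trees.

```python
from typing import Callable, Dict, List, Mapping

Features = Mapping[str, float]
Classifier = Callable[[Features], bool]

# The four multi-type tree splits in VVC: binary/ternary, horizontal/vertical.
MT_MODES = ("BT_H", "BT_V", "TT_H", "TT_V")


def serial_partition_candidates(
    features: Features,
    qt_split: Classifier,
    skip_mt: Classifier,
    mt_classifiers: Dict[str, Classifier],
) -> List[str]:
    """Return the partition modes left for RDO after the serial decisions."""
    # Stage 1: quadtree decision (assumed to pre-empt the MT stage).
    if qt_split(features):
        return ["QT"]
    # Stage 2: texture-based decision on whether MT can be skipped entirely.
    if skip_mt(features):
        return ["NO_SPLIT"]
    # Stage 3: one classifier per MT partition type keeps or prunes that mode.
    kept = [mode for mode in MT_MODES if mt_classifiers[mode](features)]
    return ["NO_SPLIT"] + kept
```

Chaining the stages means the four MT classifiers run only for CUs that pass both earlier checks, so their cost is paid on a subset of blocks.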

(This article belongs to the Special Issue Signal, Image and Video Processing: Development and Applications)

Search Results (42)
