Article

Fast Depth Map Coding Algorithm for 3D-HEVC Based on Gradient Boosting Machine

Xiaoke Su, Yaqiong Liu and Qiuwen Zhang
College of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(13), 2586; https://doi.org/10.3390/electronics13132586
Submission received: 3 June 2024 / Revised: 25 June 2024 / Accepted: 29 June 2024 / Published: 1 July 2024

Abstract

Three-Dimensional High-Efficiency Video Coding (3D-HEVC) has been extensively researched due to its efficient compression and depth image representation, but encoding complexity remains a challenge. This is mainly attributed to redundancy in the coding unit (CU) recursive partitioning process and in the rate–distortion (RD) cost calculation. Enhancing encoding efficiency and reducing redundant computations are therefore key objectives in optimizing 3D-HEVC. This paper introduces a fast encoding method for 3D-HEVC comprising an adaptive CU partitioning algorithm and a rapid rate–distortion-optimization (RDO) algorithm. Based on the Average Local Variance (ALV) features extracted from each coding unit, a Gradient Boosting Machine (GBM) model is constructed to obtain the corresponding CU thresholds. These thresholds are compared with the ALV to decide whether to continue dividing the coding unit. The RDO algorithm streamlines the RD cost calculation while still selecting the optimal prediction mode wherever possible. Simulation results show that the method reduces encoding complexity by 52.49% while maintaining good video quality.

1. Introduction

In the past few years, alongside the widespread growth in demand for digital video applications, the rapid development of 3D movies, gaming, and virtual reality technologies has raised consumer expectations for higher-quality and more efficient video solutions [1]. Traditional 2D video encoding technologies can no longer meet these needs. Compared to 2D video, 3D video provides a more realistic, immersive experience, offering audiences more vivid visual effects. The growing allure of 3D video, coupled with the demand for more immersive viewing experiences, has propelled the rapid proliferation of 3D applications [2,3]. To meet users’ desire for high-quality 3D content, the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) collaborated to develop the 3D-HEVC standard. This standard employs a series of intricate encoding tools and inter-component dependencies that effectively enhance video compression efficiency, making the transmission and storage of 3D content more efficient and reliable [4,5].
In the 3D-HEVC standard, one of the multi-view videos is regarded as the primary, or independent, view. Its encoding is compatible with HEVC and does not rely on depth maps or other views. Additionally, 3D-HEVC inherits and extends the HEVC encoding structure, introducing a series of new encoding techniques. These techniques not only enhance the flexibility and efficiency of encoding, but also enable 3D-HEVC to better adapt to the encoding requirements of different types of 3D content [6]. Compared to traditional HEVC encoding, the adoption of the Multi-View plus Depth (MVD) format for 3D video represents a significant change. This format further reduces the data volume when encoding 3D video, thereby enhancing compression efficiency. By jointly considering information from multiple views and depth images, the MVD format better captures the details of 3D scenes and reduces the cost of data transmission and storage while maintaining high quality, bringing greater efficiency and advantages to the transmission and display of 3D video [7].
The 3D-HEVC standard utilizes Depth-Image-Based Rendering (DIBR) to generate virtual views, incorporating depth maps within its encoding structure [8]. Depth maps, distinct from texture maps, depict the distance between objects and the camera through grayscale values, with each pixel denoting the distance from the camera to the corresponding scene point. Depth maps typically exhibit less intricate detail, featuring predominantly large uniform regions interspersed with sharp transitions where pixel values change abruptly. Integrating depth-map encoding increases coding intricacy, particularly in accurately representing edge regions, thereby amplifying the overall complexity of 3D video encoding [9]. To streamline depth map encoding, 3D-HEVC introduces a suite of tools and efficient coding modes tailored for this purpose, including Depth Modeling Modes (DMMs) [10], texture-aware depth prediction modes, and motion parameter inheritance modes [11]. These innovations significantly reduce the bits required for encoding 3D-HEVC video.
The 3D-HEVC standard is renowned for its high compression efficiency and excellent synthesis quality, but it comes with high encoding complexity. Decisions regarding the size of coding units (CUs) involve evaluating various prediction modes and selecting the mode with the minimum RD cost. This comprehensive approach, considering spatial, temporal, motion, and depth information, ensures optimal compression efficiency and synthesis quality [12]. The 3D-HEVC standard adopts the quadtree coding structure from HEVC, as shown in Figure 1a, allowing hierarchical division and coding of video frames. An image block is progressively divided into four sub-blocks, each of which may be subdivided into four smaller sub-blocks, creating a tree-like structure, as depicted in Figure 1b. This process recursively splits a 64 × 64 block into four identically sized CUs, down to 8 × 8 samples [13]. The depths of 0, 1, 2, and 3 correspond to CU sizes from 64 × 64 down to 8 × 8, respectively. At each CU depth, a series of operations is performed, including motion estimation for all prediction modes and the RDO process. Exploring all depth levels with RDO, the configuration with the lowest RD cost is selected [14]. These processes enhance compression efficiency but escalate computational demands and complexity, constraining the real-time applicability of 3D-HEVC. Hence, expediting the CU size-determination process is paramount for real-time performance and has garnered considerable research interest.
In 3D-HEVC encoding prediction, as illustrated in Figure 2, the mode count has expanded to 35, encompassing a DC mode, a planar mode, and 33 angular modes [15]. Although the increase in the number of modes improves encoding efficiency, it also makes the encoding process more complex due to the heightened computational demands required for highly complex RDO on every mode. Selecting the optimal candidate mode from the 35 available significantly increases computational complexity and encoding time.
To account for the quality of virtual views, an RDO criterion was designed in which the RD cost incorporates the synthesized view distortion (SVD) [16]. To make the SVD calculation more precise, rendering operations were introduced, significantly increasing the computational complexity. To overcome this challenge, rapid RDO methods for depth video have been proposed. These methods fundamentally aim to reduce redundant complexity by examining accumulated RD costs and avoiding unnecessary computations. In traditional RDO, exhaustive searches are performed at every CU level to ascertain the most favorable coding mode, resulting in a significantly increased computational load. Rapid RDO methods instead employ a more intelligent strategy: by accumulating and analyzing already-computed RD costs, they selectively determine the next computation and eliminate unnecessary steps. This approach provides an effective means to decrease the encoding intricacy of 3D-HEVC, laying the foundation for more efficient 3D video coding and transmission.
Existing fast 3D-HEVC encoding methods often face challenges such as long encoding times, high computational complexity, and redundant algorithmic processes, which can hinder the efficiency and practicality of 3D-video-encoding systems. To address these challenges, we propose a novel approach that combines an adaptive coding-unit (CU)-partitioning algorithm based on a Gradient Boosting Machine (GBM) with a fast rate–distortion-optimization (RDO) algorithm. Our method dynamically adjusts CU partitioning using the GBM model to simplify the decision-making process, thereby reducing encoding time and complexity. Additionally, by introducing an early termination module into the RD cost calculation, we reduce redundancy in the RD cost computation during encoding. Together, these techniques minimize redundant computations in 3D-HEVC depth map encoding while reducing encoding time and complexity. The approach not only enhances encoding efficiency, but also maintains encoding performance, making it a promising solution for optimizing 3D-video-encoding systems.
Therefore, by combining the GBM algorithm with rapid RDO methods, this paper suggests a rapid encoding approach tailored for 3D-HEVC. The arrangement of the next sections is outlined as follows: Section 2 will introduce the related work; Section 3 will elaborate on the fast encoding algorithm; experimental results will be showcased in Section 4, while Section 5 will provide a summary of the entire content.

2. Related Work

To effectively alleviate the complexity of 3D video depth map encoding, researchers have predominantly introduced two types of fast decision algorithms. The first type focuses on CU size optimization, aiming to expedite the encoding process in 3D-HEVC. It utilizes specific strategies and models to accelerate the CU division and decision process, classifying each CU as either to be divided further or not divided at all, thereby reducing encoding complexity. Reference [17] explores the application of machine learning techniques to encoder characteristics, using them to construct a static decision tree that determines whether coding units (CUs) should undergo further partitioning. This approach effectively considers contextual information by analyzing encoder attributes such as motion vectors and transform coefficients, providing new ideas for enhancing the decision-making capability and compression efficiency of encoders. Reference [18] applies machine learning to identify features closely associated with CU division, training an improved neural network model on these features, setting thresholds, and executing early termination of CU division accordingly. Reference [19] uses the uniformity of the CU’s structural tensor to examine the association between CU complexity and depth size, extracting each CU’s structural tensor to characterize its uniformity and employing a fast algorithm to achieve early termination of CU splitting. Reference [20] employs Bayesian decision rules and examines the association between texture videos and spatially neighboring tree blocks to study the characteristics of depth video tree blocks, emphasizing early mode selection and adaptive termination of CU pruning. Additionally, Reference [21] regards early CU level decisions as a clustering problem and develops three clustering models for this purpose, proposing an early CU-level-partitioning scheme for depth video encoding based on unsupervised learning that offers a new path for improving encoding efficiency and reducing redundancy. It is worth noting that multiple techniques are typically employed to detect homogeneous regions during the CU partitioning process in 3D-HEVC, including global variance, texture analysis, and structure tensor analysis [17,19].
The other type is the fast prediction-mode decision algorithm, which addresses the complexity brought about by the MVD encoding scheme and the quadtree structure of the 3D-HEVC standard. Through prediction-mode-selection algorithms, the complexity of intra-frame encoding of depth maps in 3D video can be effectively reduced. Reference [22] designed a rapid method for 3D-HEVC utilizing Bayesian decision theory. Reference [23] analyzed the distribution and statistical likelihood of inter-frame prediction modes, observing a high correlation between the dependent-view modes and the base view, which facilitates early-stage decisions. Reference [24] delves deeper into the correlations within the coding information to understand the characteristics and structure of video content across different views and components. Tailored to different types of coding units (CUs), they devise a strategy based on complexity analysis of encoding modes, allocating different intra-frame candidate modes to different CU types; by leveraging the best intra-frame prediction mode and its corresponding RD cost at the current CU depth level, they bypass redundant intra-frame prediction sizes, further enhancing encoding efficiency and speed. This research provides new insights into intra-frame mode selection for depth image coding. Furthermore, Reference [25] studies the content attributes of tree blocks by analyzing spatiotemporal, inter-view, and texture–depth correlations, focusing on adaptively skipping unnecessary prediction modes. Reference [26] employs a similar approach, utilizing the correlations between time, space, view, and components to train an XGBoost model, improving its prediction accuracy and accelerating the depth-map-encoding process.
Existing fast encoding algorithms for 3D-HEVC perform excellently in accelerating the CU partitioning and decision-making process, as well as optimizing intra-frame encoding. However, these algorithms have deficiencies in dynamic adaptability and optimizing the redundancy in the RDO calculation process. Moreover, most of these algorithms focus solely on either CU division or mode decision, neglecting the redundancy in the RDO calculation process. To address this, we combine an adaptive CU partitioning algorithm with a fast RDO algorithm to further enhance encoding efficiency and reduce complexity. We use the highly accurate GBM algorithm to build models and obtain appropriate thresholds for CU division. Simultaneously, a fast rate–distortion-optimization algorithm is employed in the RDO process, introducing an early termination module to eliminate potential complexity in the RD-cost-calculation process.

3. Proposed Algorithm

3.1. GBM-Based Adaptive CU Partition

3.1.1. Observations and Analysis

The division of coding units in 3D-HEVC is a complex and crucial process, and it is key to improving the compression ratio. 3D-HEVC inherits the quadtree division structure from HEVC, which finely divides coding units to adapt to complex image structures. As depicted in Figure 1, during the division process each Coding Tree Unit (CTU) is uniformly divided into smaller CUs until the smallest CU size of 8 × 8 is reached. For inter-frame coding, 3D-HEVC encodes depth maps at three depth levels. The RDO process traverses all available depth levels and modes to obtain the encoding mode with the lowest RD cost, computed as follows:
$$RD_{cost} = SSE_{luma} + \omega_{chroma} \times SSE_{chroma} + \lambda_{mode} \times R_{mode} \tag{1}$$
Here, $SSE_{luma}$ and $SSE_{chroma}$ represent the distortion in the luminance and chrominance components, respectively, between the current CU and the matching CU; $\omega_{chroma}$ denotes the chrominance weighting factor; $\lambda_{mode}$ is the Lagrange multiplier; and $R_{mode}$ is the total bitrate cost. This approach yields favorable RD performance, but also introduces substantial complexity, constraining the encoder’s use in real-time scenarios.
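As a concrete illustration of Formula (1), the following sketch computes the mode-decision RD cost from precomputed distortion and rate terms; the function name and argument list are ours, not part of the reference software.

```python
def rd_cost(sse_luma: float, sse_chroma: float, rate_bits: float,
            w_chroma: float, lambda_mode: float) -> float:
    """Mode-decision RD cost of Formula (1): luma distortion plus weighted
    chroma distortion plus the Lagrangian-weighted bitrate cost."""
    return sse_luma + w_chroma * sse_chroma + lambda_mode * rate_bits
```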
Generally, larger depth levels are chosen for CUs with significant global motion or in areas of complex motion, while smaller depth levels are often selected for CUs in areas of minor motion or uniform regions. To better assess the influence of depth map complexity on the encoder, we conducted statistical studies on the allocation of computational resources in the 3D-HEVC encoder. Experiments were conducted on HTM-16.1, encoding four test video sequences: “Balloons” and “Kendo” (1024 × 768) and “Shark” and “Poznan_Hall2” (1920 × 1088). The simulation settings used four QP pairs, (25, 34), (30, 39), (35, 42), and (40, 45); 150 frames were simulated; and View Synthesis Optimization (VSO) was enabled.
Table 1 displays the allocation of coding unit (CU) depth levels for depth maps in 3D-HEVC, showing how CU sizes change under different quantization parameters (QPs). It is evident that, for depth maps, across the four test sequences, the likelihood of selecting depth level 0 surpasses 80%. This is because depth maps differ from typical texture video content, being characterized by sharp edge regions and large uniform areas. From Figure 3, we can infer that, as the maximum CU depth level increases, the encoding complexity rises significantly. However, in most cases, CUs select depth level 0 as the preferred size. Therefore, by identifying the current CU depth level and bypassing unnecessary CU sizes, the mode decision process can markedly diminish the computational complexity, saving a majority of the encoding time. An adaptive termination scheme is used in the CU division process, in which CU division ceases once the encoder reaches the early termination condition. This allows more precise CU division, significantly saving encoding time and achieving better encoding results.
In the CU division process for depth encoding in 3D-HEVC, coding units often choose smaller depth levels as the optimal depth. As previously analyzed, we can use the characteristics of depth map variations to assess block properties, and use algorithms to find appropriate threshold conditions to simplify the CU division process and skip the significant computational complexity caused by unnecessary prediction modes and RD costs at various depth levels. If a region’s CUs have similar characteristic properties, then that region is considered homogeneous. Although many techniques exist for detecting uniform areas in images, they often have high computational costs and are not suitable for implementation in fast algorithms.

3.1.2. Gradient Boosting Machines Algorithm

Gradient Boosting Machine (GBM) is an efficient machine learning algorithm and a variant of the boosting algorithm. It is based on the concept of “ensemble learning”, which constructs a powerful learning model through the fusion of numerous weak learners, typically decision trees [27]. Each weak learner aims to adjust to the negative gradient of the loss function of the preceding cumulative model, thus minimizing the cumulative model’s loss along the direction of the negative gradient. The core idea is to train a new model in each iteration, with the task of correcting the prediction errors of the previous model on the training data. Ultimately, GBM combines multiple models, each focusing on correcting the residuals of the previous model, thereby forming a more powerful and accurate ensemble model. This iterative process allows GBM to gradually adapt to the complexity of the data, thereby improving the overall model’s generalization ability [28].
Boosting-type algorithms can be considered as additive models:
$$f(x) = \sum_{m=1}^{M} \alpha_m T(x; \theta_m) \tag{2}$$
In this formula, $\alpha_m$ represents the weight coefficient of the m-th base learner; the number of learners is denoted by M; and the m-th base learner is represented by $T(x; \theta_m)$, where $\theta_m$ are the parameters of a classification (or regression) tree, i.e., the parameters learned by the classifier. Given the training data and a specified loss function, the boosting learning model can be approximately expressed as the minimization of the loss function, with the optimization objective being:
$$\arg\min_{f} \sum_{i=1}^{N} L\left(y_i, f(x_i)\right) \tag{3}$$
Here, $x_i$ represents the input features and $y_i \in \{0, 1\}$ the class labels. The objective is to determine a classification function $f(x)$ that minimizes the loss $L(y_i, f(x_i))$. The Gradient Boosting Machine (GBM) iteratively fits residuals, training a new weak learner in each iteration to correct the residuals of the previous model, thereby gradually improving overall performance. GBM constructs each new base learner along the gradient descent direction of the loss of the previously assembled learners, continuously reducing the overall loss of the ensemble [29] and thus optimizing the model. The GBM algorithm proceeds as follows:
First, the training data $(x_i, y_i)$ are input, and the boosting tree model $f_0(x)$ is initialized to zero. In the m-th iteration, the gradient $g_m(x_i)$ is calculated; based on it, the parameters $\theta_m$ of the m-th learner are fitted, and the model $f_m(x)$ is then updated.
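The following is a minimal sketch of this training loop under the assumption of a squared loss, for which the negative gradient reduces to the residual $y_i - f_{m-1}(x_i)$; it uses scikit-learn regression trees as weak learners and illustrates only the iterative procedure, not the exact GBM configuration used in this paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Train a gradient-boosted ensemble: each tree fits the negative
    gradient of the squared loss (the residual) of the current model."""
    f = np.zeros(len(y), dtype=float)           # f_0(x) initialized to zero
    trees = []
    for _ in range(n_estimators):
        residual = y - f                         # g_m(x_i) for squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        f += learning_rate * tree.predict(X)     # additive model update
        trees.append(tree)
    return trees

def gbm_predict(trees, X, learning_rate=0.1):
    """Evaluate the additive model of Formula (2) on new samples."""
    f = np.zeros(len(X), dtype=float)
    for tree in trees:
        f += learning_rate * tree.predict(X)
    return f
```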

3.1.3. Feature Selection and Adaptive CU Size Decision Algorithm

We explore all potential features for partitioning and, for each feature, examine all feasible partitioning values by sorting the observations of that feature. If a region’s coding units have similar features and attributes, the region is considered homogeneous, with no significant differences between the coding units within it. In 3D-HEVC, a range of methods are employed to detect homogeneous regions, including global variance, Average Local Variance (ALV), adaptive loop filtering, and structural tensors. These techniques help identify regions with similar features and properties, thereby recognizing homogeneous areas. For blocks with noise in particular, ALV is more precise and effective. Therefore, we utilize the properties of ALV to decrease the complexity of inter-frame coding in 3D-HEVC.
First, the block pixels are divided into a specific set of subsets. A mask M of predefined size is created, with all elements equal to one, and its center is positioned over each pixel of the block in turn. By aligning the mask with the block elements, the data are partitioned into pixel-level subsets, each comprising the same number of elements as the mask. Consider a 3 × 3 mask M. After convolving the mask with the block elements, the resulting coding unit (CU) comprises N pixels, each influenced by its neighboring mask elements (right, left, top, bottom, and diagonal). As illustrated in Figure 4, each pixel’s neighborhood consists of nine elements; the local variance (LV) is then computed for each pixel $x(i,j)$ according to Formula (4):
$$LV = \frac{1}{M \times M} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} x(i,j)^2 - \left( \frac{1}{M \times M} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} x(i,j) \right)^2 \tag{4}$$
where M is the side length of the 3 × 3 mask. Based on each pixel’s LV, the Average Local Variance (ALV) of each CU is calculated as follows:
$$ALV_{CU} = \frac{1}{N \times N} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} LV(i,j) \tag{5}$$
Here, N is the side length of the CU (so each CU contains N × N pixels), and $LV(i,j)$ denotes the local variance at coordinates $(i,j)$. This computation characterizes every pixel block and serves as a foundational step for subsequent processing and analysis.
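A compact way to compute Formulas (4) and (5) is with a uniform (box) filter, which yields the local mean and the local mean of squares in one pass; the NumPy/SciPy sketch below is our own illustration, with border handling left to the filter’s default mode.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(block: np.ndarray, mask_size: int = 3) -> np.ndarray:
    """Per-pixel local variance (Formula (4)): E[x^2] - (E[x])^2 over the
    M x M neighborhood centered on each pixel."""
    x = block.astype(np.float64)
    local_mean = uniform_filter(x, size=mask_size)
    local_mean_sq = uniform_filter(x * x, size=mask_size)
    return local_mean_sq - local_mean ** 2

def alv(block: np.ndarray, mask_size: int = 3) -> float:
    """ALV of a CU (Formula (5)): the per-pixel local variances averaged
    over the N x N block."""
    return float(local_variance(block, mask_size).mean())
```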
For each coding unit, complexity is evaluated using the Average Local Variance (ALV). Lower ALV values generally favor larger CU sizes; conversely, a high ALV value indicates that the area is more complex, so the CU will be encoded with the smallest possible size. Figure 5 shows the Cumulative Distribution Function (CDF) of ALV under QP (30, 39). When ALV is low, depth 0 is predominantly selected, with the likelihood of selecting other depth levels below 20%. Compared to depths 2 and 3, depth 1 has a higher probability of being chosen, and at low ALV values depth 2 is favored over depth 3. When ALV is high, however, the likelihood of selecting other CU sizes increases, and exceptionally high ALV values require full evaluation. This analysis supports deciding the CU depth and size based on regional complexity, which leads to more efficient video encoding.
To address the significant computational complexity of the recursive CU splitting process in 3D-HEVC depth map encoding, an adaptive CU level decision algorithm was devised. This approach uses machine learning to create an adaptive CU size decision model that adjusts encoding decisions based on video content and characteristics, setting appropriate thresholds to achieve better performance and lower complexity across scenarios. To train the Gradient Boosting Machine (GBM) model, six training video sequences of various resolutions (non-CTC sequences) were selected: “Akko & Kayo” and “Rena” at 640 × 448, and “Dog”, “Champagne tower”, “Microworld”, and “Pantomime” at 1280 × 960. The training dataset was compiled from the first 24 frames of each sequence. The Average Local Variance (ALV) of each CU was calculated using Formula (5) and extracted as the feature on which the GBM model makes split decisions. Depth map CU datasets were assembled for depth levels 0, 1, and 2, each comprising $ALV_{cu}$ values and a split flag. Each CU has two possible states: if a CU is encoded at its current size, the split flag is “0”; if it is split into smaller sizes, the split flag is “1”. The decision to split the current CU depends on the split decision $S_c$, defined as follows:
$$S_c = \begin{cases} 0, & \text{if } ALV_{cu} \le TH_{Depth} \\ 1, & \text{otherwise} \end{cases} \tag{6}$$
Here, $TH_{Depth}$ is the adaptively determined threshold. Both the quantization parameter (QP) and the depth level influence the threshold, and QP also affects the overall encoding quality and splitting behavior. Depth values of 0, 1, and 2 correspond to CU sizes of 64 × 64, 32 × 32, and 16 × 16, respectively, and $TH_{Depth}$ varies with the depth level and QP. If $ALV_{cu}$ is less than or equal to $TH_{Depth}$, division of the current CU is halted early without any additional operations; otherwise, the current CU is divided to the next depth level.
The method flow is illustrated in Figure 6. It starts by calculating each pixel’s local variance LV using Formula (4), followed by the depth map CU-checking process, in which the values $TH_0$, $TH_1$, and $TH_2$ serve as thresholds for each depth map CU. Data mining techniques are employed to derive ALV features from the training sequences and sort them by CU size (64 × 64, 32 × 32, 16 × 16), after which the splitting process begins and $ALV_{cu}$ is calculated according to Formula (5). A Gradient Boosting Machine (GBM) model is constructed and the split values extracted. Finally, the thresholds extracted by the GBM model are used to execute the algorithm, with results compared against HTM-16.1.
Table 2 presents a detailed breakdown of the depth map thresholds derived from the GBM model; the corresponding threshold curves and modeling functions are shown in Figure 7. It is evident that, as the quantization parameter (QP) increases, the thresholds also tend to increase. To ensure that the model’s decisions adapt to different QPs, the thresholds were mathematically modeled as functions of the depth map’s QP, generating three thresholds for each QP. Formula (7) gives the resulting expressions for predicting the thresholds, where the subscripts 0, 1, and 2 correspond to the depth levels.
Here are the corresponding mathematical expressions for the thresholds:
$$\begin{aligned} TH_0(QP) &= 0.2396\,QP^3 - 26.119\,QP^2 + 952.1\,QP - 11527 \\ TH_1(QP) &= 0.1209\,QP^3 - 12.382\,QP^2 + 423.25\,QP - 4791 \\ TH_2(QP) &= 0.9468\,QP^2 - 66.374\,QP + 1167.4 \end{aligned} \tag{7}$$
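Formulas (6) and (7) together amount to a small lookup at encode time. The sketch below evaluates the fitted polynomials and makes the split decision; for QP = 34 and depth 0 it returns 68.0744, matching Table 2 exactly. Function names are illustrative.

```python
def th_depth(qp: float, depth: int) -> float:
    """Fitted threshold curves of Formula (7) for depth levels 0, 1, 2."""
    if depth == 0:
        return 0.2396 * qp**3 - 26.119 * qp**2 + 952.1 * qp - 11527
    if depth == 1:
        return 0.1209 * qp**3 - 12.382 * qp**2 + 423.25 * qp - 4791
    if depth == 2:
        return 0.9468 * qp**2 - 66.374 * qp + 1167.4
    raise ValueError("thresholds are defined for depth levels 0-2 only")

def split_decision(alv_cu: float, qp: float, depth: int) -> int:
    """Formula (6): 0 = stop splitting early, 1 = split to the next level."""
    return 0 if alv_cu <= th_depth(qp, depth) else 1
```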

3.2. Fast Rate–Distortion-Optimization Algorithm

3.2.1. Observations and Analysis

In the 3D-HEVC encoding framework, the RDO process stands out as the most time-intensive phase, consuming more than half of the total encoding time. The design of the RDO criteria is a complex and critical step in video encoding, aimed at achieving an optimal balance between the bitrate used for encoding and the quality distortion of the reconstructed signal, to ensure the best compression results while maintaining perceptual quality. The steps in designing RDO criteria typically include designing a bitrate model, a distortion model, selecting Lagrange multipliers, and analyzing the bitrate–distortion curve. The RDO criteria applied to depth maps in 3D-HEVC closely resemble those utilized in conventional video encoding and can be articulated as follows:
$$J = D + \lambda R \tag{8}$$
where J is the RD cost, representing the overall loss as a weighted sum of bitrate and distortion; $\lambda$ is the Lagrange multiplier balancing bitrate against distortion; R is the bitrate, i.e., the number of bits needed for encoding; and D is the distortion introduced by encoding and decoding. For depth encoding, D must account for both the depth distortion and the synthesized view distortion. In 3D-HEVC, D is defined as:
$$D = w_d D_d + w_s D_s \tag{9}$$
In 3D-HEVC encoding, $w_d$ and $w_s$ are the weighting factors for the depth map distortion and the synthesized view distortion, balancing their impact on the overall distortion. $D_d$ is the easily measured depth distortion between the original and reconstructed depth blocks, reflecting errors in the depth information. $D_s$ is the criterion for evaluating the SVD, which is crucial for end-to-end viewing quality. Specifically, $D_s$ can be determined via rendering operations, denoted $D_{s,render}$, or obtained from an estimation model, $D_{s,model}$; the latter is generally less accurate and is often applied in rough mode decision stages. $D_s$ can also be defined via the synthesized view distortion change (SVDC): with the two rendered distortions before and after encoding the depth block labeled $D_{s,render,a}$ and $D_{s,render,b}$, respectively, the difference $D_{s,render,a} - D_{s,render,b}$ measures the SVD caused by depth distortion. This measure is known as the SVDC, also written $D_{s,svdc}$. Because of its high precision, $D_{s,svdc}$ is incorporated into the RDO process of 3D-HEVC. This is essentially a way to optimize depth map encoding so as to minimize the depth information errors caused by encoding and improve overall video quality. Specifically, $D_s$ in Formula (9) can be represented as:
$$D_s = \frac{1}{K} \sum_{k=1}^{K} D_{s,svdc}(k) \tag{10}$$
where K is the number of synthesized view positions and $D_{s,svdc}(k)$ denotes the change in synthesized view distortion at the k-th position.
The synthesized view obtained by rendering the original texture video and the original depth map is represented as:
$$V_0 = \text{Render}(T, D) \tag{11}$$
where T is the original texture video, D is the original depth map, and $\text{Render}(\cdot)$ denotes the rendering operation; $V_0$ is the synthesized view obtained by rendering T and D. This operation produces the best synthesized video, free from encoding distortion. For depth encoding in 3D-HEVC, the texture video is reconstructed before the depth map is encoded and is denoted $\hat{T}$. Let the n-th depth block of the depth map be the current block; the synthesized view constructed from the reconstructed texture video and depth map is then degraded. The depth map $\hat{D}^{n-1}$ before encoding the current block is defined as:
$$\hat{D}^{n-1} = \left\{ \hat{D}_1, \ldots, \hat{D}_{n-1}, D_n, D_{n+1}, \ldots, D_N \right\} \tag{12}$$
where N is the total number of blocks; $\hat{D}_i$, $i = 1, 2, \ldots, n-1$, are the blocks reconstructed before the current block; $D_n$ is the original depth of the current block; and $D_j$, $j = n+1, \ldots, N$, are the original blocks after the current block. The current and subsequent blocks are not encoded, but filled with the original depth intensities. The corresponding synthesized view $V^{n-1}$ is obtained by rendering $(\hat{T}, \hat{D}^{n-1})$ as:
$$V^{n-1} = \text{Render}\left(\hat{T}, \hat{D}^{n-1}\right) \tag{13}$$
The sum of squared errors (SSE) between $V_0$ and $V^{n-1}$ can be represented as:
$$D_{s,render}^{n-1} = \left\| V^{n-1} - V_0 \right\|_2^2 \tag{14}$$
where $D_{s,render}^{n-1}$ is the SVD produced by the texture distortion and the encoding distortion of the first $n-1$ blocks of the depth map. For the n-th block, given each candidate mode, the corresponding reconstructed depth $\hat{D}_n$ is derived and filled into the n-th block. The corresponding depth map can then be represented as:
$$\hat{D}^{n} = \left\{ \hat{D}_1, \ldots, \hat{D}_{n-1}, \hat{D}_n, D_{n+1}, \ldots, D_N \right\} \tag{15}$$
Thus, the synthesized view $V^n$ is obtained as follows:
$$V^{n} = \text{Render}\left(\hat{T}, \hat{D}^{n}\right) \tag{16}$$
The SSE between $V_0$ and $V^n$ is calculated as:
$$D_{s,render}^{n} = \left\| V^{n} - V_0 \right\|_2^2 \tag{17}$$
This represents the SVD resulting from the texture distortion, the distortion of previously encoded depth blocks, and the distortion of the n-th depth block. Ultimately, $D_{s,svdc}$ is defined as:
$$D_{s,svdc} = D_{s,render}^{n} - D_{s,render}^{n-1} \tag{18}$$
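Read operationally, Formulas (11)–(18) say that the SVDC of a candidate mode is the change in the synthesized view’s SSE when the current depth block is swapped from its original to its reconstructed version. The sketch below expresses that directly; `render` is a stand-in for the actual view-synthesis operation (e.g., VSRS) and is assumed, not implemented here.

```python
import numpy as np

def sse(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared errors between two views (Formulas (14) and (17))."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(diff * diff))

def svdc(render, t_hat, d_before, d_after, v0) -> float:
    """SVDC of Formula (18): the change in synthesized-view distortion when
    the current depth block is replaced by its reconstruction.

    d_before / d_after are the depth maps of Formulas (12) and (15), t_hat
    is the reconstructed texture, and v0 the distortion-free reference view."""
    d_render_prev = sse(render(t_hat, d_before), v0)   # D_s,render^{n-1}
    d_render_curr = sse(render(t_hat, d_after), v0)    # D_s,render^{n}
    return d_render_curr - d_render_prev
```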
In 3D-HEVC, the RD cost of a particular encoding mode is determined by accumulating multiple terms, specifically:
$$J = w_d D_d + \frac{w_s}{K} \sum_{k=1}^{K} D_{s,svdc}(k) + \lambda R \tag{19}$$
Preliminary experiments were conducted to analyze the time spent on bitrate calculation, depth distortion, and $D_{s,svdc}$ by encoding four sequences (Balloons, Kendo, Shark, and Poznan_Hall2) with four typical quantization parameters (QPs). Figure 8 shows the proportion of time consumed by the three items during the RDO process. For these four test sequences, the bitrate calculation accounts for less than 1% of the RDO time and the depth distortion process for no more than 2%, while $D_{s,svdc}$ is the most time-intensive process, accounting for 97.64% to 98.18% of the RDO time. Therefore, if the SVDC calculation can be skipped while maintaining encoding performance during the RDO calculation, the RDO time and the encoding complexity can be reduced significantly.

3.2.2. RDO Process

In the encoding process, all candidate encoding modes must undergo the RDO process. First, the sum of squared errors (SSE) under the current prediction mode is calculated as the distortion term of the RD cost. The bits needed to encode the current prediction mode are then counted, and finally the encoding mode with the lowest RD cost is chosen as the optimal mode, denoted:
$$M_{opt} = \arg\min_{M_i} J(M_i) \tag{20}$$
where $M_{opt}$ is the best encoding mode and $M_i$ the i-th candidate mode. The process starts by calculating the RD cost of the first candidate mode $M_1$ and setting the minimum RD cost $J_1^{min}$ to $J(M_1)$. Next, the RD cost of the second candidate mode $M_2$ is assessed, and $J_2^{min}$ is set to the lesser of $J(M_2)$ and $J_1^{min}$: if $J(M_2) < J_1^{min}$, $M_2$ becomes the new optimal mode; otherwise, $M_1$ remains optimal and $J_2^{min} = J_1^{min}$. This is repeated for each candidate mode, updating $J_i^{min}$ by comparing $J(M_i)$ with $J_{i-1}^{min}$, so that the mode with the smallest RD cost among all tested modes is retained. After all encoding modes are evaluated, the global minimum RD cost $J^{min}$ is determined. In depth encoding, the RD cost calculation comprises multiple terms, so if the accumulated RD cost of a candidate mode $M_i$ already surpasses the minimum RD cost of the previous modes, $J_{i-1}^{min}$, then $M_i$ cannot be the optimal mode. There is then no need to continue calculating its RD cost: the mode is deemed suboptimal, and an early termination is performed to avoid redundant calculations. This simplifies depth encoding by identifying the optimal encoding mode without unnecessary computations.
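For reference, the baseline exhaustive search of Formula (20) looks like the following sketch; the early-termination variant described in the next paragraphs replaces the full `rd_cost` evaluation with a staged one. Names are illustrative.

```python
def best_mode_exhaustive(modes, rd_cost):
    """Exhaustive RDO (Formula (20)): fully evaluate J(M_i) for every
    candidate mode and keep the one with the minimum RD cost."""
    m_opt, j_min = None, float("inf")
    for mode in modes:
        j = rd_cost(mode)          # full RD cost, no early exit
        if j < j_min:
            m_opt, j_min = mode, j
    return m_opt, j_min
```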
Based on this, we present a rapid rate–distortion-optimization (RDO) approach tailored to depth encoding, leveraging complexity cues to guide decision-making throughout the encoding process. Our method introduces a fast depth RDO algorithm featuring an early termination module, achieving efficiency through bit-aware, depth-distortion-aware, and SVDC-aware decisions. Specifically, our complexity-driven RD cost framework for depth encoding is as follows:
$$J = \lambda R + w_d D_d + \frac{w_s}{K} D_{s,svdc}(1) + \frac{w_s}{K} D_{s,svdc}(2) + \cdots + \frac{w_s}{K} D_{s,svdc}(K) \tag{21}$$
The accumulated rate–distortion (RD) cost of the i-th encoding mode $M_i$ is recorded as $J_i^{accu}$ to track the intermediate results of Formula (21) at each step. Initially, $J_i^{accu}$ is set to $\lambda R$. After the depth distortion is obtained, $J_i^{accu}$ is updated to $\lambda R + w_d D_d$. Next, the SVDC process yields $D_{s,svdc}(1)$, and $\frac{w_s}{K} D_{s,svdc}(1)$ is added to $J_i^{accu}$; this is repeated K times. When k reaches K, the RD cost calculation ends and $J(M_i)$ is obtained. During the calculation of Formula (21), an early termination check is performed whenever $J_i^{accu}$ is updated. Specifically, our fast RDO method achieves more efficient computation by implementing early terminations through bit-aware, depth-distortion-aware, and SVDC-aware decisions.
The specific process flow is shown in Figure 9. For a candidate mode $M_i$, the depth RD cost calculation is divided into three stages, with a comparison module at each decision stage (the diamond blocks in Figure 9). First, the bit-aware decision is made: for the given encoding mode $M_i$, $J_i^{accu}$ is set to the product of the encoding bits R and the Lagrange multiplier, and the minimum RD cost over the previous modes ($M_1$ to $M_{i-1}$) is denoted $J_{i-1}^{min}$. If $J_i^{accu} \ge J_{i-1}^{min}$, the next two stages are skipped and the current candidate mode $M_i$ is immediately ruled out as the optimal mode. If this condition is not met, the depth-distortion-aware decision is made: $w_d D_d$ is added to $J_i^{accu}$, and the updated $J_i^{accu}$ is again compared with $J_{i-1}^{min}$. If $J_i^{accu} \ge J_{i-1}^{min}$, early termination is enacted, confirming that the current mode is not optimal. Otherwise, the SVDC-aware decisions follow: the K predefined synthesis positions are checked in turn, $D_{s,svdc}$ is calculated at each position and accumulated into $J_i^{accu}$, and if after any position the updated $J_i^{accu}$ exceeds $J_{i-1}^{min}$, early termination is triggered immediately, skipping the $D_{s,svdc}$ calculation for the remaining positions. In other words, once the accumulated depth RD cost is already sufficient to disqualify the current mode, the calculation stops early to avoid wasting resources. If $J_i^{accu}$ never reaches $J_{i-1}^{min}$, the early termination condition is never met; in this case, the full RD cost calculation is executed to obtain $J(M_i)$, $J_i^{min}$ is updated to $J(M_i)$, and $M_i$ becomes the best mode among the modes tested so far. Conversely, whenever $J_i^{accu} \ge J_{i-1}^{min}$, $M_i$ is determined to be non-optimal and $J_i^{min}$ is set to $J_{i-1}^{min}$. Thus, even without fully calculating the RD cost, an early judgment on the optimality of $M_i$ can be made, improving computational efficiency.
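A minimal sketch of this staged calculation is given below, assuming the SVDC terms can be computed lazily per synthesis position (the `svdc_at` callback is our abstraction); it returns the accumulated cost and whether the mode was rejected early.

```python
def rd_cost_staged(j_prev_min, rate_bits, lam, w_d, depth_dist, w_s, K, svdc_at):
    """Staged RD cost of Formula (21) with the three early-exit checks:
    bit-aware, depth-distortion-aware, and per-position SVDC-aware.
    `svdc_at(k)` computes D_s,svdc at the k-th synthesis position on demand,
    so positions after an early exit are never evaluated.
    Returns (accumulated cost, rejected_early)."""
    j_accu = lam * rate_bits                 # stage 1: bit-aware decision
    if j_accu >= j_prev_min:
        return j_accu, True
    j_accu += w_d * depth_dist               # stage 2: depth-distortion-aware
    if j_accu >= j_prev_min:
        return j_accu, True
    for k in range(1, K + 1):                # stage 3: SVDC-aware decisions
        j_accu += (w_s / K) * svdc_at(k)
        if j_accu >= j_prev_min:
            return j_accu, True              # skip remaining synthesis positions
    return j_accu, False                     # full J(M_i): candidate for optimal
```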

4. Experimental Results

4.1. Analysis of Experimental Results

To precisely assess the efficacy of the proposed method, the eight test sequences in Table 3 were used for experimental testing following the Common Test Conditions (CTCs) [14] mandated by JCT-3V, with HTM-16.1 [30] as the reference software. The sequences were encoded under the random access configuration; their details are summarized in Table 3. The test platform was an 11th-Gen Intel(R) Core(TM) i5-1135G7 @ 2.40 GHz (Intel Corporation, Santa Clara, CA, USA) with 16 GB of RAM, and “VSRS-1D-Fast” [31] was used for view synthesis. Three-view scenarios were tested with the QP pairs (25, 34), (30, 39), (35, 42), and (40, 45). The Bjøntegaard Delta bitrate (BDBR) was employed to assess the compression performance of the proposed approach, and encoding complexity was evaluated with the TS metric, the reduction rate in encoding time achieved by the proposed method, defined as:
$$TS = \frac{T_{HTM} - T_{proposed}}{T_{HTM}} \times 100\% \tag{22}$$
where $T_{proposed}$ is the depth map encoding time of the method proposed in this paper and $T_{HTM}$ is the depth map encoding time of the reference model HTM-16.1.
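As a simple worked check of Formula (22) against Table 4, our illustrative helper below reproduces the sign convention (a positive TS means time saved relative to HTM-16.1).

```python
def time_saving(t_proposed: float, t_htm: float) -> float:
    """TS of Formula (22): percentage reduction in encoding time relative
    to the HTM-16.1 anchor; positive values mean the proposed method is faster."""
    return (t_htm - t_proposed) / t_htm * 100.0

# Example: if HTM-16.1 takes 1000 s and the proposed encoder 475 s,
# TS = 52.5%, in line with the average reduction reported in Table 4.
print(time_saving(475.0, 1000.0))  # 52.5
```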
The overall experimental results are presented in Table 4. They clearly show that, compared with HTM-16.1, the proposed algorithm significantly reduces the time required for 3D-HEVC depth map encoding while preserving strong performance across the test sequences. The algorithm decreases the encoding time by 49.82% to 55.97%, an average reduction of 52.49% relative to HTM-16.1, while the BDBR increases by an average of only 0.98%. The proposed method therefore optimizes the RDO computation and mode selection in inter-frame encoding of 3D-HEVC, avoiding unnecessary CU sizes, with an insignificant decline in coding efficiency. Figure 10 illustrates the RD curves of the overall proposed algorithm for two typical sequences, “Kendo” and “Undo_Dancer”; the RD performance of the overall proposed algorithm is comparable to that of HTM-16.1.

4.2. Comparison with Other Algorithms

To provide a comprehensive evaluation, the proposed method was contrasted with the approaches of Bakkouri [19], Chen [23], and Zou [20]. The algorithm of Bakkouri [19] concerns CU partitioning; that of Chen [23] concerns mode decision; Zou’s [20] approach includes both CU partitioning and mode decision and, of the three, achieves the best trade-off between complexity reduction and coding performance. The main objective of the proposed method is to streamline the complexity of depth map encoding. The comparative results in Table 5 show that all three reference methods exhibit good coding performance and save encoding time to varying extents: Chen’s [23] method shows the smallest BDBR increase (0.17%, with an average time saving of 18.73% over HTM-16.1), Bakkouri’s [19] shows the largest BDBR increase, and Zou’s [20] method reduces encoding time by 51.20% with a 1.07% increase in synthesized view BDBR. Among all methods, the proposed approach saves the most encoding time: relative to Zou [20], Bakkouri [19], and Chen [23], it achieves further time savings of 1.29%, 18.3%, and 33.76%, respectively, with negligible BDBR loss and coding efficiency nearly identical to that of Zou [20].

4.3. Discussion

The method proposed in this study aims to optimize the efficiency of 3D-HEVC encoding and reduce redundant computations. By extracting ALV features from each coding unit and constructing a GBM model to obtain the corresponding CU thresholds, combined with the RDO optimization algorithm, it achieves rapid optimization of the encoding process. The experimental results show that this method reduces complexity by 52.49% while maintaining video quality. However, this method has some potential limitations. First, the construction and training of the GBM model may require a large amount of data and computational resources, which could increase the algorithm’s complexity and cost. Second, the implementation of the fast RDO optimization algorithm may be constrained by hardware resources and real-time requirements, especially when processing large-scale video data. To address these issues, future work will focus on optimizing algorithm efficiency and hardware acceleration. By exploring more efficient methods for model construction and training, further reducing the algorithm’s computational complexity, and considering the use of dedicated hardware (such as GPUs) to accelerate the computation of the GBM model and the implementation of the fast RDO optimization algorithm, we can improve the algorithm’s real-time performance and efficiency. Additionally, further research into different machine learning algorithms and optimization techniques will be conducted to explore more effective methods for optimizing the 3D-HEVC-encoding process. Through continuous improvement of the algorithm and addressing potential limitations, we can better leverage this optimization algorithm to enhance the efficiency and quality of 3D-HEVC encoding, providing more possibilities for the future development of video encoding technology.

5. Conclusions

To effectively reduce encoding time, this paper introduced a rapid decision-making algorithm tailored to encoding 3D-HEVC video depth maps, comprising two components: an adaptive CU-size-partitioning algorithm based on a Gradient Boosting Machine (GBM) and a fast RDO-mode-decision algorithm. The method trains the GBM model on ALV features extracted at each depth level, decides whether a current CU requires division by comparison with thresholds derived from the model, and enhances the RDO calculation process to further minimize redundant RDO computations. The key contributions of this work are as follows: by leveraging GBM, our method adaptively partitions CU sizes, significantly accelerating the decision-making process in depth map encoding; the proposed algorithm optimizes the RDO mode decision, further reducing the computational burden without compromising encoding quality; and the use of ALV features enables effective model training and precise decision-making, contributing to the overall efficiency of the encoding process. Simulation results demonstrate that the proposed method achieves a notable decrease in encoding complexity, reducing the average encoding time by 52.49%, with only a minimal rise in BDBR. Compared to other existing depth-map-encoding algorithms, our method exhibits superior coding performance. The broader impact of this work lies in its potential applications in real-time video coding scenarios where fast encoding is crucial, such as live 3D video streaming, video conferencing, and augmented reality (AR). By significantly reducing the complexity of depth map encoding in 3D-HEVC while maintaining encoding quality, our algorithm can facilitate more efficient and effective video coding solutions, paving the way for advancements in the field.

Author Contributions

Conceptualization, X.S. and Y.L.; methodology, X.S.; software, Y.L.; validation, X.S., Q.Z. and Y.L.; formal analysis, Y.L.; investigation, Y.L.; resources, Q.Z.; data curation, Y.L.; writing—original draft preparation, X.S.; writing—review and editing, X.S.; visualization, X.S.; supervision, Q.Z.; project administration, Q.Z.; funding acquisition, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China No. 61771432 and No. 61302118, the Basic Research Projects of Education Department of Henan No. 21zx003, the Key projects Natural Science Foundation of Henan 232300421150, the Zhongyuan Science and Technology Innovation Leadership Program 244200510026, the Scientific and Technological Project of Henan Province 232102211014 and 232102211017, and the Postgraduate Education Reform and Quality Improvement Project of Henan Province YJS2023JC08.

Data Availability Statement

The data can be shared upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Müller, K.; Merkle, P.; Wiegand, T. 3-D video representation using depth maps. Proc. IEEE 2010, 99, 643–656. [Google Scholar] [CrossRef]
  2. Chen, Y.; Vetro, A. Next-generation 3D formats with depth map support. IEEE Multimed. 2014, 21, 90–94. [Google Scholar] [CrossRef]
  3. Tech, G.; Chen, Y.; Müller, K.; Ohm, J.R.; Vetro, A.; Wang, Y.K. Overview of the multiview and 3D extensions of high efficiency video coding. IEEE Trans. Circuits Syst. Video Technol. 2015, 26, 35–49. [Google Scholar] [CrossRef]
  4. Boyce, J.M.; Doré, R.; Dziembowski, A.; Fleureau, J.; Jung, J.; Kroon, B.; Salahieh, B.; Vadakital, V.K.M.; Yu, L. MPEG Immersive Video Coding Standard. Proc. IEEE 2021, 109, 1521–1536. [Google Scholar] [CrossRef]
  5. Paul, M. Efficient Multiview Video Coding Using 3-D Coding and Saliency-Based Bit Allocation. IEEE Trans. Broadcast. 2018, 64, 235–246. [Google Scholar] [CrossRef]
  6. Zhu, C.; Li, S.; Zheng, J.; Gao, Y.; Yu, L. Texture-Aware Depth Prediction in 3D Video Coding. IEEE Trans. Broadcast. 2016, 62, 482–486. [Google Scholar] [CrossRef]
  7. Alatan, A.A.; Yemez, Y.; Gudukbay, U.; Zabulis, X.; Muller, K.; Erdem, C.E.; Weigel, C.; Smolic, A. Scene Representation Technologies for 3DTV—A Survey. IEEE Trans. Circuits Syst. Video Technol. 2007, 17, 1587–1605. [Google Scholar] [CrossRef]
  8. Fehn, C. Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV. In Proceedings of the Stereoscopic Displays and Virtual Reality Systems XI, San Jose, CA, USA, 18–22 January 2004; Volume 5291, pp. 93–104. [Google Scholar]
  9. Zhang, Q.; Li, N.; Huang, L.; Gan, Y. Effective early termination algorithm for depth map intra coding in 3D-HEVC. Electron. Lett. 2014, 50, 994–996. [Google Scholar] [CrossRef]
  10. Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
  11. Liu, H.; Chen, Y. Generic segment-wise DC for 3D-HEVC depth intra coding. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 3219–3222. [Google Scholar] [CrossRef]
  12. Bakkouri, S.; Elyousfi, A. An adaptive CU size decision algorithm based on gradient boosting machines for 3D-HEVC inter-coding. Multimed. Tools Appl. 2023, 82, 32539–32557. [Google Scholar] [CrossRef]
  13. Li, Y.; Yang, G.; Zhu, Y.; Ding, X.; Sun, X. Adaptive Inter CU Depth Decision for HEVC Using Optimal Selection Model and Encoding Parameters. IEEE Trans. Broadcast. 2017, 63, 535–546. [Google Scholar] [CrossRef]
  14. Lei, J.; Duan, J.; Wu, F.; Ling, N.; Hou, C. Fast Mode Decision Based on Grayscale Similarity and Inter-View Correlation for Depth Map Coding in 3D-HEVC. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 706–718. [Google Scholar] [CrossRef]
  15. Huo, J.; Zhou, X.; Yuan, H.; Wan, S.; Yang, F. Fast Rate-Distortion Optimization for Depth Maps in 3-D Video Coding. IEEE Trans. Broadcast. 2023, 69, 21–32. [Google Scholar] [CrossRef]
  16. Oh, B.T.; Oh, K.J. View Synthesis Distortion Estimation for AVC- and HEVC-Compatible 3-D Video Coding. IEEE Trans. Circuits Syst. Video Technol. 2014, 24, 1006–1015. [Google Scholar] [CrossRef]
  17. Saldanha, M.; Sanchez, G.; Marcon, C.; Agostini, L. Fast 3D-HEVC Depth Map Encoding Using Machine Learning. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 850–861. [Google Scholar] [CrossRef]
  18. Bakkouri, S.; Elyousfi, A. Early Termination of CU Partition Based on Boosting Neural Network for 3D-HEVC Inter-Coding. IEEE Access 2022, 10, 13870–13883. [Google Scholar] [CrossRef]
  19. Bakkouri, S.; Elyousfi, A.; Hamout, H. Fast CU size and mode decision algorithm for 3D-HEVC intercoding. Multimed. Tools Appl. 2020, 79, 6987–7004. [Google Scholar] [CrossRef]
  20. Zou, D.; Dai, P.; Zhang, Q. Fast Depth Map Coding Based on Bayesian Decision Theorem for 3D-HEVC. IEEE Access 2022, 10, 51120–51127. [Google Scholar] [CrossRef]
  21. Li, Y.; Yang, G.; Qu, A.; Zhu, Y. Tunable early CU size decision for depth map intra coding in 3D-HEVC using unsupervised learning. Digit. Signal Process. 2022, 123, 103448. [Google Scholar] [CrossRef]
  22. Wang, X. Application of 3D-HEVC fast coding by Internet of Things data in intelligent decision. J. Supercomput. 2022, 78, 7489–7508. [Google Scholar] [CrossRef]
  23. Chen, J.; Wang, B.; Liao, J.; Cai, C. Fast 3D-HEVC inter mode decision algorithm based on the texture correlation of viewpoints. Multimed. Tools Appl. 2019, 78, 29291–29305. [Google Scholar] [CrossRef]
  24. Shen, L.; Li, K.; Feng, G.; An, P.; Liu, Z. Efficient Intra Mode Selection for Depth-Map Coding Utilizing Spatiotemporal, Inter-Component and Inter-View Correlations in 3D-HEVC. IEEE Trans. Image Process. 2018, 27, 4195–4206. [Google Scholar] [CrossRef] [PubMed]
  25. Song, W.; Dai, P.; Zhang, Q. Content-adaptive mode decision for low complexity 3D-HEVC. Multimed. Tools Appl. 2023, 82, 26435–26450. [Google Scholar] [CrossRef]
  26. Zhang, Z.; Yu, L.; Qian, J.; Wang, H. Learning-Based Fast Depth Inter Coding for 3D-HEVC via XGBoost. In Proceedings of the 2022 Data Compression Conference (DCC), Snowbird, UT, USA, 22–25 March 2022; pp. 43–52. [Google Scholar] [CrossRef]
  27. Bentéjac, C.; Csörgo, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967. [Google Scholar] [CrossRef]
  28. Ali, Z.A.; Abduljabbar, Z.H.; Taher, H.A.; Sallow, A.B.; Almufti, S.M. Exploring the power of eXtreme gradient boosting algorithm in machine learning: A review. Acad. J. Nawroz Univ. 2023, 12, 320–334. [Google Scholar]
  29. Lu, H.; Karimireddy, S.P.; Ponomareva, N.; Mirrokni, V. Accelerating gradient boosting machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Online, 26–28 August 2020; pp. 516–526. [Google Scholar]
  30. Hamout, H.; Elyousfi, A. Fast 3D-HEVC PU size decision algorithm for depth map intra-video coding. J. Real-Time Image Process. 2020, 17, 1285–1299. [Google Scholar] [CrossRef]
  31. Tung, L.V.; Le Dinh, M.; HoangVan, X.; Dinh, T.D.; Vu, T.H.; Le, H.T. View synthesis method for 3D video coding based on temporal and inter view correlation. IET Image Processing 2018, 12, 2111–2118. [Google Scholar] [CrossRef]
Figure 1. Example of quadtree division from CTU to CU.
Figure 2. Predictive mode diagram of 3D-HEVC.
Figure 3. Analysis of encoding unit complexity.
Figure 4. Process of local variance calculation.
Figure 5. Cumulative distribution of CU sizes in depth maps for ALV.
Figure 6. Method flow based on the GBM model.
Figure 7. Threshold curves for CU sizes in 3D-HEVC and their modeling functions.
Figure 8. Proportion of time consumption for bitrate prediction, depth distortion calculation, and SVD calculation in different sequences.
Figure 9. Flowchart of the fast RDO algorithm.
Figure 10. Experimental results of RD trajectory.
Table 1. CU size allocation of depth maps.

| Sequence | QP | Depth 0 (%) | Depth 1 (%) | Depth 2 (%) | Depth 3 (%) |
|---|---|---|---|---|---|
| Kendo | 34 | 87.83 | 8.94 | 2.34 | 0.89 |
| | 39 | 94.06 | 5.14 | 0.78 | 0.02 |
| | 42 | 96.97 | 2.73 | 0.29 | 0.01 |
| | 45 | 98.89 | 0.98 | 0.13 | 0.00 |
| Balloons | 34 | 84.98 | 10.60 | 4.19 | 0.23 |
| | 39 | 96.06 | 2.94 | 0.93 | 0.07 |
| | 42 | 98.17 | 1.39 | 0.42 | 0.02 |
| | 45 | 98.82 | 1.06 | 0.11 | 0.01 |
| Shark | 34 | 82.58 | 11.13 | 5.74 | 0.55 |
| | 39 | 92.98 | 5.77 | 1.04 | 0.21 |
| | 42 | 97.27 | 2.36 | 0.32 | 0.05 |
| | 45 | 99.13 | 0.73 | 0.13 | 0.01 |
| Poznan_Hall2 | 34 | 96.47 | 2.48 | 1.05 | 0.00 |
| | 39 | 98.96 | 0.92 | 0.12 | 0.00 |
| | 42 | 99.72 | 0.25 | 0.03 | 0.00 |
| | 45 | 99.93 | 0.05 | 0.02 | 0.00 |
| Average | | 95.18 | 3.59 | 1.10 | 0.13 |
Table 2. Depth map thresholds.

| QP | $TH_0$ | $TH_1$ | $TH_2$ |
|---|---|---|---|
| 34 | 68.0744 | 37.7616 | 5.1848 |
| 39 | 90.7334 | 54.3951 | 18.8968 |
| 42 | 138.7688 | 100.8912 | 49.8472 |
| 45 | 260.0750 | 198.7125 | 97.8400 |
Table 3. Sequence details.

| Sequence | Frames | Resolution |
|---|---|---|
| Balloons | 300 | 1024 × 768 |
| Newspaper | 300 | 1024 × 768 |
| Kendo | 300 | 1024 × 768 |
| GT_Fly | 250 | 1920 × 1088 |
| Shark | 300 | 1920 × 1088 |
| Poznan_Hall2 | 300 | 1920 × 1088 |
| Poznan_Street | 250 | 1920 × 1088 |
| Undo_Dancer | 250 | 1920 × 1088 |
Table 4. The overall experimental results of the proposed algorithm.

| Sequence | BDBR (%) | BD-PSNR (dB) | TS, GBM (%) | TS, RDO (%) | TS, Overall (%) |
|---|---|---|---|---|---|
| Kendo | 0.68 | −0.01 | 48.83 | 29.84 | 52.93 |
| Balloons | 0.83 | −0.02 | 44.57 | 33.75 | 49.82 |
| Newspaper | 0.89 | −0.03 | 47.68 | 34.58 | 53.38 |
| GT_Fly | 0.96 | −0.02 | 46.93 | 33.52 | 51.85 |
| Poznan_Hall2 | 1.27 | −0.01 | 45.06 | 29.64 | 52.37 |
| Poznan_Street | 1.08 | −0.01 | 46.71 | 28.43 | 49.85 |
| Undo_Dancer | 0.75 | −0.02 | 49.63 | 30.96 | 53.74 |
| Shark | 1.36 | −0.02 | 50.25 | 31.56 | 55.97 |
| Average | 0.98 | −0.02 | 47.46 | 31.54 | 52.49 |
Table 5. Comparison of overall algorithm results with relevant works in 3D-HEVC (BDBR in %, TS in %).

| Sequence | Bakkouri [19] BDBR | Bakkouri [19] TS | Chen [23] BDBR | Chen [23] TS | Zou [20] BDBR | Zou [20] TS | Proposed BDBR | Proposed TS |
|---|---|---|---|---|---|---|---|---|
| Balloons | 0.93 | 36.20 | 0.49 | 18.50 | 1.02 | 51.7 | 0.83 | 49.82 |
| Kendo | 1.10 | 34.80 | 0.18 | 19.40 | 1.09 | 52.2 | 0.68 | 52.93 |
| Newspaper | 0.55 | 31.80 | 0.04 | 14.10 | 1.21 | 49.8 | 0.89 | 53.38 |
| GT_Fly | 0.56 | 36.50 | −0.30 | 23.30 | 0.93 | 52.8 | 0.96 | 51.85 |
| Poznan_Hall2 | 0.50 | 36.30 | 0.46 | 26.30 | 0.65 | 57.1 | 1.27 | 52.37 |
| Poznan_Street | 0.60 | 30.00 | 0.20 | 15.80 | 0.85 | 53.4 | 1.08 | 49.85 |
| Undo_Dancer | 0.88 | 33.50 | 0.01 | 14.15 | 1.36 | 45.9 | 0.75 | 53.74 |
| Shark | 1.18 | 34.40 | 0.28 | 18.32 | 1.47 | 46.5 | 1.36 | 55.97 |
| Average | 0.78 | 34.19 | 0.17 | 18.73 | 1.07 | 51.20 | 0.98 | 52.49 |