Article

Optimization of Non-Occupied Pixels in Point Cloud Video Based on V-PCC and Joint Control of Bitrate for Geometric–Attribute Graph Coding

School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(16), 3287; https://doi.org/10.3390/electronics14163287
Submission received: 20 July 2025 / Revised: 14 August 2025 / Accepted: 15 August 2025 / Published: 19 August 2025

Abstract

As an important representation of three-dimensional scenes, the point cloud contains rich geometry and attribute information. The video-based point cloud compression standard (V-PCC) partitions three-dimensional data and projects it directionally onto a two-dimensional plane. The generated geometric and attribute graphs contain occupied pixels obtained by projection and unoccupied pixels used for smoothing and padding. The unoccupied pixels have no practical effect on the reconstructed point cloud, yet during bit allocation V-PCC still applies the original rate control method, so the bitrate is used inefficiently. To this end, this paper proposes a method that optimizes the unoccupied pixels of point cloud video under V-PCC and jointly controls the coding rate of the geometric and attribute graphs. For the geometric graph, the bitrate weight allocation is improved according to whether a coding block contains unoccupied pixels and the proportion of occupied pixels, and no bits are allocated to coding blocks consisting entirely of unoccupied pixels. For the attribute graph, an input-pixel improvement algorithm is designed using the occupancy map, which voids the invalid unoccupied pixel information. Experiments show that, under the All Intra configuration and compared with the original scheme, this method reduces the Geom.BD-GeomRate by an average of 15.67% and 16.68% in the point-to-point (D1) and point-to-plane (D2) metrics, respectively, and reduces the end-to-end BD-AttrRate by an average of 4.38%, 0.68%, and 1.74%, respectively. Overall, the average savings are 29.88%, 31.50%, 5.50%, 2.66%, and 3.34%, respectively, achieving bitrate optimization while effectively controlling encoding loss.

1. Introduction

With the rapid advancement of technology, new applications continue to emerge and bring richer and more convenient experiences to people's lives. The effective application of virtual reality, augmented reality, and related technologies in many fields [1] has not only satisfied users' pursuit of realistic, immersive visual effects but also placed higher demands on the efficient representation of 3D scenes and videos. Point clouds contain a large amount of geometric and attribute information of three-dimensional spatial points and are an important form of representation for three-dimensional scenes: they can accurately locate the spatial position of data points and record their associated physical features. Dynamic point clouds, as time-continuous sequences of point clouds, can further capture how objects change over motion and time. However, point clouds carrying such rich information often contain an enormous number of data points, so their use requires a large amount of storage space and high transmission bandwidth [2]. Effective point cloud compression technology is therefore essential for the further application of point clouds.
Based on practical demands and the state of research, the Moving Picture Experts Group (MPEG) has defined two point cloud compression standards, namely geometry-based point cloud compression (G-PCC) and video-based point cloud compression (V-PCC) [3,4]. G-PCC is better suited to sparse point clouds, while V-PCC is more effective for dynamic point clouds. Given the huge data volume and rich representation information of point clouds, rationally utilizing the limited bandwidth while ensuring transmission quality is an important step in the video coding process and the central task of rate control. If the bandwidth of the application scenario is limited, a bitrate that is too high causes playback to stall or drop frames, while a bitrate that is too low wastes available resources. Therefore, under a limited bandwidth constraint, it is particularly important to utilize the transmission bitrate effectively and guarantee video quality by rationally allocating encoding bits and controlling video distortion.
In V-PCC, point cloud data in three-dimensional space is projected into two-dimensional space, and a two-dimensional video codec is then used to process the generated geometric and attribute graphs. However, the rate control methods of traditional two-dimensional video coding standards, such as Advanced Video Coding (H.264/AVC) [5], High Efficiency Video Coding (H.265/HEVC) [6], and Versatile Video Coding (H.266/VVC) [7], mainly target natural video frames and do not fully consider the characteristics of the geometric and attribute graphs generated by projecting point cloud video. Compared with natural video, geometric and attribute graphs contain a large number of unoccupied pixels generated by padding for smoothing during encoding, and these unoccupied pixels have no substantial impact on point cloud reconstruction. In these traditional methods, rate control at the coding tree unit (CTU) level uses the rate-Lagrange (R-λ) model to represent the relationship between bitrate and distortion, so unoccupied pixels are processed in the same way as useful pixels. However, for a coding block that contains unoccupied pixels, the distortion of the whole block does not reflect the distortion of its occupied pixels alone; using the whole-block distortion for bit allocation also assigns bits to the unoccupied pixels, which reduces encoding efficiency. Therefore, this paper takes this characteristic into account and, using the HEVC encoder within the V-PCC framework, optimizes point cloud video encoding by separately handling the unoccupied pixels of the geometric graph and the attribute graph during rate control.
The main contributions and innovative contents are as follows:
  • For the geometric graph, this paper considers whether a coding block contains unoccupied pixels and the proportion of occupied pixels. By establishing the relationship between the bitrate and distortion of the geometric graph and the geometric distortion of the reconstructed point cloud, an improved bit-weight allocation algorithm for geometric coding units is designed. Moreover, no bits are allocated to coding blocks consisting entirely of unoccupied pixels, thereby saving encoding bits.
  • For the attribute graph, the occupancy map is used to obtain a block-based representation of the occupancy information. By generating the occupancy information sign OIS-N to void the invalid unoccupied pixel information of the attribute graph, an improved algorithm for the attribute encoding input pixels is designed. Control is applied at the encoding input end to prevent unoccupied pixels from wasting bitrate resources and further save bitrate.
The experimental results show that, under the All Intra configuration and compared with the original scheme, the proposed method reduces the Geom.BD-GeomRate by an average of 15.67% and 16.68% in the point-to-point D1 and point-to-plane D2 metrics, respectively, when processing different dynamic point clouds, and reduces the end-to-end BD-AttrRate by an average of 4.38%, 0.68%, and 1.74%, respectively. The overall average savings are 29.88%, 31.50%, 5.50%, 2.66%, and 3.34%, respectively. Compared with other methods, the goal of optimizing the coding bitrate is achieved while the coding loss is better controlled.
The remainder of this article is arranged as follows: Section 2 reviews related work on coding optimization in the field of point cloud video coding. Section 3 analyzes the existing problems, introduces the proposed methods, and briefly describes the design principles and ideas. Section 4 presents the experimental design, environment configuration, and analysis of the results. Section 5 summarizes the proposed methods and their effects and suggests directions for future work.

2. Related Work

2.1. V-PCC Coding Method

V-PCC, as an effective coding method for dense and dynamic point clouds [8], decomposes the dynamic point cloud in three-dimensional space into patches and projects them onto a two-dimensional plane, generating geometric graphs containing spatial position information and attribute graphs containing information such as color. At the same time, an occupancy map is generated to indicate the pixels produced when the original patches are projected onto the two-dimensional image. The generated occupancy map, geometric graph, and attribute graph are then processed by a mature two-dimensional video encoder, such as the HEVC codec used in this paper. The coding framework is shown in Figure 1.
The main processes include generating projection patches; obtaining the occupancy map, geometric graph, and attribute graph and encoding and compressing them; and processing and compressing the auxiliary patch information [9]. Points in the original point cloud are clustered, segmented into three-dimensional patches, and projected onto a two-dimensional plane. The three-dimensional position information of the projected patches generates the geometric graph, while the attribute information, including color, correspondingly generates the attribute graph. At the same time, the occupancy map is generated to represent the occupied pixels obtained when the original patches form the geometric and attribute graphs; these pixels are marked "1" for occupied. To fill the blank areas of the geometric and attribute graphs so that the two-dimensional video encoder can encode more effectively, these areas are smoothed by padding to reduce the high-frequency regions of the image. The encoded geometric graph also needs to be reconstructed to guide the generation of the attribute graph. Figure 2 shows the generated geometric and attribute graphs. All the generated information is then compressed, and the compressed bitstream is transmitted. At the decoding end, the geometric graph, attribute graph, and occupancy map are reconstructed by decompression, and after smoothing the reconstructed point cloud is obtained. For the occupied pixels generated by patch projection, encoding quality directly affects the reconstruction quality of the point cloud. The padded pixels in the blank areas are mainly generated from the patch-boundary pixels to smooth the transition regions; they play no practical role in reconstructing the point cloud, so these padded unoccupied pixels are discarded.
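To make the role of the occupancy map concrete, the following minimal sketch marks projected patch pixels as "1" and leaves at "0" the blank pixels that are later smooth-filled. The patch representation as lists of (row, col) positions is an assumption for illustration only, not the V-PCC data structure.

```python
import numpy as np

def build_occupancy_map(patches, height, width):
    """Mark every pixel covered by a projected patch as 1 (occupied).
    Pixels left at 0 are the blank areas that are later smooth-filled
    with unoccupied pixels."""
    occ = np.zeros((height, width), dtype=np.uint8)
    for pixels in patches:            # each patch: iterable of (row, col) positions
        for r, c in pixels:
            occ[r, c] = 1
    return occ

# Toy example: two small projected patches on an 8x8 canvas.
patches = [[(0, 0), (0, 1), (1, 0)], [(5, 5), (5, 6)]]
occ = build_occupancy_map(patches, 8, 8)
print(int(occ.sum()))
```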

2.2. Bitrate Control Technology

After video is encoded, it must be transmitted, and because the actual transmission bandwidth is limited, rate control technology becomes particularly important [10]. Rate control acts mainly during the encoding process: by dynamically adjusting encoding parameters, it regulates the actual bitrate against the planned target bitrate so that video quality remains stable while the limited bandwidth is respected and resource waste is avoided. Rate control thereby balances the utilization of bandwidth resources with the guarantee of video quality. This process usually involves two aspects: bit allocation and rate control proper. At the encoder and decoder, buffers smooth the output bitstream. Bit allocation usually proceeds hierarchically: based on the buffer state, the complexity of the transmitted video, and the required bitrate, the target bits are allocated step by step to each group of pictures (GOP), each frame, and each coding tree unit (CTU). Rate control then adjusts the encoding parameters in real time to ensure that the actual bitrate matches the allocated target bitrate as closely as possible. Ultimately, under a limited total bitrate, reasonable bit allocation preserves coding quality and reduces distortion as much as possible. Figure 3 shows the structure of the rate control principle.
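The hierarchical allocation described above can be sketched as follows. This is a simplified illustration only: a real encoder derives each level's weights from buffer state and content complexity, which are abstracted here as given weight lists.

```python
def split(budget, weights):
    """Split a bit budget in proportion to normalized weights."""
    total = sum(weights)
    return [budget * w / total for w in weights]

def allocate_bits(total_bits, gop_weights, frame_weights, ctu_weights):
    """Sequence -> GOP -> frame -> CTU allocation; for illustration we
    drill into the first GOP and the first frame only."""
    gop_bits = split(total_bits, gop_weights)
    frame_bits = split(gop_bits[0], frame_weights)
    ctu_bits = split(frame_bits[0], ctu_weights)
    return gop_bits, frame_bits, ctu_bits

gops, frames, ctus = allocate_bits(10000, [1, 1], [2, 1, 1], [3, 1])
print(gops, frames, ctus)
```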

2.3. Research on Coding Control

Bitrate control sets different parameters for different encoding regions to allocate bits. The main purpose is to ensure that the encoded bitstream meets the bandwidth and storage requirements during transmission and to guarantee the best encoding quality as much as possible. Therefore, bitrate control is also an important content that has attracted much attention in video coding research.
Li B et al. [11] found that, as video coding schemes become more flexible, it is difficult for existing rate control algorithms based on the R-Q (rate-quantization) model to design the quantization step size Q and establish an accurate model; their research showed that a more stable model could be established between the bitrate R and the Lagrange multiplier λ. They therefore proposed a rate control algorithm based on the R-λ model, which achieved the target bitrate more accurately in the HEVC reference software and obtained significant rate-distortion (R-D) performance gains. Most traditional coding methods also take the R-λ model as the basis of their rate control. Chen Z et al. [12] estimated the distribution of R-D model parameters, formulated CTU-level rate allocation in HEVC as a decision problem, and proposed a two-stage bisection method to solve the resulting optimization problem. Li S et al. [13] proposed a new R-D estimation method for updating the relationship among bitrate, distortion, and the Lagrange multiplier λ; on this basis, they designed an optimal bit allocation formula and solved it iteratively with the recursive Taylor expansion (RTE) method to achieve optimal bit allocation and reallocation. The rapid development of machine learning and deep learning has also provided further solutions for rate control. Zhou M et al. [14] used agents for sequence feature input and constructed a deep reinforcement learning model through interactive training with the coding environment to obtain an effective rate control strategy for dynamic scenes. Marzuki I et al. [15] classified the complexity of different regions based on extracted deep convolutional features and adjusted encoding parameters through a feature-driven strategy, achieving real-time dynamic rate control. Chen S et al. [16] used learned video data to seek the balance between bitrate and distortion; through precise adjustment of encoding parameters in the decision process, bit allocation was made more reasonable and coding performance improved. In addition, some scholars have focused on improving rate control with respect to the subjective quality of videos. Zhou M et al. [17] first proposed using the approximate average pixel-level just noticeable distortion (JND) weight as the JND factor of the coding unit (CU), performed R-D modeling based on this factor, and obtained the globally optimal encoding parameters using the Karush-Kuhn-Tucker (KKT) conditions. Liu Z et al. [18] constructed a game model by dividing the video into ROI (regions attracting attention, such as human faces) and non-ROI areas and expressed bit allocation as a utility optimization problem; by solving it, a reasonable allocation of encoding bits between ROI and non-ROI was achieved.
However, unlike traditional video, the encoding structure of point clouds involves both geometric and attribute videos, and their content is non-uniform; rate control must take this special situation into account. Liu Q et al. [19] considered the bit allocation of geometric and attribute information simultaneously and proposed a model-based joint bit allocation method to optimize overall compression performance while maintaining perceptual quality. Shen F et al. [20] proposed an optimization formulation based on the quality dependency in projected videos to allocate target bits between geometric and color videos, and designed a two-pass method to improve the accuracy of rate control in HEVC. Li L et al. [21] proposed an algorithm for optimizing bit allocation between geometric and attribute videos: zero bits were allocated to basic units (BU) containing only unoccupied pixels, and auxiliary information was designed to find the corresponding BU in the previous frame so that its model parameters could be applied in the model update of the current BU. Chen C et al. [22] combined the geometric curvature information of the point cloud with the Structural Similarity Index Measure (SSIM) framework to predict the distortion sensitivity of local areas: more bits were allocated for fine quantization in highly sensitive areas, allocation was reduced for coarse quantization in less sensitive areas, and the QP was adjusted dynamically under preset bitrate constraints to prioritize the quality of highly perceptible areas. Wang T et al. [23] solved the optimal bit allocation through convex optimization, adopted model predictive control, and introduced distortion weight factors for multi-frame-level dynamic bit allocation, using rolling optimization to ensure stable inter-frame quality. Zhang J et al. [24] considered the spatial non-uniformity of point cloud content; by analyzing characteristics of the geometric content (such as local density, curvature, and edge complexity), they dynamically adjusted the bit allocation of different regions, giving priority to the reconstruction quality of key content when the overall bitrate is limited. Cai Z et al. [25] proposed a frame-level bit allocation method: by analyzing the distortion propagation model within a group of pictures, they introduced a distortion propagation factor, used the skip ratio of the smallest coding unit for prediction, and applied occupancy information for correction, effectively mitigating rate waste in local areas. Liu W et al. [26] applied crack suppression preprocessing and adaptive quantization parameter adjustment to allocate as much bitrate as possible to crack-sensitive areas, reduced invalid bit allocation, and achieved precise, perceptually optimized allocation to improve subjective quality. These methods effectively reduce bitrate redundancy and optimize bit allocation, providing research directions for improving rate control and coding efficiency. However, there are relatively few studies that process unoccupied pixels to reduce the impact of invalid information, which is the focus of this article.

3. The Proposed Method

3.1. Optimization of Geometric Graph Encoding Based on Occupied Pixels

3.1.1. Principle of Bitrate Control

When video is encoded and transmitted, it is subject to the limits of transmission bandwidth. To ensure transmission quality, rate control must be carried out within the available channel bandwidth. Rate control selects a series of appropriate encoding parameters during encoding, ensuring that the resulting bitstream meets the bitrate limit while minimizing encoding distortion. For encoded images, bits are allocated according to image content: coding units with complex texture receive more bits, while flat content receives fewer. In short, with a limited total number of bits, high-quality video transmission is achieved through reasonable allocation. The objective of rate control is therefore twofold: on the one hand, to allocate appropriate bitrates to different coding units so that the total bitrate meets the expected limit; on the other hand, to preserve encoding quality and reduce image distortion as much as possible.

3.1.2. Optimization Ideas for Geometric Graph Coding

At present, a commonly used rate control algorithm in video encoders is the R-λ algorithm [27], and this method is still adopted for geometric graph rate control in V-PCC. However, unlike the bit allocation of natural videos, the geometric graph contains occupied blocks and unoccupied blocks, and among the occupied blocks there are coding units that contain both occupied and unoccupied pixels. When such a block is encoded, the prediction distortion of the CTU actually consists of two parts: the distortion of the occupied pixels and the distortion of the unoccupied pixels. The occupied pixels come from the projection of points in the point cloud, while the unoccupied pixels are generated by padding around the occupied pixels at patch boundaries, mainly for smoothing. Hence, what ultimately affects the quality of the reconstructed point cloud is the distortion of the occupied pixels. Figure 4 shows, for one frame of the Loot sequence, the occupied pixels contained in the CTUs to be encoded and the corresponding bit allocation. It can be seen that the bit allocation of a CTU should be correlated with the occupied pixels contained in the coding block. Moreover, when the R-λ algorithm is applied, encoding bits are still allocated to unoccupied pixels, which reduces encoding efficiency.
Rate control allocates appropriate encoding bits to each coding block; the encoding parameter λ is derived from the allocated bits, and the quantization parameter QP is then calculated from λ. In view of the above, when optimizing the geometric graph encoding of point cloud video based on occupied pixels, the main idea is to perform occupied-pixel-based R-λ optimization for occupied blocks while allocating no bits to unoccupied blocks, thereby optimizing the encoding bitrate.
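As a hedged illustration of how an allocated bit budget leads to λ and then to QP, the sketch below uses the R-λ model λ = α · bpp^β with typical initial parameter values from the HEVC reference rate control and the common QP-λ mapping QP = 4.2005 · ln λ + 13.7122. The constants and helper names are illustrative; the real encoder updates α and β adaptively per coding unit.

```python
import math

# Typical initial parameters of the HEVC reference rate-control model
# (illustrative values; the encoder updates alpha and beta adaptively).
ALPHA, BETA = 3.2003, -1.367

def lambda_from_bpp(bpp, alpha=ALPHA, beta=BETA):
    """R-lambda model: lambda = alpha * bpp^beta."""
    return alpha * (bpp ** beta)

def qp_from_lambda(lmbda):
    """Common QP-lambda mapping, clipped to the valid HEVC range [0, 51]."""
    qp = round(4.2005 * math.log(lmbda) + 13.7122)
    return max(0, min(51, qp))

lam = lambda_from_bpp(0.05)   # lambda for a 0.05 bit-per-pixel budget
print(qp_from_lambda(lam))
```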

3.1.3. R-λ Optimization Based on Occupied Pixels

In the original rate control method, bit allocation at the CTU level is driven by the prediction error and is calculated as:

$$T_{CTU} = \frac{T_{CurrPic} - Bit_{H} - Coded_{Pic}}{\sum_{AllNotCodedCTUs} \omega_{CTU}} \times \omega_{CurrCTU}, \tag{1}$$

where $T_{CurrPic}$ is the bit allocation at the current frame level, $Bit_{H}$ is the bits already used to encode header information, $Coded_{Pic}$ is the bits already consumed in the current frame, and $\omega_{CurrCTU}$ is the allocation weight of the current CTU, calculated as follows:

$$MAD_{CTU} = \frac{1}{N} \sum_{i} \left| pred_{i} - org_{i} \right|, \tag{2}$$

$$\omega_{CurrCTU} = MAD_{CurrCTU}^{2}, \tag{3}$$

where $pred_{i}$ is the pixel value of the pre-encoded image and $org_{i}$ is the original pixel value. During bit allocation, each coding unit receives bits in proportion to its own weight relative to the total weight of all not-yet-coded units; as the prediction error increases, the allocated bits increase accordingly. The allocation weight coefficient $\omega$ can be expressed as:

$$\omega = \frac{\omega_{CurrCTU}}{\sum_{AllNotCodedCTUs} \omega_{CTU}}, \tag{4}$$
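Formulas (1)-(4) can be sketched as follows. This is a simplified stand-in for the reference implementation, with hypothetical function names; the remaining frame budget is split in proportion to the MAD² weights of the not-yet-coded CTUs.

```python
import numpy as np

def mad(pred, org):
    """Mean absolute difference between pre-encoded and original pixels, Eq. (2)."""
    return float(np.mean(np.abs(pred.astype(np.int64) - org.astype(np.int64))))

def ctu_target_bits(frame_bits, header_bits, coded_bits, mads, idx):
    """Target bits for CTU `idx`, Eqs. (1), (3), (4): the remaining frame
    budget is split in proportion to the MAD^2 weights of the uncoded CTUs."""
    weights = [m ** 2 for m in mads]          # omega_CTU = MAD^2
    remaining = frame_bits - header_bits - coded_bits
    return remaining * weights[idx] / sum(weights)

mads = [4.0, 2.0, 2.0]                        # MADs of the not-yet-coded CTUs
print(ctu_target_bits(1200, 100, 100, mads, 0))
```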
For I-frames, traditional methods calculate the bit allocation weights based on the SATD of each CTU. For geometric graphs, however, this allocation can only represent the overall distortion of the coding block and cannot single out the distortion of the occupied pixels within it. Therefore, when optimizing geometric graph encoding, the occupied-pixel distortion is used to measure the bit allocation weight of an occupied block, refining the calculation of coding block distortion. The method is as follows:
$$\omega_{OCU} = D + \delta D_{o}, \tag{5}$$

where $D$ is the prediction distortion of the current coding block and $D_{o}$ is the distortion of the occupied pixels within it; an influence factor $\delta$ relates the two. The occupied-pixel distortion is expressed as the average occupied-pixel distortion of the frame multiplied by the number of occupied pixels in the current block, that is:

$$D_{o} = \frac{D_{p}}{N_{p}} \times N, \tag{6}$$

where $D_{p}$ is the prediction error of the frame, $N_{p}$ is the number of occupied pixels it contains, and $N$ is the number of occupied pixels in the current coding block. The total distortion of all uncoded units can be expressed as:

$$\sum_{AllNotCodedOCUs} \omega_{OCU} = \sum_{AllNotCodedOCUs} \left( D + \delta D_{o} \right), \tag{7}$$

Then the improved allocation coefficient $\omega$ of the optimized bit weight is expressed as:

$$\omega = \frac{\omega_{OCU}}{\sum_{AllNotCodedOCUs} \omega_{OCU}} = \frac{D + \delta D_{o}}{\sum \left( D + \delta D_{o} \right)}, \tag{8}$$

The influence factor $\delta$ of the occupied-pixel distortion depends on the occupied distortion of the coding block and the number of occupied pixels it contains. To obtain $\delta$, when encoding the geometric graph, the R-D relationship between its bitrate and the resulting distortion is combined with the geometric distortion of the point cloud, and the R-D connection between the two cases is established through the occupied pixels.
In previous methods, the relationship between bitrate and distortion in video coding was modeled by a hyperbolic function [11]:

$$D(R) = C R^{-K}, \tag{9}$$

where the parameters $C$ and $K$ depend on the video content. $\lambda$ is the slope of the R-D curve; by differentiation, the relationship between $D$ and $\lambda$ is obtained as:

$$\lambda = -\frac{\partial D}{\partial R} = C K \times R^{-(K+1)} = \alpha \times R^{\beta}, \tag{10}$$

For the optimized method, according to Formula (5), the distortion is:

$$D(R) = D + \delta D_{o}, \tag{11}$$

Taking the derivative:

$$\lambda_{\delta} = -\frac{\partial \left( D + \delta D_{o} \right)}{\partial R} = -\left( \frac{\partial D}{\partial R} + \delta \frac{\partial D_{o}}{\partial R} \right) = C K \times R^{-(K+1)} = \alpha_{\delta} \times R^{\beta_{\delta}}, \tag{12}$$

where $\lambda_{\delta}$, $\alpha_{\delta}$, and $\beta_{\delta}$ denote the new parameters of the optimized method.

For the geometric graph encoding of point cloud video, combining the point-to-point geometric distortion $D_{1}$ of the reconstructed point cloud and the R-D relationship between the encoding distortion $D$ of the geometric graph and the bitrate, the following formulas hold:

$$\lambda_{1} = -\frac{\partial D_{1}}{\partial R} = C_{1} K_{1} \times R^{-(K_{1}+1)} = \alpha_{1} \times R^{\beta_{1}}, \tag{13}$$

$$\lambda_{2} = -\frac{\partial D}{\partial R} = C_{2} K_{2} \times R^{-(K_{2}+1)} = \alpha_{2} \times R^{\beta_{2}}, \tag{14}$$
Here, $\lambda_{1}$, $\alpha_{1}$, $\beta_{1}$ are the parameters relating the geometric distortion of the point cloud to the bitrate, and $\lambda_{2}$, $\alpha_{2}$, $\beta_{2}$ are the parameters relating the encoding distortion of the geometric graph to the bitrate. Let $\lambda_{\delta} = \lambda_{1}$, $\alpha_{\delta} = \alpha_{1}$, $\beta_{\delta} = \beta_{1}$. From the above analysis, the prediction distortion of a coding block in the geometric graph is positively correlated with the distortion of its occupied pixels. Combining Formulas (12)-(14), we obtain:

$$\lambda_{1} = \lambda_{2} \left( 1 + \delta \right), \tag{15}$$
Also, the rate-distortion relationship represented by the hyperbolic function in Formula (10) can be expressed as:

$$\lambda_{2} = \alpha \times bpp^{\beta}, \tag{16}$$

where $bpp$ denotes the average number of target bits per pixel, calculated as:

$$bpp = \frac{R}{f \times w \times h}, \tag{17}$$

where $R$ is the target bitrate, $f$ is the frame rate, and $w$ and $h$ are the width and height of the video frame coding blocks, respectively. Combining Formulas (13)-(16), $(1 + \delta)$ is a power function of $bpp$ and can be expressed as:

$$\left( 1 + \delta \right) = \frac{\alpha_{1}}{\alpha_{2}} \times bpp^{\beta_{1} - \beta_{2}}, \tag{18}$$
Based on the above analysis, there is a convertible relationship between the geometric distortion of the point cloud and the encoding distortion of the geometric graph. By testing point cloud sequences, collecting statistics according to Formulas (13) and (14), and fitting the results to Formula (18), the values of the parameters $\alpha_{1}/\alpha_{2}$ and $\beta_{1} - \beta_{2}$ can be obtained, thereby determining $\delta$. Unoccupied blocks are simply allocated no bitrate; their QP is set to the maximum (51) to minimize the bit demand.
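The fitting step for Formula (18) could look like the following sketch, which recovers α₁/α₂ and β₁ − β₂ by linear regression in log-log space. The data here is synthetic with a known ground truth, standing in for the statistics collected from test sequences.

```python
import numpy as np

def fit_delta_model(bpp, lam1, lam2):
    """Fit (1 + delta) = (a1/a2) * bpp^(b1 - b2), Formula (18), by linear
    regression in log-log space: log(lam1/lam2) = log(a) + b * log(bpp)."""
    y = np.log(np.asarray(lam1) / np.asarray(lam2))
    x = np.log(np.asarray(bpp))
    b, log_a = np.polyfit(x, y, 1)   # slope = b1 - b2, intercept = log(a1/a2)
    return float(np.exp(log_a)), float(b)

def delta(bpp, a_ratio, b_diff):
    """delta = (a1/a2) * bpp^(b1 - b2) - 1."""
    return a_ratio * bpp ** b_diff - 1.0

# Synthetic statistics generated from a known ground truth (a = 2.0, b = -0.5):
bpp = np.array([0.02, 0.05, 0.1, 0.2])
lam2 = 3.2 * bpp ** -1.367
lam1 = lam2 * 2.0 * bpp ** -0.5
a, b = fit_delta_model(bpp, lam1, lam2)
print(round(a, 3), round(b, 3))
```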
At this point, the improved allocation weight $\omega_{OCU}$ of the optimized bit weight is obtained as:

$$\omega_{OCU} = \begin{cases} 0, & D = 0 \ \text{or} \ N = 0 \\ D + \delta \times \dfrac{D_{p}}{N_{p}} \times N, & \text{otherwise}, \end{cases} \tag{19}$$
A schematic diagram of the algorithm flow is shown in Figure 5. Based on the above method, the algorithm flow of this stage is summarized in Table 1.
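Putting Formula (19) and the unoccupied-block rule together, a minimal sketch (with hypothetical helper names, not the actual V-PCC code) is:

```python
MAX_QP = 51

def block_weight(D, N, Dp, Np, delta):
    """Improved allocation weight of Formula (19): zero when the block has no
    prediction distortion or no occupied pixels, otherwise D + delta*(Dp/Np)*N."""
    if D == 0 or N == 0:
        return 0.0
    return D + delta * (Dp / Np) * N

def assign_block(D, N, Dp, Np, delta, remaining_bits, total_weight, base_qp):
    """Zero-weight (fully unoccupied) blocks get no bits and the maximum QP;
    other blocks share the remaining budget in proportion to their weight."""
    w = block_weight(D, N, Dp, Np, delta)
    if w == 0.0:
        return 0.0, MAX_QP
    return remaining_bits * w / total_weight, base_qp

print(assign_block(0, 0, 100.0, 50, 0.3, 1000, 40.0, 32))     # fully unoccupied block
print(assign_block(10.0, 20, 100.0, 50, 0.3, 1000, 40.0, 32))  # occupied block
```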

3.2. Optimization of Attribute Graph Encoding Based on Occupied Pixels

3.2.1. Obtain the Occupancy Information Sign OIS-N

From the V-PCC coding process, it is known that the occupancy map is generated together with the geometric and attribute graphs in the two-dimensional plane. The occupied pixels are marked as "1" in the occupancy map, and these pixels, as useful pixels, play an important role in the reconstruction of point cloud video. The unoccupied pixels formed by padding exist mainly for smoothing and for easier processing by the encoder and have no actual effect. The two-dimensional video coding method HEVC adopts block partitioning during coding. Based on this, the pixel information of the coding blocks is processed using the occupancy map: by representing the occupancy information in block-based form, the occupancy information sign OIS-N is generated to void the invalid unoccupied pixel information of the attribute graph. An improved algorithm for the attribute encoding input pixels is thus designed, controlling the encoding input end to avoid the waste of bitrate resources by unoccupied pixels and further save bitrate.
As shown in Figure 6, the initial occupancy map is divided into N × N blocks (for example, N = 64). If a block contains at least one pixel marked "1" (occupied), all pixels of the corresponding block in the occupancy information sign OIS-N are set to "1"; blocks containing no occupied pixels are set to "0", indicating that they are empty. Different block sizes yield different signs OIS-N, where N can take the values 64, 32, 16, and 8, matching every coding-block size that can occur during encoding. A block judged "0" for a large N remains empty for every smaller N, and as the division becomes finer the sign OIS-N follows the actual occupancy map more closely. The sign is used only to drive the algorithm and introduces no additional processing or consumption.
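The OIS-N construction of Figure 6 can be sketched as follows (NumPy; `build_ois` is a hypothetical helper name, and the map dimensions are assumed to be multiples of N, as V-PCC pads frames to block size):

```python
import numpy as np

def build_ois(occupancy_map, n=64):
    """Build the block-level occupancy information sign OIS-N.

    Any n x n block containing at least one occupied ('1') pixel is
    marked fully occupied in the sign; blocks with no occupied pixel
    are marked '0' (empty).
    """
    h, w = occupancy_map.shape
    ois = np.zeros_like(occupancy_map)
    for y in range(0, h, n):
        for x in range(0, w, n):
            if occupancy_map[y:y + n, x:x + n].any():
                ois[y:y + n, x:x + n] = 1
    return ois
```

Calling the helper with n = 64, 32, 16, or 8 yields the corresponding OIS-64 through OIS-8 signs described above.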

3.2.2. Attribute Graph Encoding Input Pixel Improvement Algorithm

In V-PCC, the geometric and attribute graphs are compressed with a two-dimensional video encoder. To reduce high-frequency regions in the image and make it easier to encode, the blank areas of both graphs are first smoothed by filling. As described above, however, these filled unoccupied pixels contribute nothing to the reconstructed point cloud, yet they consume extra bits and increase the encoding complexity. For the attribute graph, the occupancy information sign OIS-N is therefore used to identify which coding blocks contain occupied pixels; blocks consisting only of non-occupied pixels are hollowed out of the attribute graph before it enters the encoder, reducing the invalid information at the input end and hence the bitrate it occupies. As shown in Figure 7, after the attribute graph is filled, a block-wise multiplication of OIS-N with the filled frame produces a new, voided image frame, which is then encoded as the new input attribute graph.
During the voiding process, some blocks marked "1" in OIS-N still contain filled non-occupied pixels from the original attribute graph, and these survive into the new input frame. As N decreases, the new input image approaches the unfilled attribute graph, but the retained unoccupied pixels help suppress the high-frequency content that appears when occupied blocks are only partially occupied, balancing the encoding efficiency. Voiding the fully unoccupied blocks avoids wasted bits and effectively reduces the encoding complexity; the N value that saves the most bitrate, however, must be found experimentally.
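The block-wise multiplication of Figure 7 amounts to masking the padded attribute frame with OIS-N; a minimal sketch (hypothetical helper name, single-plane or multi-channel YCbCr input assumed):

```python
import numpy as np

def void_unoccupied(attr_frame, ois):
    """Multiply the padded attribute frame by OIS-N block-wise.

    attr_frame : H x W array (one plane) or H x W x C array (e.g. YCbCr)
    ois        : H x W binary occupancy information sign

    Pixels inside fully unoccupied blocks become 0; partially occupied
    blocks keep their padding, which still smooths block boundaries.
    """
    mask = ois if attr_frame.ndim == 2 else ois[:, :, None]
    return attr_frame * mask
```

The zeroed regions are flat and cheap to encode, which is where the bitrate saving on unoccupied blocks comes from.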

4. Experiments and Discussion

4.1. Experimental Setup

This section first experiments with the individual processing methods to obtain suitable coding parameters and then integrates them into the overall method, so that their effects do not interfere with one another. To measure the coding performance of the proposed method, the V-PCC reference software TMC2-v18.0 [28] and the corresponding HEVC test software HM-16.20+SCM-8.8 [29] were used for experimental verification. The encoder was configured in All Intra mode with the lossy setting of the V-PCC common test conditions [30]. The proposed method is compared against the V-PCC anchor. The dynamic point clouds used for testing are Loot, RedAndBlack, Soldier, Queen, and Longdress [31,32]; the first 32 frames of each sequence are encoded at five bitrates. The five QP pairs for the geometric and attribute graphs are listed in Table 2, where r1 denotes the lowest bitrate and r5 the highest.
The experiments are evaluated with the method of [33] by computing the BDBR of each aspect of point cloud coding, i.e., the average percentage of bits additionally consumed (positive) or saved (negative) at equal reconstruction quality. Geom.BD-GeomRate evaluates the geometric quality against the geometric bitrate of the reconstructed point cloud video; Geom.BD-TotalRate evaluates the geometric quality against the total bitrate; and end-to-end BD-AttrRate and end-to-end BD-TotalRate evaluate the attribute quality against the attribute bitrate and the total bitrate, respectively. In addition, to quantify the bitrate optimization achieved by the proposed method, the encoding bit ratio $Ebr$, the encoding bit error $\Delta R$, and the encoding quality difference $\Delta PSNR$ are defined as follows:
$$Ebr = \frac{R_p}{R_o},$$

$$\Delta R = \frac{\left| R_T - R_p \right|}{R_T},$$

$$\Delta PSNR = PSNR_p - PSNR_o,$$
where $R_p$ denotes the bits produced by the proposed method, $R_o$ the bits of the original method, $R_T$ the target bits, and $PSNR_p$ and $PSNR_o$ the PSNR values of the proposed and original methods, respectively. The encoding time change is calculated as in [34]:
$$\Delta T = \frac{T_P - T_O}{T_O},$$
where $T_P$ and $T_O$ denote the time required by the proposed method and by the original method, respectively.
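The four evaluation quantities follow directly from their definitions; a small sketch with illustrative variable names:

```python
def coding_metrics(Rp, Ro, Rt, psnr_p, psnr_o, Tp, To):
    """Evaluation quantities used in the experiments.

    Ebr    : ratio of bits spent by the proposed method to the anchor's bits
    dR     : relative bit error against the target bits R_T
    dPSNR  : quality difference (proposed minus anchor), in dB
    dT     : relative encoding-time change
    """
    ebr = Rp / Ro
    d_r = abs(Rt - Rp) / Rt
    d_psnr = psnr_p - psnr_o
    d_t = (Tp - To) / To
    return ebr, d_r, d_psnr, d_t
```

For example, spending 90 bits where the anchor spends 100 gives Ebr = 0.9, and finishing in 95% of the anchor's time gives dT = -0.05.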

4.2. Obtain Appropriate Encoding Parameters

First, the point cloud test sequences are encoded with the geometric-graph processing method, and the resulting bitrate and distortion data are collected. The data are fitted with Formula (23) on the basis of Formulas (18) and (19); Figure 8 shows the fitting results. To handle different point cloud sequences uniformly, the parameters $\alpha_1$, $\alpha_2$ and $\beta_1$, $\beta_2$ are set to their averaged values of 9.5 and 0.6, respectively, so that $\delta$ in the actual encoding process is determined as $9.5 \times bpp^{0.6} - 1$.
Table 3 reports, for each value N = 64, 32, 16, and 8, the average BDBR of the attribute-graph processing method relative to the V-PCC anchor over the different point clouds. Only for N = 64 do the encoded bits decrease across all metrics; for smaller N they increase. This is because, with a small N, occupied blocks in small edge regions are more likely to be split together with empty blocks, introducing high-frequency content; moreover, the finer division increases the number of coded blocks and thus the bitrate. Therefore, 64 is chosen as the optimal value of N for the occupancy information sign.

4.3. Overall Experimental Results and Analysis

An overall experiment was conducted on the proposed method. The impact factor $\delta$ is set to $9.5 \times bpp^{0.6} - 1$, where $bpp$ is computed with Formula (17) from the target bitrates in Table 2. For dynamic point clouds with different motion or complexity, the occupied-pixel impact factor can be adjusted dynamically according to the bitrate requirements of the actual application, refining the bitrate-weight allocation coefficient. The occupancy information sign uses the optimal value N = 64. Table 4 shows the experimental results on the different point cloud sequences. Compared with the original coding method, the geometric rate-distortion cost on D1 and D2 is reduced by an average of 15.67% and 16.68%, respectively, and the attribute rate-distortion cost on Luma, Cb, and Cr by 4.38%, 0.68%, and 1.74%. For the overall performance, Geom.BD-TotalRate decreases by an average of 29.88% on D1 and 31.50% on D2, while end-to-end BD-TotalRate saves an average of 5.50%, 2.66%, and 3.34% on Luma, Cb, and Cr, respectively. These results show that the proposed method clearly improves the coding performance of both the geometric and attribute graphs, with an even larger gain in overall RD performance. The reason is that the reconstructed geometric graph is used to guide the encoding of the attribute graph, so improving the former also helps enhance the latter.
Moreover, compared with the coarser treatment of non-occupied pixels in the attribute graph, the more direct and precise handling in the geometric graph yields a more visible gain in bitrate utilization, and removing invalid bits also improves compression. The encoding time decreases slightly, because some invalid information is no longer processed. Time saving, however, is not the main goal of this method, so the effect is modest; a marked reduction in encoding time and complexity would require further optimization in future work to broaden the method's applicability and efficiency.
At present, research on bitrate-control optimization for V-PCC is still limited, and CTU-level processing is rare. Reference [21] contains a comparable study of BU-level bit allocation; Table 5 and Table 6 show the effect of its allocation method on the geometric and attribute graphs relative to the original method. The data indicate that the method improves attribute performance but actually incurs RD loss in geometric performance, and its attribute gains are not especially prominent compared with ours; indeed, its authors ultimately did not adopt the BU-level bit allocation for geometric graphs in their overall scheme. Reference [20] optimizes bit allocation by modeling the quality dependency between the geometric and attribute graphs, adjusting the rate split between them; its overall results are shown in Table 7. Considering geometry and attributes jointly, its algorithm improves the overall geometric performance on D1 and D2 by 9.27% and 6.46% on average, at some cost in attribute performance, with an average overall gain of 8.73%. In contrast, the geometric performance of our method is significantly better, and it improves geometry and attributes simultaneously. The frame-level bit allocation of reference [25] analyzes image frames with a distortion-propagation model and introduces a distortion-propagation factor, exploiting local-region bits effectively and achieving BD-Rate reductions of 0.82% and 4.83% for the geometric and attribute graphs, respectively.
After correction with the occupancy information, these reductions grew to 2.12% and 6.41%, respectively; the experimental data are given in Table 8 and Table 9. Compared with that method, the algorithm proposed in this paper performs far better on geometric graphs but slightly worse on attribute graphs. This again shows that the more direct and explicit bit allocation applied here to the geometric graphs makes more effective use of the bitrate.
The encoded-bit consumption and PSNR changes of the original method and the proposed method were collected during testing and are summarized in Table 10 and Table 11. The data show that the proposed method saves encoding bits across the different bitrates with a certain degree of stability, reaching an average bit ratio of 94.32%. Because the bitrate is used more efficiently, more bits are allocated to the important content during encoding, and the encoding quality also improves to some extent. Driven by the improved geometric quality, the attribute quality gains are even more visible, again indicating that when geometric graphs guide attribute-graph encoding, improving the former benefits the encoding quality of the latter. Based on this analysis, applying the method effectively enhances the overall coding performance, saving bitrate and improving its utilization efficiency.
Examining the bit errors across the bitrate points r1 to r5 reveals that the bitrate allocation introduces some loss of precision: at high bitrates the bit error reaches 24.12%, and the rate-control accuracy drops. The data are presented in Table 12. The reason is that some initial parameters, such as α and β, keep their original values, which were fitted on conventional video content; the two-dimensional videos derived from point clouds differ from conventional videos, causing errors in use. For dynamic point clouds that change in real time, α and β in the original method may also not be updated promptly during calculation, further reducing the accuracy of bitrate allocation and limiting the method's applicability. In future work, driven by real-time and intelligent requirements, these initial parameters should be tuned to better match the actual characteristics of point cloud videos. Overall, however, the proposed method plays a good role in improving the coding performance and achieves clear results.
Figure 9 shows the RD curves obtained when encoding the point cloud Loot with the proposed method; it clearly outperforms the original method. Figure 10 compares the point clouds reconstructed from the Loot, Longdress, and RedAndBlack sequences with the original method and with the improved method proposed in this paper. The reconstruction quality under the improved method is better, which again indicates that the improvement raises the utilization efficiency of the encoded bits: while avoiding bitrate waste, the subjective quality experience is also enhanced.

5. Conclusions

In this paper, the non-occupied pixels in the geometric and attribute graphs are handled separately, implementing a coding optimization method for point cloud video. For the geometric graph, an improved bitrate-weight allocation algorithm exploits the relationship between encoding bitrate and distortion to optimize bit allocation, avoiding the waste of bits on non-occupied blocks and improving the effective utilization of the bitrate. For the attribute graph, the occupancy information sign is obtained and used to void the non-occupied pixel information, and the improved algorithm for the attribute-graph input pixels controls the invalid information at the input end, further saving bits. The experimental results show that, compared with the original method, the proposed method significantly improves the coding performance on the different point clouds, with average overall bit savings of 29.88%, 31.50%, 5.50%, 2.66%, and 3.34% in the respective metrics, while the encoding quality is largely preserved; the bitrate-optimization effect is evident. The errors remaining in the bitrate allocation can be addressed in future work through designs better tailored to the characteristics of point cloud video content, yielding more appropriate coding parameters and even clearer results.

Author Contributions

Conceptualization, F.W. and J.J.; methodology, F.W.; software, Q.Z.; validation, J.J. and Q.Z.; formal analysis, F.W.; investigation, F.W.; resources, Q.Z.; data curation, J.J.; writing—original draft preparation, J.J. and Q.Z.; writing—review and editing, F.W. and Q.Z.; visualization, F.W.; supervision, Q.Z.; project administration, Q.Z.; funding acquisition, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China No. 61771432, and 61302118, the Key Projects Natural Science Foundation of Henan 232300421150, Zhongyuan Science and Technology Innovation Leadership Program 244200510026, the Scientific and Technological Project of Henan Province 232102211014, 23210221101, and 242102211007.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
V-PCC: Video-based Point Cloud Compression
G-PCC: Geometry-based Point Cloud Compression
H.264/AVC: Advanced Video Coding
H.265/HEVC: High Efficiency Video Coding
H.266/VVC: Versatile Video Coding
CTU: Coding Tree Unit
OIS: Occupancy Information Sign

References

  1. Rauschnabel, P.A.; Felix, R.; Hinsch, C.; Shahab, H.; Alt, F. What is XR? Towards a Framework for Augmented and Virtual Reality. Comput. Hum. Behav. 2022, 133, 107289. [Google Scholar] [CrossRef]
  2. Graziosi, D.; Nakagami, O.; Kuma, S.; Zaghetto, A.; Suzuki, T.; Tabatabai, A. An Overview of Ongoing Point Cloud Compression Standardization Activities: Video-Based (V-PCC) and Geometry-Based (G-PCC). APSIPA Trans. Signal Inf. Process. 2020, 9, e13. [Google Scholar] [CrossRef]
  3. MPEG-PCC-TMC13: Geometry Based Point Cloud Compression G-PCC. 2021. Available online: https://github.com/MPEGGroup/mpeg-pcc-tmc13 (accessed on 13 August 2025).
  4. MPEG-PCC-TMC2: Video Based Point Cloud Compression VPCC. 2022. Available online: https://github.com/MPEGGroup/mpeg-pcc-tmc2 (accessed on 13 August 2025).
  5. Richardson, I.E. The H.264 Advanced Video Compression Standard; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
  6. Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
  7. Bross, B.; Wang, Y.K.; Ye, Y.; Liu, S.; Chen, J.; Sullivan, G.J. Overview of the versatile video coding (VVC) standard and its applications. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3736–3764. [Google Scholar] [CrossRef]
  8. Tohidi, F.; Paul, M.; Afsana, F. Efficient Dynamic Point Cloud Compression Through Adaptive Hierarchical Partitioning. IEEE Access 2024, 12, 152614–152629. [Google Scholar] [CrossRef]
  9. Chen, A.; Mao, S.; Li, Z.; Xu, M.; Zhang, H.; Niyato, D.; Han, Z. An introduction to point cloud compression standards. GetMobile Mob. Comput. Commun. 2023, 27, 11–17. [Google Scholar] [CrossRef]
  10. Ahmad, I.; Swaminathan, V.; Aved, A.; Khalid, S. An Overview of Rate Control Techniques in HEVC and SHVC Video Encoding. Multimed. Tools Appl. 2022, 81, 34919–34950. [Google Scholar] [CrossRef]
  11. Li, B.; Li, H.; Li, L.; Zhang, J. Lambda Domain Rate Control Algorithm for High Efficiency Video Coding. IEEE Trans. Image Process. 2014, 23, 3841–3854. [Google Scholar] [CrossRef]
  12. Chen, Z.; Pan, X. An Optimized Rate Control for Low-Delay H.265/HEVC. IEEE Trans. Image Process. 2019, 28, 4541–4552. [Google Scholar] [CrossRef]
  13. Li, S.; Xu, M.; Wang, Z.; Sun, X. Optimal Bit Allocation for CTU Level Rate Control in HEVC. IEEE Trans. Circuits Syst. Video Technol. 2016, 27, 2409–2424. [Google Scholar] [CrossRef]
  14. Zhou, M.; Wei, X.; Kwong, S.; Jia, W.; Fang, B. Rate control method based on deep reinforcement learning for dynamic video sequences in HEVC. IEEE Trans. Multimed. 2020, 23, 1106–1121. [Google Scholar] [CrossRef]
  15. Marzuki, I.; Lee, J.; Wiratama, W.; Sim, D. Deep convolutional feature-driven rate control for the HEVC encoders. IEEE Access 2021, 9, 162018–162034. [Google Scholar] [CrossRef]
  16. Chen, S.; Aramvith, S.; Miyanaga, Y. Learning-Based Rate Control for High Efficiency Video Coding. Sensors 2023, 23, 3607. [Google Scholar] [CrossRef] [PubMed]
  17. Zhou, M.; Wei, X.; Kwong, S.; Jia, W.; Fang, B. Just Noticeable Distortion-Based Perceptual Rate Control in HEVC. IEEE Trans. Image Process. 2020, 29, 7603–7614. [Google Scholar] [CrossRef]
  18. Liu, Z.; Pan, X.; Li, Y.; Chen, Z. A Game Theory Based CTU-Level Bit Allocation Scheme for HEVC Region of Interest Coding. IEEE Trans. Image Process. 2020, 30, 794–805. [Google Scholar] [CrossRef]
  19. Liu, Q.; Yuan, H.; Hou, J.; Hamzaoui, R.; Su, H. Model-based joint bit allocation between geometry and color for video-based 3D point cloud compression. IEEE Trans. Multimed. 2020, 23, 3278–3291. [Google Scholar] [CrossRef]
  20. Shen, F.; Gao, W. A rate control algorithm for video-based point cloud compression. In Proceedings of the 2021 International Conference on Visual Communications and Image Processing (VCIP), Munich, Germany, 5–8 December 2021; pp. 1–5. [Google Scholar]
  21. Li, L.; Li, Z.; Liu, S.; Li, H. Rate control for video-based point cloud compression. IEEE Trans. Image Process. 2020, 29, 6237–6250. [Google Scholar] [CrossRef]
  22. Chen, C.; Jiang, G.; Yu, M. Depth-Perception Based Geometry Compression Method of Dynamic Point Clouds. In Proceedings of the 2021 5th International Conference on Video and Image Processing (ICVIP’21), Hayward, CA, USA, 22–25 December 2021; Association for Computing Machinery: New York, NY, USA, 2022; pp. 56–61. [Google Scholar]
  23. Wang, T.; Li, F.; Cosman, P.C. Learning-based rate control for video-based point cloud compression. IEEE Trans. Image Process. 2022, 31, 2175–2189. [Google Scholar] [CrossRef]
  24. Zhang, J.; Zhang, J.; Ma, W.; Ding, D.; Ma, Z. Content-aware rate control for geometry-based point cloud compression. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 9550–9561. [Google Scholar] [CrossRef]
  25. Cai, Z.; Gao, W.; Li, G.; Gao, W. Distortion Propagation Model-Based V-PCC Rate Control for 3D Point Cloud Broadcasting. IEEE Trans. Broadcast. 2025, 71, 180–192. [Google Scholar] [CrossRef]
  26. Liu, W.; Yu, M.; Jiang, Z.; Xu, H.; Zhu, Z.; Zhang, Y.; Jiang, G. Cracks-suppression perceptual geometry coding for dynamic point clouds. Digit. Signal Process. 2024, 149, 104471. [Google Scholar] [CrossRef]
  27. Li, B.; Li, H.; Li, L.; Zhang, J. Rate Control by R-Lambda Model for HEVC. ITU-T SG16 Contribution, JCTVC-K0103, 2012, 1–5. Available online: https://scholar.google.com/scholar?as_q=Rate+control+by+r-lambda+model+for+hevc&as_occt=title&hl=en&as_sdt=0%2C31 (accessed on 13 August 2025).
  28. Point Cloud Compression Category 2 Reference Software, tmc2-18.0. 2025. Available online: https://github.com/MPEGGroup/mpeg-pcc-tmc2/tree/release-v18.0 (accessed on 13 August 2025).
  29. High Efficiency Video Coding Test Model, hm-16.20+scm8.8. Available online: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.20+SCM-8.8/ (accessed on 13 August 2025).
  30. Schwarz, S.; Martin-Cocher, G.; Flynn, D.; Budagavi, M. Common Test Conditions for Point Cloud Compression; Document ISO/IEC JTC1/SC29/WG11 w17766; 3DG Group: Ljubljana, Slovenia, 2018. [Google Scholar]
  31. d'Eon, E.; Harrison, B.; Myers, T.; Chou, P. 8i Voxelized Full Bodies, Version 2—A Voxelized Point Cloud Dataset. ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) Input Document m40059/M74006, 2017. Available online: https://plenodb.jpeg.org/pc/8ilabs (accessed on 13 August 2025).
  32. Xu, Y.; Lu, Y.; Wen, Z. Owlii Dynamic Human Mesh Sequence Dataset. In ISO/IEC JTC1/SC29/WG11 m41658, 120th MPEG Meeting. 2017, Volume 1. Available online: https://mpeg-pcc.org/index.php/pcc-content-database/owlii-dynamic-human-textured-mesh-sequence-dataset/ (accessed on 13 August 2025).
  33. Common Test Conditions for PCC; Document ISO/IEC JTC1/SC29/WG11 N18883; MPEG (Moving Picture Experts Group): Gothenburg, Sweden, 2019.
  34. Fan, Y.; Chen, J.; Sun, H.; Katto, J.; Jing, M. A fast QTMT partition decision strategy for VVC intra prediction. IEEE Access 2020, 8, 107900–107911. [Google Scholar] [CrossRef]
Figure 1. The basic framework of the V-PCC encoding end.
Figure 2. Generated geometric and attribute graph. (a) Geometric diagram of the sequence Loot; (b) attribute diagram of the sequence Loot.
Figure 3. Rate control principle structure diagram.
Figure 4. The CTU in the geometric graph contains the occupied pixels and the corresponding bitrate allocation effect diagram.
Figure 5. Flowchart of the geometric graph encoding optimization algorithm based on occupied pixels.
Figure 6. Generate the occupancy information sign OIS-64 using the occupancy map.
Figure 7. Empty the attribute image using the occupancy information sign. (a) The filled attribute graph; (b) the generated occupancy information sign.
Figure 8. Encoding parameter fit. (a) Loot; (b) RedAndBlack; (c) Soldier; (d) Queen; (e) Longdress.
Figure 9. The RD curve of the sequence Loot. (a) Loot Geometry-D1; (b) Loot Geometry-D2; (c) Loot Attribute-Luma; (d) Loot Attribute-Cb; (e) Loot Attribute-Cr.
Figure 10. Comparison of the subjective reconstruction effects of the sequences Loot, Longdress, and RedAndBlack. (a) Anchor; (b) Our Method; (c) Anchor; (d) Our Method; (e) Anchor; (f) Our Method.
Table 1. Process of Geometric Graph Encoding Optimization Algorithm Based on Occupied Pixels.

Step 1 (Input parameters). Obtain the current frame-level target bits $T_{Currpic}$, the bits used for encoding the header information $Bit_h$, the bits already used for the current frame $Coded_{pic}$, the prediction distortion $D$ of the current coding block, the frame prediction error $D_p$, the number of occupied pixels in the frame $N_p$, and the number of occupied pixels $N$ in the current CTU.
Step 2 (Determine the pixel occupancy of the coding CU). If N = 0, allocate no encoding bits, set QP to 51, and go to Step 7. If N is not 0, perform Step 3 for R-λ optimization based on occupied pixels.
Step 3 (R-λ optimization based on occupied pixels). Calculate the occupied-pixel distortion according to Formula (6); obtain the occupied-block bit-allocation weight from Formula (5); then compute the bitrate-weight improvement allocation coefficient from Formula (8).
Step 4 (Obtain the occupied-pixel distortion impact factor $\delta$, combined with the experimental fitting parameters). Collect statistics according to Formulas (13) and (14) and fit the results to Formula (18); the parameters $\alpha_1$, $\alpha_2$ and $\beta_1$, $\beta_2$ obtained from the fit determine $\delta$.
Step 5 (Calculate the optimized bit allocation). Update the bitrate-weight improvement allocation coefficient in Formula (1) and calculate the optimized encoding bit allocation.
Step 6 (Calculate the quantization parameter QP). Obtain the encoding parameter λ from the actual bits, then compute the quantization parameter QP from the formula.
Step 7 (Perform coding). Carry out the subsequent encoding with the optimized QP value.
Table 2. Correspondence between Bitrate and QP.

Bitrate   Geometric QP   Attribute QP
r5        16             22
r4        20             27
r3        24             32
r2        28             37
r1        32             42
Table 3. Average coding performance obtained by OIS-N for different point clouds under different N values (unit: %).

Metric                          N = 64   N = 32   N = 16   N = 8
Geom.BD-GeomRate      D1         0.0      0.0      0.0      0.0
                      D2         0.0      0.0      0.0      0.0
Geom.BD-TotalRate     D1        −0.7      0.0      1.6      5.7
                      D2        −0.8     −0.1      1.3      4.9
End-to-End BD-AttrRate  Luma    −1.6      0.7      5.1     14.5
                        Cb      −2.1      0.9      8.4     29.4
                        Cr      −3.0      0.3      5.6     22.7
End-to-End BD-TotalRate Luma    −0.9     −0.1      1.3      5.3
                        Cb      −1.3      0.0      3.7     14.8
                        Cr      −1.8     −0.4      1.6     10.6
∆T                              −5.07    −5.73    −7.19    −6.45
Table 4. Coding performance of the proposed overall method tested on different point clouds (unit: %).

Metric                          Loot     RedAndBlack  Soldier  Queen    Longdress  Average
Geom.BD-GeomRate      D1       −22.7     −13.6       −18.9    −11.3    −12.1      −15.67
                      D2       −23.5     −14.7       −19.3    −12.4    −13.5      −16.68
Geom.BD-TotalRate     D1       −39.8     −24.1       −43.6    −21.6    −20.3      −29.88
                      D2       −41.2     −26.4       −46.2    −22.3    −21.4      −31.50
End-to-End BD-AttrRate  Luma    −3.6      −4.2        −8.1     −2.8     −3.2       −4.38
                        Cb      −0.3      −1.1        −0.9     −0.5     −0.6       −0.68
                        Cr      −1.7      −3.4        −1.5     −1.0     −1.1       −1.74
End-to-End BD-TotalRate Luma    −6.0      −5.3        −7.7     −3.6     −4.9       −5.50
                        Cb      −3.4      −2.6        −2.4     −2.2     −2.7       −2.66
                        Cr      −4.1      −3.8        −2.6     −3.0     −3.2       −3.34
∆T                              −4.76     −5.12       −6.92    −5.74    −6.41      −5.79
Table 5. Performance of the BU-level bit allocation method in Reference [21] on geometric graph processing (unit: %).

| Metric | | Loot | RedAndBlack | Soldier | Queen | Longdress | Average |
|---|---|---|---|---|---|---|---|
| Geom.BD-GeomRate | D1 | 0.13 | −0.37 | −0.08 | 2.05 | −1.44 | 0.06 |
| | D2 | 0.75 | 0.10 | 0.16 | 2.74 | −0.52 | 0.65 |
| Geom.BD-TotalRate | D1 | 0.22 | −0.68 | −0.15 | 3.62 | −2.03 | 0.20 |
| | D2 | 0.94 | 0.21 | 0.39 | 4.37 | −0.93 | 1.00 |
| End-to-End BD-AttrRate | Luma | −0.21 | −0.14 | 0.11 | 0.31 | −0.40 | −0.07 |
| | Cb | −0.16 | −0.20 | 0.45 | −0.00 | −0.63 | −0.11 |
| | Cr | −1.22 | −0.07 | −0.30 | −0.84 | −0.37 | −0.56 |
| End-to-End BD-TotalRate | Luma | −0.64 | −0.37 | 0.35 | 0.66 | −0.74 | −0.15 |
| | Cb | −0.60 | −0.52 | 1.12 | −0.13 | −1.05 | −0.24 |
| | Cr | −2.65 | 0.13 | −0.81 | −1.37 | −0.62 | −1.06 |
Table 6. Performance of the BU-level bit allocation method in Reference [21] on attribute graph processing (unit: %).

| Metric | | Loot | RedAndBlack | Soldier | Queen | Longdress | Average |
|---|---|---|---|---|---|---|---|
| Geom.BD-GeomRate | D1 | −0.03 | 0.12 | −0.01 | 0.00 | 0.01 | 0.02 |
| | D2 | −0.04 | 0.20 | 0.02 | −0.01 | 0.00 | 0.03 |
| Geom.BD-TotalRate | D1 | −0.11 | 0.26 | 0.00 | 0.00 | 0.00 | 0.03 |
| | D2 | −0.14 | 0.32 | 0.00 | 0.01 | 0.00 | 0.04 |
| End-to-End BD-AttrRate | Luma | −1.22 | −1.18 | −0.61 | −0.34 | −1.22 | −0.91 |
| | Cb | −1.63 | −2.23 | −0.96 | −0.75 | −1.40 | −1.39 |
| | Cr | −1.71 | −2.52 | −0.88 | −0.66 | −1.57 | −1.47 |
| End-to-End BD-TotalRate | Luma | −3.15 | −3.23 | −1.10 | −0.72 | −2.61 | −2.16 |
| | Cb | −2.77 | −4.32 | −1.57 | −1.38 | −2.54 | −2.52 |
| | Cr | −2.80 | −4.46 | −1.43 | −1.04 | −2.76 | −2.50 |
Table 7. Overall effect performance of Reference [20] (unit: %).

| Metric | | Loot | RedAndBlack | Soldier | Queen | Longdress | Average |
|---|---|---|---|---|---|---|---|
| Geom.BD-TotalRate | D1 | −5.13 | −7.08 | 20.94 | −11.16 | −43.92 | −9.27 |
| | D2 | −3.20 | −3.16 | 23.48 | −9.24 | −40.18 | −6.46 |
| End-to-End BD-TotalRate | Luma | 1.58 | 1.88 | −0.77 | 2.15 | 4.36 | 1.84 |
| | Cb | 1.67 | 2.07 | −3.83 | 3.07 | 8.73 | 2.34 |
| | Cr | 2.82 | 3.01 | −3.91 | 2.79 | 8.54 | 2.65 |
| Overall.BD-TotalRate | | −4.77 | −6.24 | 18.62 | −10.51 | −40.73 | −8.73 |
Table 8. Overall effect performance of Reference [25] (unit: %).

| Metric | | Loot | RedAndBlack | Soldier | Queen | Longdress | Average |
|---|---|---|---|---|---|---|---|
| Geom.BD-TotalRate | D1 | −0.34 | 0.43 | −0.95 | −0.49 | −0.79 | −0.43 |
| | D2 | −1.64 | 0.45 | −3.13 | −0.14 | −1.65 | −1.22 |
| End-to-End BD-TotalRate | Luma | −7.80 | −5.10 | −5.87 | −0.21 | −3.85 | −4.57 |
| | Cb | −10.96 | −5.11 | −6.10 | 1.75 | −3.32 | −4.75 |
| | Cr | −10.96 | −5.53 | −5.48 | 0.08 | −3.89 | −5.16 |
Table 9. Overall effect performance after correction by introducing occupancy information in Reference [25] (unit: %).

| Metric | | Loot | RedAndBlack | Soldier | Queen | Longdress | Average |
|---|---|---|---|---|---|---|---|
| Geom.BD-TotalRate | D1 | −1.55 | −0.62 | −0.42 | −2.04 | −1.26 | −1.18 |
| | D2 | −3.93 | −1.87 | −4.21 | −1.93 | −3.35 | −3.06 |
| End-to-End BD-TotalRate | Luma | −8.76 | −6.19 | −7.07 | 0.32 | −5.35 | −5.41 |
| | Cb | −12.94 | −7.53 | −7.61 | 0.44 | −6.73 | −6.87 |
| | Cr | −12.75 | −7.27 | −7.36 | −0.36 | −7.07 | −6.96 |
Table 10. Encoding bit consumption of the proposed overall method tested on different point clouds (unit: kbps).

| Rate Point | Method | Loot | RedAndBlack | Soldier | Queen | Longdress | Average | Ebr (%) |
|---|---|---|---|---|---|---|---|---|
| r1 | Anchor | 884,379 | 1,752,416 | 2,153,842 | 1,695,473 | 3,048,637 | 1,906,949.4 | 94.26 |
| | Our Method | 857,164 | 1,631,473 | 2,071,936 | 1,532,078 | 2,894,605 | 1,797,451.2 | |
| r2 | Anchor | 1,973,657 | 2,917,834 | 3,978,514 | 2,784,506 | 6,013,756 | 3,533,653.4 | 95.18 |
| | Our Method | 1,742,832 | 2,763,185 | 3,841,029 | 2,537,891 | 5,931,581 | 3,363,303.6 | |
| r3 | Anchor | 3,752,618 | 4,896,771 | 8,707,536 | 5,731,084 | 13,756,492 | 7,368,900.2 | 92.27 |
| | Our Method | 3,543,265 | 4,653,217 | 8,342,793 | 5,527,603 | 11,930,784 | 6,799,532.4 | |
| r4 | Anchor | 7,135,672 | 9,437,865 | 16,075,369 | 8,760,294 | 23,185,946 | 12,919,029.2 | 95.77 |
| | Our Method | 6,951,763 | 8,746,592 | 15,427,937 | 8,631,507 | 22,106,357 | 12,372,831.2 | |
| r5 | Anchor | 14,031,792 | 16,356,219 | 28,943,556 | 15,796,803 | 42,180,976 | 23,461,869.2 | 94.13 |
| | Our Method | 12,157,329 | 15,478,366 | 27,459,630 | 14,337,964 | 40,985,783 | 22,083,814.4 | |
| Average | | | | | | | | 94.32 |
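The Ebr column is consistent with the ratio of the proposed method's average bit consumption to the anchor's; a quick arithmetic check (assuming Ebr = ours / anchor × 100, which the table values support but the excerpt does not state explicitly):

```python
def encoding_bit_ratio(ours_kbps: float, anchor_kbps: float) -> float:
    """Percentage of the anchor's bit consumption used by the proposed method."""
    return 100.0 * ours_kbps / anchor_kbps

# r1 average bits from Table 10: 1,797,451.2 (ours) vs 1,906,949.4 (anchor)
ebr_r1 = encoding_bit_ratio(1_797_451.2, 1_906_949.4)  # ≈ 94.26
```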
Table 11. Coding quality of different point clouds tested by the overall method proposed in this paper (unit: dB).

| Metric | Loot | RedAndBlack | Soldier | Queen | Longdress | Average |
|---|---|---|---|---|---|---|
| PSNR-D1 | 0.74 | 0.51 | 1.05 | 0.60 | 0.48 | 0.676 |
| PSNR-D2 | 0.97 | 0.63 | 1.27 | 0.87 | 0.57 | 0.862 |
| PSNR-Luma | 2.02 | 1.75 | 2.16 | 1.83 | 2.33 | 2.018 |
| PSNR-Cb | 1.14 | 0.76 | 0.81 | 0.92 | 1.07 | 0.940 |
| PSNR-Cr | 1.30 | 1.62 | 0.89 | 0.75 | 1.24 | 1.160 |
Table 12. Encoding bit error (unit: %).

| Rate Point | Loot | RedAndBlack | Soldier | Queen | Longdress | Average |
|---|---|---|---|---|---|---|
| r1 | 1.31 | 3.67 | 2.85 | 2.04 | 1.73 | 2.32 |
| r2 | 12.64 | 10.93 | 12.77 | 11.72 | 13.26 | 12.26 |
| r3 | 16.15 | 15.48 | 18.61 | 16.93 | 17.49 | 16.93 |
| r4 | 18.37 | 18.94 | 20.13 | 19.45 | 21.10 | 19.60 |
| r5 | 21.72 | 24.31 | 25.07 | 22.76 | 26.75 | 24.12 |
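A minimal sketch of how such a bit error would typically be computed, assuming (since the excerpt does not define it) that the table reports the relative deviation of the actually encoded bits from the target bit budget:

```python
def bit_error_pct(actual_bits: float, target_bits: float) -> float:
    """Relative deviation of actual encoded bits from the target budget, in percent.

    Definition assumed: |actual - target| / target * 100.
    """
    return 100.0 * abs(actual_bits - target_bits) / target_bits
```

Under this reading, the growth of the error from r1 to r5 reflects the increasing difficulty of hitting the target budget at higher bitrates.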
Wang, F.; Jia, J.; Zhang, Q. Optimization of Non-Occupied Pixels in Point Cloud Video Based on V-PCC and Joint Control of Bitrate for Geometric–Attribute Graph Coding. Electronics 2025, 14, 3287. https://doi.org/10.3390/electronics14163287
