Point Cloud Quality Assessment Using a One-Dimensional Model Based on the Convolutional Neural Network

Recent advancements in 3D modeling have revolutionized various fields, including virtual reality, computer-aided diagnosis, and architectural design, emphasizing the importance of accurate quality assessment for 3D point clouds. As these models undergo operations such as simplification and compression, the distortions introduced can significantly degrade their visual quality. There is a growing need for reliable and efficient objective quality evaluation methods to address this challenge. In this context, this paper introduces a novel methodology to assess the quality of 3D point clouds using a deep learning-based no-reference (NR) method. First, it extracts geometric and perceptual attributes from distorted point clouds and represents them as a set of 1D vectors. Then, transfer learning is applied to obtain high-level features using a 1D convolutional neural network (1D CNN) adapted from 2D CNN models through weight conversion from ImageNet. Finally, quality scores are predicted through regression utilizing fully connected layers. The effectiveness of the proposed approach is evaluated across diverse datasets, including the Colored Point Cloud Quality Assessment Database (SJTU_PCQA), the Waterloo Point Cloud Assessment Database (WPC), and the Colored Point Cloud Quality Assessment Database featured at ICIP2020. The outcomes reveal superior performance compared to several competing methodologies, as evidenced by enhanced correlation with mean opinion scores.


Introduction
Recently, the utilization of 3D models has seen a significant expansion in various fields, including virtual and mixed reality, computer-aided diagnosis, architecture, and the preservation of cultural heritage. However, when these 3D models undergo operations such as simplification and compression, they can potentially introduce different distortion types that negatively impact the visual quality of 3D point clouds. To tackle this problem, there is a growing demand for robust methods to evaluate perceived quality. Traditionally, assessing distortion levels in 3D models has depended on human observers, which is time-consuming and resource-intensive. To streamline this, objective methods have emerged as a practical solution [1]. These methods implement automated metrics that aim to replicate the judgments of an ideal human observer. Such metrics can generally be categorized into three groups: full-reference (FR) [2][3][4][5][6], reduced-reference (RR) [7,8], and no-reference (NR) [9][10][11][12][13][14][15]. Among these, blind methods, which do not rely on reference models, have gained particular significance, especially in real-world applications [16][17][18][19].
A three-dimensional point cloud is a collection of points, each characterized by geometric coordinates and potentially additional attributes such as color, reflectance, and surface normals.
Unlike 2D media, such as images and videos, which are organized in a regular grid, the points in 3D point clouds are scattered throughout space. Therefore, there is a need to explore methods for extracting effective features from these scattered points to assess quality.
To date, only a limited set of metrics for assessing the quality of point clouds without reference, known as NR-PCQA (No-Reference Point Cloud Quality Assessment) metrics, have been developed. Chetouani et al. [20] adopted an approach involving extracting hand-crafted features at the patch level and using traditional CNN models for quality regression. PQA-Net [9] employs multi-view projection as a method for feature extraction. Zhang et al. [13] took a distinct approach by using various statistical distributions to estimate quality-related parameters based on the distributions of geometry and color attributes. Fan et al. [21] inferred the visual quality of point clouds through the analysis of captured video sequences. Liu et al. [10] utilized an end-to-end sparse CNN to predict quality. Yang et al. [22] extended these efforts by transferring quality information from natural images to point cloud rendering images via domain adaptation techniques.
Recently, Convolutional Neural Networks (CNNs) have emerged as the predominant choice for various computer vision and machine learning tasks. CNNs are feedforward Artificial Neural Networks (ANNs) characterized by their arrangement of convolutional and subsampling layers. Deep 2D CNNs, with numerous hidden layers and millions of parameters, can learn intricate objects and patterns, particularly when trained on extensive visual datasets with ground-truth labels. This capability, when the networks are properly trained, positions them as the primary tool for engineering applications involving 2D signals, such as images and video frames.
However, this strategy may not always be feasible for applications dealing with 1D signals, particularly when the training dataset is constrained to a specific application. To address this challenge, 1D CNNs have recently been introduced and have rapidly demonstrated cutting-edge performance across multiple domains, including personalized biomedical data classification and early diagnosis, structural health monitoring, anomaly detection, identification in power electronics, and fault detection in electrical motors. Notable applications of 1D CNNs also include automatic speech recognition, vibration-based structural damage detection in civil infrastructure, and real-time electrocardiogram monitoring [23].
Despite the growing importance of 1D CNNs in various applications, the literature currently lacks point cloud quality assessment methods based on these networks. In this context, we introduce a novel method in this paper for evaluating the visual quality of 3D point clouds. Our method revolves around a transfer learning model grounded in a one-dimensional CNN architecture. The main contributions of this paper are summarized as follows:
• The introduction of a novel methodology that adapts a one-dimensional Convolutional Neural Network (1D CNN) architecture for evaluating the visual quality of 3D point clouds.
• The design of a 1D CNN network tailored for point clouds by transforming a 2D CNN model into a 1D variant.
• The incorporation of transfer learning using a pre-trained ImageNet model to initialize the 1D CNN network for point cloud quality evaluation.
The rest of this paper is structured as follows: Section 2 provides an overview of the related work, Section 3 introduces the proposed method, Section 4 presents the experimental setup and the results of a comparative evaluation against alternative solutions, and finally, Section 5 concludes the paper.

Related Work
In the literature, most current PCQA approaches can be broadly grouped into point-based, feature-based, and projection-based methods. A point-based quality metric directly compares the geometry or characteristics between the reference and distorted point clouds, assessing them point by point after establishing the necessary point correspondences. The Point-to-Point (Po2Po) [24] and Point-to-Plane (Po2Pl) [25] metrics stand out as the most popular point-based geometry quality evaluation methods. In the Po2Po metric, each point in a degraded or reference point cloud is matched with its nearest corresponding point in the opposite cloud, and subsequently, the Hausdorff distance or Mean Squared Error (MSE) distance is computed for all point pairs. One significant limitation of these metrics is their failure to consider that point cloud points represent the surfaces of objects in the visual scene. Tian et al. [25] introduced Point-to-Plane (Po2Pl) metrics to address this issue. These metrics represent the underlying surface at each point as a plane perpendicular to the normal vector at that specific point. This approach yields smaller errors for points closer to the point cloud's surface, which is modeled as a plane. Currently, the MPEG-endorsed point cloud geometry quality metrics include Po2Po and Po2Pl and their corresponding Peak Signal-to-Noise Ratio (PSNR) [5]. In addition, Alexiou et al. [26] proposed a Plane-to-Plane (Pl2Pl) metric, which evaluates the similarity between the underlying surfaces associated with corresponding points in the reference and degraded point clouds. In this scenario, tangent planes are estimated for both the reference and degraded points, and the angular similarity between them is examined.
In their work [27], Javaheri et al. introduced a geometry quality metric that relies on the Generalized Hausdorff distance. This metric measures the maximum distance for a specific percentage of data rather than the entire dataset, effectively filtering out some outlier points. The Generalized Hausdorff distance is calculated between two point clouds, and it can be applied to both the Po2Po and Po2Pl metrics. Furthermore, in [28], Javaheri et al. proposed a Point-to-Distribution (Po2D) metric. This metric is based on the Mahalanobis distance between a point in one point cloud and its K nearest neighbors in the other point cloud. They compute the mean and covariance matrix of the corresponding distribution and employ it to measure the Mahalanobis distance between points in one point cloud and their respective set of nearest neighbors in the other point cloud. These distances are then averaged to determine the final quality score. In [29], the same authors presented a joint color and geometry point-to-distribution quality metric. This metric leverages the scale-invariance property of the Mahalanobis distance. In [30], Javaheri et al. proposed resolution-adaptive metrics. These metrics enhance the existing D1-PSNR and D2-PSNR metrics by incorporating normalization factors based on the point cloud's rendering and intrinsic resolutions.
A feature-based point cloud quality approach computes a quality score by analyzing the differences in local and global features extracted from reference and degraded point clouds. In [31], Meynet et al. introduced the Point Cloud Multi-Scale Distortion metric (PC-MSDM). This metric serves as a measure of the geometric quality of point clouds, drawing its foundations from structural similarity principles and relying on the statistical examination of local curvature.
Viola et al. introduced a quality metric for point clouds using the histogram and correlogram of the luminance component [32]. Then, they integrated the newly proposed color quality metric with the Po2Pl MSE geometry metric (D2) using a linear model. The weighting parameter for this fusion is determined through a grid search approach.
Diniz et al. introduced the Geotex metric, a novel approach based on Local Binary Pattern (LBP) descriptors developed for point clouds, particularly focusing on the luminance component [33]. To apply this metric to point clouds, the LBP descriptor is computed within a local neighborhood corresponding to the K-nearest neighbors of each point in the other point cloud. The histograms of the extracted feature maps are generated for both the reference and degraded point clouds, and are used to calculate the final quality score employing a distance metric, such as the f-divergence [34]. In [35], Diniz et al. presented an extension of the Geotex metric. This extension incorporates various distances, with a notable focus on the Po2Pl Mean Squared Error (MSE) for assessing geometry and the distance between Local Binary Pattern (LBP) statistics [33] for evaluating color. Additionally, Diniz et al. introduced a novel quality metric in their study [36], which calculates Local Luminance Patterns (LLP) based on the K-nearest neighbors of each point in the alternative point cloud.
Meynet et al. introduced the Point Cloud Quality Metric (PCQM) [3]. It integrates geometric characteristics from a previous study [31] with five color-related features, including lightness, chroma, and hue. The PCQM is calculated as the weighted mean of the differences in geometric and color attributes between the reference and degraded point clouds. In another study, Viola et al. [7] presented the first reduced-reference quality metric, which concurrently evaluates geometry and color aspects. The authors extracted seven statistical features, including measures such as mean and standard deviation, from reference and degraded point clouds across various domains, including geometry, texture, and normal vectors. This process yielded a total of 21 features. The reduced-reference quality score is calculated as the weighted average of the differences in all these features between the reference and degraded point clouds.
Inspired by the SSIM quality metric designed for 2D images, Alexiou et al. introduced a quality metric in [37] that utilizes local statistical dispersion features. These statistical characteristics are derived within a local neighborhood surrounding each point within the reference and degraded point clouds, covering four distinct attributes: geometry, color, normal vectors, and curvature information. The final quality metric is derived by aggregating the differences in feature values between corresponding points in the reference and degraded point clouds. In [6], Yang et al. proposed a quality metric based on graph similarity. They identify key points by resampling the reference point cloud and construct local graphs centered at these key points for both the reference and degraded point clouds. Several local similarity features are then computed based on the graph topology, with the quality metric value corresponding to the degree of similarity between these features. A related approach is presented in [38].
A projection-based point cloud quality metric involves mapping the 3D reference and degraded point clouds onto specific 2D planes. The quality score is then determined by comparing these projected images using various 2D image quality metrics. The first projection-based point cloud quality metric was introduced by Queiroz et al. in [39]. This metric begins by projecting the reference and degraded point clouds onto the six faces of a bounding cube that encloses the entire point cloud. It combines the corresponding projected images and evaluates the 2D Peak Signal-to-Noise Ratio (PSNR) between the concatenated projected images from the degraded and reference point clouds. In [40], Torlig et al.
introduced rendering software for visualizing point clouds on 2D screens. This software performs the orthographic projection of a point cloud onto the six faces of its bounding box. Then, a 2D quality metric is applied to the projected images obtained by rendering, both for the reference and degraded point clouds. The final quality score is determined by averaging the results from the six projected image pairs. In [41], Alexiou et al. investigated how the quantity of projected 2D images impacts the correlation between subjective and objective assessments in projection-based quality metrics. The study reveals that even a single view can yield a reasonable correlation performance. Furthermore, they proposed a projection-based point cloud quality metric that assigns weights to the projected images based on user interactions during the subjective testing phase. In [42], the quality metric proposed in [40] is evaluated using different parameters, such as the number of views and pooling functions, to establish benchmarks and assess its performance under various conditions.
In [9], Liu et al. introduced a no-reference quality metric based on deep learning named the Point Cloud Quality Assessment Network (PQA-Net). This method begins by projecting the point cloud into six distinct images which undergo feature extraction through a convolutional neural network. These features are then processed by a distortion-type identification network and a quality vector prediction network to derive the final quality score. In [43], Bourbia et al. utilized a multi-view 2D projection, segmented into patches, in combination with a deep convolutional neural network for evaluating the quality of point clouds.
In [44], Wu et al. introduced two objective quality metrics based on projection: a weighted view projection-based metric and a patch-projection-based metric. In both cases, 2D quality metrics are employed to assess the quality of texture and geometry maps. In particular, the patch-projection-based metric demonstrates a significant performance advantage over the weighted view projection-based metric. In [45], Liu et al. proposed a quality metric for point clouds that leverages attention mechanisms and the principle of information content-weighted pooling. Their proposed metric involves translating, rotating, scaling, and orthogonally projecting point clouds into 12 different views, and it evaluates the quality of these projected images using the IW-SSIM [46] 2D metric.
Point-based methods often prioritize geometry at the point level, neglecting color information. This limitation can be a drawback in situations where color details are significant. Focusing only on geometry may lead to an incomplete evaluation, especially when color plays a crucial role in the overall quality of the content.
The quality of feature-based methods heavily relies on the effectiveness of the feature extraction techniques. Inaccurate or inadequate features can result in biased assessments.
Projection-based methods encounter the challenge of unavoidable information loss during the projection process. This loss can affect the accuracy of quality assessment, especially when critical details are compromised. Furthermore, the quality of projected images may be influenced by the angles and viewpoints employed in the projection. This sensitivity can lead to variations in the assessment results based on different projection configurations.

Proposed Method
The proposed approach employs transfer learning using a 1D CNN to evaluate the quality of a point cloud. Transfer learning allows leveraging the knowledge of weights and layers from a pre-existing model to speed up the learning process of a new, untrained model. First, we transform the 2D CNN model into a 1D CNN variant. Then, we fit the ImageNet weights of the 2D CNN model to the resulting 1D CNN model. Next, we use this model to produce robust features for quality regression. Finally, fully connected (F_C) layers are used as the regression model. The overall structure of the proposed method is depicted in Figure 1 and the architecture of the 1D CNN model is illustrated in Figure 2.

Geometry-Based Features
We have selected a set of relevant geometric features to assess the quality of point clouds. These features utilize eigenvalues and eigenvectors, which are calculated for each 3D point based on its neighbors within a specified radius. Given a point P_c and its neighborhood P_Ngm, the associated covariance matrix C_m is represented by:

C_m = (1/K) · Σ_{i=1}^{K} (P_ci − P̄_c)(P_ci − P̄_c)^T,

where P_ci and P̄_c are vectors of dimension 3 × 1 (P̄_c being the centroid of the neighborhood), C_m is a matrix of dimension 3 × 3, and K denotes the size of the neighborhood P_Ngm. The eigendecomposition of the covariance matrix C_m can be represented as:

C_m · V_j = λ_j · V_j, j ∈ {1, 2, 3},

where the eigenvectors are represented by (V_1, V_2, V_3) and the eigenvalues are denoted by (λ_1, λ_2, λ_3), such that λ_1 ≥ λ_2 ≥ λ_3 ≥ 0. For each point cloud PC = {points}, we extract the set of geometric features defined by:

F_geom = {Feat_geom(pc_i), pc_i ∈ {points}},

where Feat_geom(pc_i) signifies the geometry projection function. The geometric feature formulations [47] are as follows:
• Linearity: denotes the degree of resemblance to a straight line: L = (λ_1 − λ_2) / λ_1.
• Planarity: assesses the resemblance to a planar surface: P = (λ_2 − λ_3) / λ_1.
• Anisotropy: captures discrepancies in geometrical characteristics across different directions: A = (λ_1 − λ_3) / λ_1.
• Sphericity: quantifies the degree of resemblance between the shape of an object and that of a perfect sphere: S = λ_3 / λ_1.
• Omnivariance: a geometric descriptor that quantifies the overall variability of the point cloud data in three-dimensional space. It captures the dispersion of points in all directions and provides a measure of their spatial distribution: O = (λ_1 · λ_2 · λ_3)^(1/3).
• Eigenentropy: a measure that quantifies the level of disorder or randomness within a given neighborhood, assessing how dispersed or organized the data points are: E = − Σ_{j=1}^{3} e_j · ln(e_j), with normalized eigenvalues e_j = λ_j / (λ_1 + λ_2 + λ_3).
• Sphere-fit: the residual error of a sphere fitted to the local neighborhood. Sphere fitting is often employed in applications such as computer graphics, computer-aided design (CAD), and computer vision, where finding an accurate and robust approximation of a sphere to a set of scattered points is essential. It plays a significant role in many fields involving 3D point cloud data analysis and manipulation.
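As an illustration, the eigenvalue-based descriptors above can be computed with a short numpy routine. This is a sketch of the standard formulations cited in [47], not the authors' exact implementation; the sphere-fit residual is omitted for brevity, and the small clamp on the eigenvalues is an added numerical-safety assumption.

```python
import numpy as np

def geometric_features(neighborhood):
    """Eigenvalue-based geometric descriptors for one point's neighborhood.

    neighborhood: (K, 3) array of neighboring 3D points.
    Returns a dict with linearity, planarity, anisotropy, sphericity,
    omnivariance, and eigenentropy (sphere-fit omitted).
    """
    centered = neighborhood - neighborhood.mean(axis=0)       # subtract centroid
    cov = centered.T @ centered / len(neighborhood)           # 3x3 covariance C_m
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]              # λ1 >= λ2 >= λ3
    lam = np.maximum(lam, 1e-12)                              # numerical safety
    e = lam / lam.sum()                                       # normalized eigenvalues
    return {
        "linearity":    (lam[0] - lam[1]) / lam[0],
        "planarity":    (lam[1] - lam[2]) / lam[0],
        "anisotropy":   (lam[0] - lam[2]) / lam[0],
        "sphericity":   lam[2] / lam[0],
        "omnivariance": float(np.prod(lam) ** (1.0 / 3.0)),
        "eigenentropy": float(-np.sum(e * np.log(e))),
    }
```

For points sampled along a straight line the routine yields a linearity close to 1, while for points on a plane the planarity dominates, matching the intent of the formulations above.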

Perceptual-Based Features
For each point cloud, we extract the set of perceptual features F_perc defined by:

F_perc = {Feat_perc(pc_i), pc_i ∈ {points}},

where Feat_perc(pc_i) indicates the perceptual projection function. We have chosen color, curvature (Curv), and saliency (Sal) as perceptual features.
• Color: color plays a crucial role in evaluating visual quality. In a colored point cloud, each point's color is directly derived from its color information. Typically, the color information in 3D models is stored using RGB channels. However, the RGB color space has shown limited correlation with human perception. As a solution, we use the LAB color transformation for color feature projection; this method has been widely embraced for numerous quality assessment applications [48].
• Saliency: saliency is a crucial aspect of the human visual system, involving the allocation of human attention or eye movements in a given scene. Identifying these highly perceptible areas is important in computer vision and computer graphics.
• Curvature: curvature refers to the amount by which a curve, surface, or object deviates from being perfectly straight or flat at a specific point. It is commonly estimated from the local eigenvalues as the surface variation: Curv = λ_3 / (λ_1 + λ_2 + λ_3).
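The RGB-to-LAB projection mentioned for the color feature can be sketched as follows. The paper does not specify its exact conversion routine, so this uses the standard sRGB (D65) to CIELAB formulas as an assumption:

```python
import numpy as np

# Standard sRGB -> XYZ matrix and D65 reference white
_M = np.array([[0.4124564, 0.3575761, 0.1804375],
               [0.2126729, 0.7151522, 0.0721750],
               [0.0193339, 0.1191920, 0.9503041]])
_WHITE = np.array([0.95047, 1.00000, 1.08883])

def rgb_to_lab(rgb):
    """Convert one sRGB color (components in [0, 1]) to CIELAB (L, a, b)."""
    rgb = np.asarray(rgb, dtype=float)
    # 1. undo the sRGB gamma to obtain linear RGB
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # 2. linear RGB -> XYZ, normalized by the white point
    xyz = _M @ lin / _WHITE
    # 3. CIELAB nonlinearity
    eps = (6.0 / 29.0) ** 3
    f = np.where(xyz > eps, np.cbrt(xyz), xyz / (3 * (6.0 / 29.0) ** 2) + 4.0 / 29.0)
    L = 116.0 * f[1] - 16.0
    a = 500.0 * (f[0] - f[1])
    b = 200.0 * (f[1] - f[2])
    return L, a, b
```

Applying this per point yields the 'L', 'A', and 'B' channels used later in the ablation study.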

Feature Encoder
We convert the 2D CNN architecture into a 1D CNN version and adapt the weights derived from ImageNet for the 1D CNN model. Then, we produce the features through the following process:

F_i = 1DCNN(F_pc), F_pc ∈ {F_geom, F_perc},

where F_i represents the produced features for each F_pc, while 1DCNN(·) refers to the module responsible for robust feature production using the 1D CNN architecture.
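To give a concrete sense of how a 1D attribute vector is turned into higher-level features, the snippet below mimics a single convolutional stage (convolution, ReLU, global average pooling). The filter bank here is hypothetical; in the actual method this role is played by the converted deep 1D CNN.

```python
import numpy as np

def conv1d_features(signal, kernels):
    """Toy 1D convolutional feature extractor.

    signal:  (N,) 1D attribute vector (e.g., per-point linearity values).
    kernels: (F, k) bank of F one-dimensional filters.
    Returns an (F,) feature vector: valid cross-correlation, ReLU,
    then global average pooling, as in one stage of a 1D CNN.
    """
    feats = []
    for w in kernels:
        resp = np.correlate(signal, w, mode="valid")  # sliding filter response
        resp = np.maximum(resp, 0.0)                  # ReLU activation
        feats.append(resp.mean())                     # global average pooling
    return np.array(feats)
```

Stacking several such stages, with learned filters, is what allows the network to map the 1D feature vectors to quality-relevant representations.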

Convert the 2D CNN Model to a 1D CNN Model
Deep 2D CNNs have played a crucial role in computer vision tasks and have achieved remarkable success in various applications, including image recognition, object detection, facial recognition, and more.The pre-trained models (e.g., VGG, ResNet, MobileNet) have been used to tackle specific tasks, depending on the dataset and problem.
Leveraging the 2D CNN pre-trained model enables the acceleration of deep model training for other tasks through transfer learning [49,50].This advantage lies in its ability to achieve satisfactory results without requiring large amounts of labeled data or extensive computational resources.
Converting a 2D CNN model to a 1D CNN model typically involves modifying the layers of the model to handle 1D data instead of 2D data. The original 2D CNN architecture processes 2D images, where each image has height, width, and color channels (e.g., RGB). In contrast, a 1D model processes sequential data, such as text or time series, with only one spatial dimension. The conversion follows these steps:
1. Remove the last few layers: in a 2D CNN model, the final layers are usually fully connected layers responsible for image classification. We remove these layers since they are designed for the original 2D task.
2. Flatten the output: since 1D data have only one dimension, we flatten the output of the last convolutional layer to convert it into a 1D format.
3. Add new fully connected layers: after flattening, we add new fully connected layers designed to handle 1D data, with a number of neurons and activations suitable for the task at hand.
4. Adjust the input data: the input fed into the model must also be converted to 1D format to match the new architecture.
5. Adjust the output layer: the output layer of the 2D model may need to be modified to match the desired output of the 1D model. For example, for regression tasks it may need to output a single value, while for classification tasks it may need to produce class probabilities.
6. Transfer the weights: once the architecture is adjusted, the weights of the 2D model can be transferred to the corresponding layers of the 1D model. However, since some layers may have been removed or modified, care must be taken to ensure the weights are transferred appropriately.
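The weight-transfer step above is not detailed in the paper; a common heuristic, shown here as an assumption rather than the authors' exact rule, is to collapse each 2D convolution kernel into a 1D kernel by averaging it over one spatial axis:

```python
import numpy as np

def convert_conv_weights_2d_to_1d(w2d, axis=3):
    """Collapse 2D conv kernels into 1D kernels by averaging one spatial axis.

    w2d: (out_ch, in_ch, kH, kW) weight tensor of a 2D convolutional layer
         (e.g., ImageNet pre-trained weights).
    Returns an (out_ch, in_ch, k) weight tensor for the matching 1D layer.
    Summing instead of averaging is a frequently used alternative.
    """
    return np.asarray(w2d, dtype=float).mean(axis=axis)
```

Applying this layer by layer yields a 1D CNN whose filters retain the structure learned on ImageNet, which is the premise of the transfer-learning scheme described above.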
It is important to note that converting a model from 2D to 1D is not always straightforward, especially if the model was initially designed for 2D images. The success of such a conversion heavily depends on the nature of the problem to be solved with the 1D model.

Quality Prediction
The feature extraction step permits us to obtain a feature vector that captures the distinctive attributes of the 3D model. For our experiment, we use a four-layer fully connected (F_C) network; the associated hyperparameters are presented in Table 1. We integrate the aforementioned features to make predictions about perceptual quality. The estimated quality score Q_s is calculated as:

Q_s = F_C(concat(F_geom, F_perc)),

where concat(·) denotes the concatenation of the encoded geometric and perceptual features. We minimize the regression loss during each training batch using the Mean Squared Error (MSE). This loss keeps the predicted values close to the quality labels:

Loss = (1/n) · Σ_{i=1}^{n} (Qs_i − MOS_i)²,

where n represents the number of distorted samples in a given database. The database provides Mean Opinion Scores (MOS_i) that define the subjective quality assessment, while Qs_i represents the objective quality score obtained through the proposed method. In our study, we use three of the most popular databases in the field of point cloud quality assessment: the subjective point cloud assessment database (SJTU-PCQA) [51], the Waterloo point cloud assessment database (WPC) [45], and the ICIP2020 point cloud assessment database (ICIP2020) introduced in [52].
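The MSE regression loss above can be written directly as:

```python
import numpy as np

def mse_loss(qs_pred, mos):
    """Mean squared error between predicted quality scores Qs_i
    and subjective Mean Opinion Scores MOS_i."""
    qs_pred = np.asarray(qs_pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    return float(np.mean((qs_pred - mos) ** 2))
```

During training, this scalar is minimized per batch by the optimizer described in the implementation parameters.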

• The ICIP2020 database [52] comprises six reference point clouds that incorporate both texture and geometry information. Additionally, it includes 90 distorted versions obtained using 3 compression methods: G-PCC Octree, G-PCC Trisoup, and V-PCC, each at 5 quality levels ranging from low to high.

Implementation Parameters
To assess the proposed approach against other learning-based NR-PCQA metrics, we partitioned ICIP2020, SJTU-PCQA, and WPC datasets into training and testing sets.
The division of reference point clouds for these three datasets ensured that 80% of the samples were allocated for training, leaving 20% for testing purposes. The Adam optimizer is utilized throughout the training phase, initialized with a learning rate of 1 × 10^-4 and a batch size of 10. Furthermore, the model is trained over 50 epochs. We performed the tests using a computer equipped with an Intel(R) Core(TM) i7-11800H @ 2.30 GHz, 32 GB of RAM, and an NVIDIA GeForce RTX 3060 Laptop GPU on the Windows platform.

Evaluation Metrics
Four evaluation criteria are employed to measure the relation between the predicted scores and the MOSs: the Spearman Rank Correlation Coefficient (SRCC), Kendall's Rank Correlation Coefficient (KRCC), the Pearson Linear Correlation Coefficient (PLCC), and the Root Mean Squared Error (RMSE). These criteria are defined as follows:

SRCC = 1 − (6 · Σ_{j=1}^{n} d_j²) / (n · (n² − 1)),

KRCC = (n_c − n_d) / (0.5 · n · (n − 1)),

PLCC = Σ_j (MOS_j − MOS_mean)(Qs_j − Qs_mean) / sqrt( Σ_j (MOS_j − MOS_mean)² · Σ_j (Qs_j − Qs_mean)² ),

RMSE = sqrt( (1/n) · Σ_{j=1}^{n} (MOS_j − Qs_j)² ),

where n represents the number of distorted samples in a given database, d_j is the difference between the subjective and objective ranks of the j-th sample, n_c and n_d indicate the numbers of concordant and discordant pairs, and MOS_mean and Qs_mean denote the mean subjective and objective scores. The dataset provides Mean Opinion Scores (MOS_j) that define the subjective quality assessment, while Qs_j represents the objective quality score obtained through a specific method.
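For reference, the four criteria can be computed with scipy; this is a generic sketch, not the paper's evaluation code:

```python
import numpy as np
from scipy import stats

def evaluation_criteria(mos, qs):
    """Compute PLCC, SRCC, KRCC, and RMSE between subjective scores (mos)
    and objective predicted scores (qs)."""
    mos = np.asarray(mos, dtype=float)
    qs = np.asarray(qs, dtype=float)
    plcc, _ = stats.pearsonr(mos, qs)
    srcc, _ = stats.spearmanr(mos, qs)
    krcc, _ = stats.kendalltau(mos, qs)
    rmse = float(np.sqrt(np.mean((mos - qs) ** 2)))
    return {"PLCC": plcc, "SRCC": srcc, "KRCC": krcc, "RMSE": rmse}
```

Higher PLCC, SRCC, and KRCC and lower RMSE indicate better agreement between objective predictions and subjective judgments.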

Experimental Results
This section deals with our experimental results, including the performance analysis of the studied networks, the ablation study, results achieved through comparisons with state-of-the-art methods, and cross-database evaluations.

Network Performance
Convolutional Neural Networks (CNNs) come in various architectures and configurations, each designed for specific tasks and use cases. To explore how the CNN backbone impacts prediction quality, we ran experiments using six distinct CNN architectures pre-trained on the ImageNet dataset: MobileNet, ResNet, DenseNet, ResNeXt, SE-ResNet, and VGG. The characteristics of these networks are listed in Table 1 and their corresponding results can be found in Tables 2 and 3. We evaluate the impact of network architecture and depth on the proposed method's performance. Residual Networks (ResNets) demonstrate progressive improvement with increasing depth, with ResNet34 achieving better inter-database correlation than ResNet18. ResNeXt50 exhibits superior performance on the SJTU and ICIP2020 datasets but struggles on the WPC dataset. MobileNetV2 stands out for its consistent performance across all datasets despite having fewer parameters and a lower memory footprint. DenseNet201 suffers from performance degradation and higher Root Mean Squared Error (RMSE) across databases. SE-ResNet50 emerges as a top performer, with strong Pearson Linear Correlation Coefficient (PLCC), Spearman Rank Correlation Coefficient (SRCC), and Kendall Rank Correlation Coefficient (KRCC) values and consistently low RMSE across all datasets. VGG16 and VGG19, known for strong performance, also excel on all metrics, demonstrating remarkable generalizability, particularly on the SJTU and ICIP2020 datasets.
Results obtained from Tables 1-3 reveal a classic complexity-accuracy trade-off within the evaluated DCNN architectures. While deeper models such as VGG-16 and VGG-19 achieve higher accuracy, they exhibit significantly larger numbers of parameters and require more computational resources for inference compared to lightweight models such as MobileNetV2. ResNet-101, SE-ResNet-50, ResNeXt-50, and DenseNet-201 offer a potential middle ground, balancing accuracy with computational efficiency. However, these models come at the cost of increased memory footprint and processing power compared to shallower architectures such as ResNet-18 or ResNet-34. Our evaluation of the impact of varying depth and parameter counts across MobileNet, ResNet, DenseNet, and VGG models suggests that these factors may not significantly influence performance outcomes. This finding warrants further investigation to determine the optimal architecture for specific tasks, considering the application's resource constraints and target accuracy requirements.
Our experiments demonstrate that the choice of network architecture significantly impacts model performance, even more so than increasing the depth or number of parameters of the model. When selecting a model, it is crucial to consider the application's specific requirements. If real-time performance or deployment on mobile devices is essential, a lightweight model like MobileNetV2 might be preferable. On the other hand, if achieving the highest possible accuracy is the primary goal, deeper models such as VGG16 or SE-ResNet50 are more suitable options. To evaluate our method's effectiveness against leading benchmarks, we opted to leverage the pre-trained VGG16 convolutional neural network (CNN). The training and validation curves for VGG16 on all three databases are shown in Figures 4-6. Since the suggested approach operates directly on the 3D model, we also pay attention to computational efficiency. The execution times on the SJTU-PCQA database are presented in Figure 7. The proposed method has an average time cost of 29.27 s, compared to approximately 39.14 s for PCQM. While the FR PCQM method needs to load both the distorted and reference point cloud models simultaneously, the proposed NR method only needs to load the distorted point cloud model, and it also demonstrates a lower average time cost. This suggests that our approach achieves significant computational efficiency.

Ablation Study
To assess the effectiveness and contributions of the perceptual and geometric feature types, we conducted individual performance tests for each feature, allowing us to analyze their contributions within various combinations. Tables 4-6 present the outcomes of the ablation study. In these tables, 'Anis', 'Lin', 'Plan', 'Sph', 'Omni', 'Sph_fit', 'Eigen', 'Curv', 'Sal', 'All_geom', and 'All_perceptual' correspond to Anisotropy, Linearity, Planarity, Sphericity, Omnivariance, Sphere-fit, Eigenentropy, Curvature, Saliency, all geometry features, and all perceptual features, respectively. Additionally, 'L', 'A', and 'B' denote the luminance and chrominance channels of the LAB color space.
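Most of the geometric descriptors listed above ('Anis', 'Lin', 'Plan', 'Sph', 'Omni', 'Eigen') are standard functions of the eigenvalues of a local neighborhood's covariance matrix. A minimal sketch using the common eigenvalue formulations (the exact variants used in the tables may differ):

```python
import numpy as np

def eigen_features(neighborhood: np.ndarray) -> dict:
    """Covariance-based geometric descriptors for one local neighborhood
    (an N x 3 array of points). Definitions follow the widely used
    eigenvalue formulations; dictionary keys mirror the table labels."""
    cov = np.cov(neighborhood.T)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]   # lambda1 >= lambda2 >= lambda3
    lam = np.clip(lam, 1e-12, None)                # numerical safety
    e = lam / lam.sum()                            # normalized eigenvalues
    return {
        "Anis":  (lam[0] - lam[2]) / lam[0],       # anisotropy
        "Lin":   (lam[0] - lam[1]) / lam[0],       # linearity
        "Plan":  (lam[1] - lam[2]) / lam[0],       # planarity
        "Sph":   lam[2] / lam[0],                  # sphericity
        "Omni":  float(np.cbrt(lam.prod())),       # omnivariance
        "Eigen": float(-(e * np.log(e)).sum()),    # eigenentropy
    }

# A nearly planar patch should score high on planarity, low on sphericity.
rng = np.random.default_rng(0)
patch = rng.normal(size=(200, 3)) * [1.0, 1.0, 0.01]
f = eigen_features(patch)
```

In the full pipeline, such descriptors are computed per point over a k-nearest-neighbor neighborhood and collected into the 1D vectors fed to the network.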
For perceptual features, 'Curv' is the best individual performer, while 'L', 'A', 'B', and 'Sal' have less individual impact. Merging 'L', 'A', and 'B', followed by the integration of 'Curv', yields substantial gains, emphasizing the importance of their collective influence. The 'All_perceptual' set, which groups all perceptual features, performs strongly, suggesting a complementary interaction between these features.
Finally, integrating geometric and perceptual features, the 'All' combination outperforms all individual and grouped sets across all metrics. This comprehensive set not only achieves optimal performance but also underscores the efficacy of combining complementary features for enhanced point cloud quality assessment.
Analyzing the SJTU dataset (Table 5) reveals interesting insights into feature performance. Among geometric features, 'Plan' exhibits the strongest overall performance across all metrics (PLCC, SRCC, KRCC). 'Sph' follows closely, particularly excelling in PLCC and SRCC. While 'Anis' demonstrates decent performance, particularly in SRCC, it falls behind 'Plan' and 'Sph'. 'Omni' achieves the highest PLCC but struggles with SRCC and KRCC compared to 'Sph'. 'Sph_fit' stands out with significant improvements across all metrics, surpassing the other geometric features. 'Eigen', however, performs similarly to the moderate 'Anis'. Combining all geometric features in 'All_geom' significantly improves upon individual features and partial combinations, highlighting the benefit of feature fusion. For perceptual features, 'L', 'A', and 'B' exhibit moderate performance, with 'B' slightly edging out the others. 'Curv' improves over this group, while 'Sal' yields a substantial performance gain. As with the geometric features, combining these perceptual features into 'All_perceptual' leads to further improvements across the metrics.
Finally, the 'All' combination, integrating geometric and perceptual features, outperforms all individual and partial combinations across all metrics. This comprehensive feature set highlights the importance of incorporating diverse feature information for optimal point cloud quality assessment. The observed incremental improvements with each feature addition, the impact of feature combinations on the metrics, and the context-dependent selection of features all emphasize the effectiveness of a holistic approach to this task.
Across the three evaluated datasets, the results indicate that geometry features play a more significant role in determining the final quality score. This observation could be attributed to the fact that the three databases contain a greater variety of geometry distortions than perceptual distortions, and human perception of point clouds tends to place a higher emphasis on geometry-related information.
The experimental outcomes of PCQA on the SJTU-PCQA, WPC, and ICIP2020 databases are presented in Tables 7-9. The top-performing results in each column are highlighted in bold. Across all three databases, the FR-PCQA methods (PSNR [5], M-p2po [24], M-p2pl [4], H-p2po [24], and H-p2pl [4]) demonstrate comparatively lower performance. This can be attributed to their reliance solely on geometric structure without incorporating color information. In contrast, superior performance is observed for metrics such as MMD [28], PSNRYUV [40], PCQM [3], GraphSIM [6], PointSSIM [37], and TCDM [59], which include color information when assessing point clouds. However, these methods rely on reference information, a component often unavailable in practical applications. Regarding RR methods, the PCMRR metric produces poor results on all correlation metrics, which might be explained by the extensive use of features within the method, making it less generalizable to different types of degradation. For most methods, performance drops from the SJTU-PCQA and ICIP2020 databases to the WPC database because the latter contains more complex distortion parameters, which are more difficult for PCQA models. Moreover, distorted point clouds in the SJTU-PCQA database contain mixed distortions, while the WPC database introduces a single type of distortion per point cloud; point clouds with mixed distortions appear to exhibit greater quality distinguishability at similar distortion levels. Furthermore, the WPC database contains twice as many reference point clouds as the SJTU-PCQA database. Our approach exhibits a relatively small drop in performance compared to most other methods. For instance, when moving from the SJTU-PCQA database to the WPC database, our method shows a decrease of 0.03 in PLCC and 0.02 in SRCC. Top-performing PCQA methods, except for PQA-Net, exhibit a larger performance decline
of 0.15 and 0.14 in PLCC and SRCC, respectively. Therefore, our approach is clearly more robust to more complex distortions. However, overall effectiveness may not accurately reflect performance on specific distortion types. Consequently, we assess how the FR, RR, and NR metrics perform in the face of various point cloud distortions across the three databases. The PLCC and SRCC scores are presented in Tables 10-12, where the top performance for each distortion type within each database is highlighted in bold, indicating the best result among all competing metrics.
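The PLCC, SRCC, and KRCC scores reported throughout can be computed with scipy.stats. A brief sketch (note that PLCC is commonly computed after fitting a nonlinear logistic mapping to the predictions; raw scores are used here for brevity):

```python
import numpy as np
from scipy import stats

def correlation_metrics(predicted, mos):
    """Return (PLCC, SRCC, KRCC) between predicted quality scores and
    mean opinion scores."""
    plcc, _ = stats.pearsonr(predicted, mos)    # linear correlation
    srcc, _ = stats.spearmanr(predicted, mos)   # rank-order correlation
    krcc, _ = stats.kendalltau(predicted, mos)  # pairwise rank agreement
    return plcc, srcc, krcc

# A perfectly monotonic but nonlinear relation: SRCC and KRCC reach 1.0,
# while PLCC stays below 1.0 until a nonlinear mapping is applied.
pred = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
mos = pred ** 3
plcc, srcc, krcc = correlation_metrics(pred, mos)
```

This illustrates why PLCC and the rank correlations can diverge for the same method: PLCC is sensitive to the shape of the prediction-to-MOS mapping, whereas SRCC and KRCC depend only on ordering.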
On the ICIP2020 database (Table 10), our method surpasses all compared methods across the various distortion types (V-PCC, G-PCC Trisoup, and G-PCC Octree), demonstrating superior performance across the entire database. Within the SJTU-PCQA database (Table 11), our method shows the strongest correlation coefficient outcomes across all distortions, outperforming the state-of-the-art metrics in both PLCC and SRCC. It is important to highlight that the correlation values of our method and all other state-of-the-art methods are lower on the SJTU-PCQA dataset than on the ICIP2020 dataset. This difference can be explained by the different types of degradation present in the two databases. While the ICIP2020 database mostly features compression-related distortions, the SJTU-PCQA database presents more difficult degradation types such as acquisition noise, resampling, and their combinations (Octree-based compression (OT), Color noise (CN), Geometry Gaussian noise (GGN), Downsampling (DS), Downscaling and Color noise (D + C), Downscaling and Geometry Gaussian noise (D + G), and Color noise and Geometry Gaussian noise (C + G)). We compare PLCC and SRCC values for each of the seven degradation types. As depicted in Table 11, our model demonstrates robust performance across all degradation types, exhibiting strong correlations with the subjective quality scores and indicating its potential for more accurate quality evaluations than existing NR-PCQA methods such as 3D-NSS, PQA-Net, and MM-PCQA in this context.

Conclusions
In this paper, we have introduced a novel methodology for assessing the quality of 3D point clouds using a one-dimensional model based on the convolutional neural network (1D CNN). Through extensive experiments and evaluations, we have demonstrated the effectiveness of our approach in predicting subjective point cloud quality under various distortions. By leveraging transfer learning and focusing on geometric and perceptual features, our model consistently outperformed all competing methods.
The results of our evaluations across different distortion types and databases provide valuable insights into the performance of the proposed method. Our model achieves robust performance across all distortion types on the ICIP2020 and WPC databases, recording the top correlation coefficient results for every distortion.
The success of our approach can be attributed to its ability to effectively capture and analyze geometric and perceptual features in 3D point clouds, enabling accurate quality assessment without the need for reference information.The model's generalization capability, as demonstrated in cross-database evaluations, further highlights its potential for real-world applications.
In conclusion, the proposed method is a promising solution for automated point cloud quality assessment, offering enhanced accuracy and reliability compared to existing techniques.By combining advanced deep learning strategies with transfer learning, our approach advances the field of point cloud quality assessment and opens up new possibilities for improving visual quality evaluation in diverse domains.
Diniz et al. extracted local descriptors that capture geometry-aware texture information from point clouds. These descriptors include the Local Color Pattern (LCP) and various adaptations of the Local Binary Pattern (LBP) descriptor. The statistics of these descriptors are computed and used to determine the objective quality score.

Figure 1 .
Figure 1. A suggested model based on transfer learning using a 1D CNN.

Figure 2 .
Figure 2. The architecture of the 1D CNN Model.
The SJTU-PCQA database contains 420 point cloud samples derived from 10 reference point clouds. Each reference point cloud undergoes seven common types of distortion at six different levels. More precisely, the distortions are Octree-based compression (OT), Color noise (CN), Geometry Gaussian noise (GGN), Downscaling (DS), Downscaling and Color noise (D + C), Downscaling and Geometry Gaussian noise (D + G), and Color noise combined with Geometry Gaussian noise (C + G). However, only 9 reference point clouds and their corresponding distorted samples are publicly available, resulting in 378 (9 × 6 × 7) point cloud samples for our experiment. Mean Opinion Scores (MOSs) are provided within the range [0, 1].

Figure 4 .
Figure 4. Training and validation loss of 1D CNN VGG16 on the ICIP2020 dataset.

Figure 5 .
Figure 5. Training and validation loss of 1D CNN VGG16 on the SJTU dataset.

Figure 6 .
Figure 6. Training and validation loss of 1D CNN VGG16 on the WPC dataset.

Figure 7 .
Figure 7. Time cost results for point clouds using 1D CNN VGG16 on the SJTU dataset.

Table 1 .
The characteristics of the employed networks.

Table 2 .
Performance of different architectures on the ICIP2020. The best two performances for each database are emphasized in bold.

Table 3 .
Performance of different architectures on the SJTU-PCQA and WPC. The best two performances for each database are emphasized in bold.

Table 4 .
Results of the ablation study's performance on ICIP2020. The optimum performance for each database is highlighted in bold.

Table 5 .
Results of the ablation study's performance on SJTU. The optimum performance for each database is highlighted in bold.

Table 6 .
Results of the ablation study's performance on WPC. The optimum performance for each database is highlighted in bold.

Table 7 .
Performance comparison with the state-of-the-art on the ICIP2020.

Table 10 .
Performance comparison with the state-of-the-art on each distortion type on ICIP2020.

Table 13 .
Evaluation across different databases, where the label "WPC→SJTU" denotes that the model undergoes training on the WPC database and then validation using the standard testing configuration of the SJTU database.