Point Cloud Upsampling Algorithm: A Systematic Review

Abstract: Point cloud upsampling algorithms can improve the resolution of point clouds and generate dense and uniform point clouds, and are an important image processing technology. Significant progress has been made in point cloud upsampling research in recent years. This paper provides a comprehensive survey of point cloud upsampling algorithms. We classify existing point cloud upsampling algorithms into optimization-based methods and deep learning-based methods, and analyze the advantages and limitations of different algorithms from a modular perspective. In addition, we cover some other important issues such as public datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future research directions and open issues that should be further addressed.


Introduction
The point cloud is the standard output of 3D scanning. As a compact 3D data representation and an effective means of processing 3D geometry, point clouds have become increasingly popular [1]. However, due to hardware limitations, 3D sensors such as LiDAR usually produce sparse, noisy, and non-uniform point clouds, especially for small objects or objects far away from the sensor. This is also evident in various public benchmark datasets, such as KITTI [2], SUN RGB-D [3], and ScanNet [4]. Although 3D sensing technology has made significant progress in recent years, obtaining a point cloud with high density and complete details is still expensive and time-consuming. The sparsity and noise of point clouds affect their application in various fields, such as 3D shape classification, 3D object detection, and 3D object segmentation. Clearly, it is necessary to amend raw point cloud data.
Point cloud upsampling means that the given sparse, noisy, and non-uniform point cloud is upsampled to generate a dense, complete, and uniform point cloud [5]. Under current conditions, this problem is very challenging. Unlike the image data representation in a computer, which usually encodes the spatial relationship between pixels, point cloud data are represented by a set of disordered data points. Therefore, there are many difficulties in upsampling point clouds, and there are few related works. With the development of deep learning, models such as PointNet [6] have provided new solutions for processing point cloud data. As a result, point cloud upsampling tasks have gradually attracted the attention of researchers.
In this paper, we provide a comprehensive review of point cloud upsampling. We introduce optimization-based point cloud upsampling and deep learning-based point cloud upsampling, and focus on deep learning-based point cloud upsampling. Figure 1 shows the taxonomy of point cloud upsampling covered in this review in a hierarchically structured way.
The main contributions of this work are as follows:
(1) We provide a comprehensive review of point cloud upsampling, including benchmark datasets, evaluation metrics, optimization-based point cloud upsampling, and deep learning-based point cloud upsampling. To the best of our knowledge, this is the first survey paper that comprehensively introduces point cloud upsampling.
(2) We provide a systematic overview of recent advances in deep learning-based point cloud upsampling in a component-wise manner, and analyze the strengths and limitations of each component.
(3) We compare the representative point cloud upsampling methods on commonly used datasets.
(4) We provide a brief summary, and discuss the challenges and future research directions.
The structure of this paper is as follows. Section 2 introduces the datasets and evaluation metrics for point cloud upsampling. Section 3 reviews the methods for point cloud upsampling based on optimization. Section 4 reviews the point cloud upsampling based on deep learning from the perspective of components. Section 5 compares and analyzes representative point cloud upsampling methods. Section 6 discusses future directions and open issues.

Datasets
At present, there are various point cloud datasets used to evaluate the performance of point cloud upsampling algorithms in different applications. These datasets have great differences in sample number, quality, resolution, and diversity. We list some typical datasets used for point cloud upsampling in Table 1.
During experiments, researchers usually downsample the ground truth, upsample the downsampled point cloud, and then compare the generated point cloud with the ground truth to evaluate its quality. Commonly used downsampling methods include Poisson disk sampling, random downsampling, curvature-based downsampling, and farthest point sampling. Some researchers build their own datasets for training and testing point cloud upsampling models. These datasets are designed specifically for upsampling algorithms, and they usually contain fewer samples than the typical datasets listed in Table 1. Some datasets constructed by researchers for point cloud upsampling are listed in Table 2.
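As an illustration of how a sparse input can be derived from a dense ground truth, the following minimal sketch implements farthest point sampling and random downsampling in plain Python. The function names, the `seed` parameter, and the tuple-based point format are assumptions made for this example and do not come from any of the surveyed implementations.

```python
import math
import random


def farthest_point_sampling(points, m, seed=0):
    """Greedily pick m points, each one farthest from those already chosen."""
    rng = random.Random(seed)
    first = rng.randrange(len(points))  # arbitrary starting point
    chosen = [first]
    # min_dist[i] is the distance from point i to its nearest chosen point.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return [points[i] for i in chosen]


def random_downsampling(points, m, seed=0):
    """Pick m points uniformly at random without replacement."""
    return random.Random(seed).sample(points, m)
```

Farthest point sampling tends to cover the shape more evenly than random downsampling, at the cost of O(N·m) distance evaluations in this naive form.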

Evaluation Metrics
The current evaluation metrics mainly evaluate the quality of the upsampled point cloud from two aspects: the deviation between the ground truth and the generated point cloud and the uniformity of the generated point cloud.
Surface Deviation (SD) [5]. For each predicted point x, the closest point y on the mesh is found and the distance between them is calculated; the mean and standard deviation of these distances are then computed over all the points.
Chamfer Distance (CD) [16]. CD is a sum of non-negative distances and is defined as an unsigned distance function; it is commonly used to measure the distance between two curves or two 2D images. Applied to 3D space, it is defined as follows:
$$ d_{CD}(S_1, S_2) = \sum_{x \in S_1} \min_{y \in S_2} \|x - y\|_2 + \sum_{y \in S_2} \min_{x \in S_1} \|x - y\|_2 \tag{1} $$
where $S_1$ and $S_2$ are two 3D point sets. The first term sums, over every point $x$ in $S_1$, its minimum distance to $S_2$; the second term sums, over every point $y$ in $S_2$, its minimum distance to $S_1$.
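The two-directional nearest-neighbor sum described above can be sketched in a few lines of plain Python. This is a brute-force illustration under the assumption of unsquared Euclidean distances; some implementations use squared distances or average over the points instead of summing, and the function name is hypothetical.

```python
import math


def chamfer_distance(s1, s2):
    """Sum of nearest-neighbor distances from s1 to s2 plus from s2 to s1."""
    forward = sum(min(math.dist(x, y) for y in s2) for x in s1)
    backward = sum(min(math.dist(x, y) for x in s1) for y in s2)
    return forward + backward
```

Because every point only needs some close neighbor in the other set, CD does not require the two point clouds to have the same number of points.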
Hausdorff Distance (HD) [17]. HD measures the distance between two subsets of a metric space, each of which may contain a finite or infinite number of elements (points). It can be viewed as the maximum of the shortest distances from the points of one set to the other set, and is defined as follows:
$$ d_{H}(S_1, S_2) = \max\left\{ \sup_{x \in S_1} \inf_{y \in S_2} \|x - y\|_2,\; \sup_{y \in S_2} \inf_{x \in S_1} \|x - y\|_2 \right\} \tag{2} $$
where sup and inf denote the supremum and infimum, respectively.
Earth Mover's Distance (EMD) [18]. EMD is a histogram similarity measure based on the transportation problem. It is the normalized minimum cost of transforming one distribution into another, and measures the difference between two multi-dimensional distributions in a given feature space. For two point sets of equal size, it is defined as follows:
$$ d_{EMD}(S_1, S_2) = \min_{\varphi : S_1 \to S_2} \sum_{x \in S_1} \|x - \varphi(x)\|_2 \tag{3} $$
where $\varphi : S_1 \to S_2$ is a bijection. In other words, EMD finds the one-to-one correspondence $\varphi$ between the point sets $S_1$ and $S_2$ that minimizes the sum of Euclidean distances between matched points.
Point to Surface (P2F). Metrics such as CD and HD measure point-to-point deviation, whereas P2F measures the distance from each generated point to the surface underlying the original point cloud, that is, the distance between each point and its closest plane on that surface. Unlike CD, HD, and other metrics that can be computed from data in XYZ format, P2F requires the raw data in mesh format.
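For small finite point sets, both HD and EMD can be evaluated directly from their definitions. The sketch below is illustrative only: the function names are hypothetical, and the EMD routine enumerates all bijections, which is feasible only for a handful of points (practical implementations rely on assignment solvers or approximations).

```python
import itertools
import math


def hausdorff_distance(a, b):
    """Largest nearest-neighbor distance, taken over both directions."""
    a_to_b = max(min(math.dist(x, y) for y in b) for x in a)
    b_to_a = max(min(math.dist(x, y) for x in a) for y in b)
    return max(a_to_b, b_to_a)


def emd_brute_force(s1, s2):
    """Minimum total matching cost over all bijections between s1 and s2."""
    if len(s1) != len(s2):
        raise ValueError("a bijection requires point sets of equal size")
    return min(
        sum(math.dist(x, y) for x, y in zip(s1, perm))
        for perm in itertools.permutations(s2)
    )
```

Note that HD is driven by the single worst point, which makes it more sensitive to outliers than the summed metrics.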
In addition to evaluating the deviation between point clouds, the uniformity of point clouds is also an important evaluation indicator.
Normalized Uniformity Coefficient (NUC) [5]. PU-Net defines NUC as follows: D disks of equal size are randomly placed on the surface of the generated point cloud, the number of points inside each disk is counted and normalized by the density of the corresponding object, and the standard deviation of these normalized counts is computed over all objects in the test dataset. For disks covering a fraction p of the object surface area, NUC is defined as:
$$ avg_p = \frac{1}{K \cdot D} \sum_{k=1}^{K} \sum_{i=1}^{D} \frac{n_i^k}{N^k \cdot p} \tag{4} $$
$$ NUC_p = \sqrt{ \frac{1}{K \cdot D} \sum_{k=1}^{K} \sum_{i=1}^{D} \left( \frac{n_i^k}{N^k \cdot p} - avg_p \right)^2 } \tag{5} $$
where $n_i^k$ is the number of points within the i-th disk of the k-th object, $N^k$ is the total number of points on the k-th object, K is the total number of test objects, and p is the percentage of the disk area over the total object surface area.
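Once the per-disk point counts are available, NUC reduces to a standard deviation of normalized counts. The following sketch assumes that the counts have already been collected (placing disks on a surface is omitted), that every object uses the same number of disks, and that `disk_counts[k][i]` holds the number of points in the i-th disk of the k-th object; all names are hypothetical.

```python
import math


def nuc(disk_counts, totals, p):
    """Standard deviation of density-normalized per-disk point counts.

    disk_counts[k][i]: points inside disk i of object k
    totals[k]:         total number of points on object k
    p:                 disk area as a fraction of the object surface area
    """
    ratios = [
        n / (total * p)
        for counts, total in zip(disk_counts, totals)
        for n in counts
    ]
    avg = sum(ratios) / len(ratios)
    return math.sqrt(sum((r - avg) ** 2 for r in ratios) / len(ratios))
```

A perfectly uniform point cloud gives a normalized count of 1 in every disk and therefore a NUC of 0; lower values indicate better uniformity.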
Uniform metric in PU-GAN [13]. NUC ignores the local clutter of points and cannot distinguish between different disks containing the same number of points. PU-GAN proposes another metric for evaluating the uniformity of point clouds to avoid this problem, which is defined as follows:
$$ L_{uni} = \sum_{j=1}^{M} U_{imbalance}(S_j) \cdot U_{clutter}(S_j), \quad U_{imbalance}(S_j) = \frac{(|S_j| - \hat{n})^2}{\hat{n}}, \quad U_{clutter}(S_j) = \sum_{k=1}^{|S_j|} \frac{(d_{j,k} - \hat{d})^2}{\hat{d}} \tag{6} $$
where M is the number of seed points obtained by farthest point sampling of the generated point cloud Q, $S_j$ is the point subset obtained by a ball query of radius $r_d$ around the j-th seed point, $\hat{n} = |Q| \times r_d^2$ is the expected number of points in $S_j$, $d_{j,k}$ is the distance from the k-th point in $S_j$ to its nearest neighbor, and $\hat{d}$ is the expected distance from a point to its nearest neighbor in a uniform point cloud. The deviation of $|S_j|$ from $\hat{n}$ and of $d_{j,k}$ from $\hat{d}$ is evaluated using a chi-squared model.
F-Score [19]. The two mainstream metrics above are susceptible to outliers. AR-GCN [20] uses the F-score to evaluate the quality of generated point clouds by treating point cloud upsampling as a classification problem. Precision and recall are evaluated as the percentage of points in the generated point cloud or in the ground truth, respectively, that can find a neighbor in the other point set within a certain threshold τ. The F-score is then calculated as the harmonic mean of precision and recall. For this metric, larger is better.
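The F-score computation can be sketched as follows. This is a brute-force illustration with hypothetical names; it assumes a strict "within τ" test, and real evaluations would typically use a spatial index for the nearest-neighbor queries.

```python
import math


def f_score(generated, ground_truth, tau):
    """Harmonic mean of precision and recall under a distance threshold tau."""

    def hit_rate(src, dst):
        # Fraction of points in src with a neighbor in dst closer than tau.
        hits = sum(1 for x in src if min(math.dist(x, y) for y in dst) < tau)
        return hits / len(src)

    precision = hit_rate(generated, ground_truth)
    recall = hit_rate(ground_truth, generated)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

An outlier in the generated point cloud lowers the precision by only one missed point, regardless of how far away it lies, which is why this metric is less sensitive to outliers than distance sums.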

Optimization-Based Point Cloud Upsampling
In 2003, Alexa et al. [1] proposed the first algorithm for point cloud upsampling. They upsample a point set by interpolating points as vertices of a Voronoi diagram on the moving least squares (MLS) surface. It takes any three points on the plane and draws a Voronoi diagram. Each Voronoi vertex is the center of a circle that touches three or more of the points without any point inside. After obtaining the Voronoi diagram, it selects the center of the circle with the largest radius and projects it onto the MLS surface. The result is an upsampled point. This process is repeated until the radius of the largest circle is smaller than the specified threshold. The local approximation method is chosen to improve the calculation efficiency.
Subsequently, Lipman et al. [21] proposed a parameterization-free point resampling and surface reconstruction method and applied it to point cloud upsampling. The locally optimal projection (LOP) operator is introduced to approximate the surface from the point set data, and can be used to project any set of points onto the input point cloud. After multiple LOP iterations on the point set, the initial point set is upsampled. The operator is parameterization-free and does not rely on estimating local normals, fitting local planes, or using any other local parametric representation. This method works well in situations in which the orientation is not clear and the geometry is complex. Huang et al. [22] modified and extended LOP. They proposed the weighted locally optimal projection (WLOP), which adds locally adaptive density weights to LOP to make the point distribution more even. The irregular particle distribution produced by the original LOP operator may cause some closed-cell defects when generating the surface, and WLOP alleviates this problem. Later, Preiner et al. [23] proposed Continuous LOP (CLOP), a WLOP operator based on a Gaussian mixture that describes the input point density. Describing the density with a Gaussian mixture model preserves the geometry while providing a more compact and continuous representation of the point cloud. Compared with WLOP, CLOP adopts more particles than input points, generating better point cloud upsampling results.
None of the above solutions consider how to deal with sharp features, and some methods require reliable normals as part of the input. Thus, Huang et al. [24] proposed a resampling method, edge-aware resampling (EAR), which relies on the median to deal with noisy and possibly outlier point sets in an edge-aware manner. It resamples from the edge so that a reliable normal can be calculated at the sampling point, based on which the orientation point is inserted and projected onto the potential surface, which is an unknown base surface defined by the input point set. Then, it determines the bottom surface, direction, and distance of the projection. To correctly handle the sharp features, the position information and normal information are added in the above steps to give the projection operator bilateral and edge perception. Repeating the above-mentioned upsampling process and incrementally filling the gaps along the edge, singular points can reconstruct sharp features while increasing or decreasing the point density.
Wu et al. [25] defined the concept of deep points and proposed a consolidation method based on deep points. Based on EAR, new samples close to the input data are projected onto the basic surface using bilateral projection. This can effectively restore small geometric details. In addition, bilateral normal smoothing can be performed to adjust the surface points' normals to better retain clear features on the merged surface.
Compared with LOP, the above two edge-aware upsampling methods make specific improvements but still smooth surface transitions to a certain degree. Dinesh et al. [26] proposed a local 3D point cloud super-resolution algorithm based on graph total variation (GTV). To promote piece-wise smoothness in the reconstructed 2D surfaces while preserving the original point coordinates, a GTV objective over the surface normals of neighboring points was designed, and point cloud upsampling was formulated as a GTV minimization problem. The authors used part of the Stanford 3D scanning repository data to verify the algorithm, and selected two evaluation criteria, point-to-point and point-to-plane, to quantitatively evaluate the algorithm.
In general, although optimization-based methods can upsample point clouds to a certain extent, they are not data driven and have significant limitations. They rely on priors, such as normal estimation and the assumption of smooth surfaces with few features. These methods also struggle to preserve multiscale structures.

Deep Learning-Based Point Cloud Upsampling
With the introduction of network models such as PointNet [6], PointNet++ [27], and DGCNN [28], irregular point clouds can be directly used for training. To benefit from this approach, the application of deep learning to point clouds has gradually become a popular research topic, and point cloud upsampling models with deep learning have also achieved a variety of results. Deep learning-based point cloud upsampling can be divided into supervised point cloud upsampling and unsupervised point cloud upsampling.

Feature Extraction Components
The first step in point cloud upsampling using deep learning models is to extract point cloud features. Several different feature extraction components are introduced here.

PointNet-Based Feature Extraction
Yu et al. [5] proposed the first deep learning model for point cloud upsampling, PU-Net, in which two feature learning strategies are used, hierarchical feature learning and multi-level feature aggregation. For hierarchical feature learning, PU-Net adopts the hierarchical feature learning mechanism proposed in PointNet++ [27] as the frontal part of the network. In order to obtain more of the local context, PU-Net specifically uses a relatively small grouping radius in each layer. For multi-level feature aggregation, PU-Net first uses the interpolation method in PointNet++ to upsample the downsampled point features in hierarchical feature learning and restore all original point features. Then it uses convolution to reduce the dimensionality of the interpolated features at different levels to the same dimension. Finally, the features of each level are concatenated as embedded point features. DensePCR [29] and EC-Net [12] also use similar feature extraction strategies.
Zeng et al. [30] proposed the spatial feature extractor block (SFE block) to replace PointNet++ for extracting local features. Compared with PU-Net's point feature embedding, the SFE block exploits local point relationships to extract rich local details. In particular, each point in the local region has different effects on the local spatial features, which represent different spatial distributions of the local geometry. After combining point-to-point features, the extracted features from the point cloud and local geometry can be captured more accurately.
The feature extraction method based on PointNet can combine global and local features, but requires an additional point set downsampling and interpolation process, which consumes more computing resources.

Graph Convolution-Based Feature Extraction
Wang et al. [16] proposed a multi-step point cloud upsampling network (MPU), which was inspired by dynamic graph convolution to define local neighborhoods in feature space. Point features are extracted from local neighborhoods through a k-nearest neighbors (kNN) search based on feature similarity. This method does not require point set subsampling to obtain long-range and non-local information. Specifically, the feature extraction unit consists of a dense sequence of blocks, where the MPU converts the input into a fixed number of features, uses a feature-based kNN to group the features, refines each grouped feature through a tightly connected multilayer perceptron (MLP) chain, and finally obtains point features through maxpooling. The MPU introduces dense connections within and between blocks. This connection style enables explicit information reuse, which improves the reconstruction accuracy while significantly reducing the model size. This feature extraction method has applications in PU-GAN [13] and PU-EVA [31]. GC-PCU [32] simplifies this feature extraction method into a shallow-and-wide structure; only two extraction blocks are involved, and the number of channels is increased before activation.
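The core of this style of feature extraction, grouping by k-nearest neighbors in feature space followed by max-pooling, can be sketched as below. This is not the MPU implementation: the learned MLP chain is replaced by an identity placeholder (`mlp`), the edge feature is assumed to be the DGCNN-style difference between a neighbor and the center feature, and all function names are hypothetical.

```python
import math


def knn_in_feature_space(features, k):
    """For each feature vector, return the indices of its k nearest others."""
    neighbours = []
    for i, f in enumerate(features):
        order = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(f, features[j]),
        )
        neighbours.append(order[:k])
    return neighbours


def group_and_pool(features, k, mlp=lambda v: v):
    """Group neighbors by feature similarity, transform, then max-pool."""
    pooled = []
    for i, idx in enumerate(knn_in_feature_space(features, k)):
        center = features[i]
        # Edge feature: neighbor feature minus center feature (assumption).
        edges = [mlp([b - a for a, b in zip(center, features[j])]) for j in idx]
        # Channel-wise max over the k neighbors.
        pooled.append([max(channel) for channel in zip(*edges)])
    return pooled
```

Because the neighborhood is defined by feature similarity rather than spatial proximity, two points that are far apart on the shape can still exchange information, which is how non-local context is obtained without point set subsampling.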
Qian et al. [14] proposed a graph convolutional network-based point cloud upsampling model, PU-GCN, which is a new Inception DenseGCN feature extractor, and integrates the densely connected GCN (DenseGCN) module from DeepGCNs [33] into the Inception module of GoogLeNet [34]. Inception DenseGCN first compresses features through a set of MLPs to reduce the amount of computation, then passes the compressed features into two parallel DenseGCNs and a global pooling layer, and finally splices to obtain multi-scale feature information.
AR-GCN [20] also uses graph convolution blocks to extract features. Unlike MPU and PU-GCN, it introduces residual connections between different convolution blocks instead of dense connections. PUGeo-Net [15] adds a feature re-calibration module on the basis of using DGCNN to extract features. Multi-scale features are recalibrated through one layer of MLP and one layer of softmax. Zhao et al. [35] introduced a channel attention mechanism in PUI-Net to extract features. They calculate the feature mean of each channel, and control the features of each dimension through two fully connected layers, which are spliced with the extracted features to form the output features.
The feature extraction method based on graph convolution can extract local and global features more effectively, has fewer parameters and is easy to train, and has been widely used.

Upsampling Component
The main function of the upsampling component is to expand the feature space, which is equivalent to expanding the number of points, because points and features are interchangeable. Figure 3 shows several common upsampling component frameworks.
Multi-branch upsampling. As the first deep learning-based point cloud upsampling model, PU-Net [5] uses a multi-branch feature expansion module to expand features through multiple parallel sub-pixel convolutional layers. SPE-Net [30] adopts a similar upsampling strategy.
This approach can lead to agglomeration of points around the original point locations, which needs to be mitigated by introducing a repulsion loss. GC-PCU [32] introduces perturbation learning to address this problem. It applies an MLP that learns a 2D perturbation to each set of features after feature expansion. Different convolutions are used for each set of features, so that the weighting parameters are not shared across the MLPs. In this manner, the resulting perturbation depends on the shape of the input point cloud, and thereby guarantees the geometric consistency. These perturbations are then appended to the duplicated features for further residual learning. In residual learning, three convolution operations are performed to map input features to residual values. Then the input features are added to these residuals using skip connections to further refine the features, resulting in expanded features.
Multi-step upsampling. Multi-step supervision is a common practice in image super-resolution. MPU [16] introduces this mechanism into point cloud upsampling. MPU uses an upsampling unit that consists of a feature extraction unit and a feature expansion unit (in Section 4.1.1). For the feature expansion component, MPU first duplicates the features, then assigns each duplicated feature a 1D code, with a value of −1 or 1, to transform them to different locations. Finally, MPU compresses the duplicated features using a set of MLPs as residuals, and adds the residuals to the input coordinates to generate output points. MPU introduces inter-level skip-connections between upsampling units for features extracted with different scopes of the receptive fields. AR-GCN [20] adopts the same upsampling strategy. An upsampling unit is formed using the residual graph convolution block and the unpooling block to progressively upsample the point cloud. The unpooling block predicts the residuals between the input and output point clouds through a graph convolutional layer. This exploits the similarity between the input and output point clouds, resulting in faster convergence and better performance.
This multi-step upsampling method yields better geometric detail and lower noise, but is computationally expensive and requires more data to supervise the intermediate outputs of the network.
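The duplicate-and-code expansion used in multi-step upsampling can be illustrated with a minimal sketch. The learned MLPs that map expanded features to coordinate residuals are represented by a caller-supplied placeholder `residual_fn`; this function, like the other names here, is hypothetical and not taken from the MPU code.

```python
def expand_with_codes(features, codes=(-1.0, 1.0)):
    """Duplicate each feature once per code and append the code as a channel."""
    return [list(f) + [c] for f in features for c in codes]


def upsample_step(points, features, residual_fn, codes=(-1.0, 1.0)):
    """One 2x step: duplicated input coordinates plus predicted residuals."""
    expanded = expand_with_codes(features, codes)
    duplicated = [p for p in points for _ in codes]
    return [
        tuple(a + b for a, b in zip(p, residual_fn(f)))
        for p, f in zip(duplicated, expanded)
    ]
```

The appended code is the only thing that distinguishes the two copies of a feature, so it is what allows the network to move the duplicated points to different locations.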
NodeShuffle. PixelShuffle has achieved success in the field of image super-resolution and inspired the NodeShuffle module proposed in PU-GCN [14]. NodeShuffle uses graph convolution layers to expand features, and rearranges the expanded features through a shuffle operation. Because it employs graph convolutions instead of CNNs to expand features, the upsampler can encode spatial information from point neighborhoods and learn new points from the latent space, rather than simply duplicating the original points. PU-GACNet [36] improves NodeShuffle and proposes edge-aware NodeShuffle (ENS). The ENS module not only smoothly expands local point features but also properly emphasizes local edge features with graph convolution operations.
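The shuffle step can be sketched independently of the graph convolution that precedes it. The example below assumes that the expanded feature of each point has r·C channels and that consecutive groups of C channels become separate output points; the exact channel ordering in a given implementation may differ, and the function name is hypothetical.

```python
def periodic_shuffle(expanded, r):
    """Rearrange N rows of r*C channels into r*N rows of C channels."""
    out = []
    for row in expanded:
        c, remainder = divmod(len(row), r)
        if remainder:
            raise ValueError("channel count must be divisible by r")
        for g in range(r):
            out.append(row[g * c:(g + 1) * c])
    return out
```

The shuffle itself has no learnable parameters; all new information comes from the layer that widens the channels beforehand.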
Up-and-down sampling. Li et al. [13] introduced the mechanism of up-and-down sampling in PU-GAN. It upsamples the point feature to obtain the expanded feature, which is then downsampled to compute the difference between the features before and after upsampling. The difference is upsampled and added to the first-step expanded feature to self-correct the expanded feature. For the up-feature operator, PU-GAN first duplicates the input features and adopts the 2D grid mechanism in FoldingNet [37] to generate a unique 2D vector for each feature-map, and appends this vector to each point feature. A self-attention unit and a set of MLPs are then used to generate output upsampled features. The down-feature operator consists of a reshape operation and a set of MLPs. Up-UNet [38] also applies upsampling operations. Up-UNet first upsamples the point features through the up-feature operator, which can adjust the point features according to the adjacent point features through the channel attention operator while extracting the local point features. Then, in order to keep the consistency of the guided point cloud, the first N point features are split from the upsampling features. The first down-feature operator only conducts the sampling operation without changing the number of points to extract adjacent features, which extracts neighboring information and builds the relation of closing points. The second down-feature operator performs real downsampling to extract key point and important point features. Then, through continuous upsampling operations, together with extension paths, the network can propagate context information and reconstruct extended features.
This approach is better able to mine the deep relationship between the generated point cloud and the original point cloud, thus providing higher quality upsampling results.
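The data flow of the up-and-down sampling unit can be summarized in a short sketch. The learned up-feature and down-feature operators are replaced here by toy stand-ins (feature duplication and group averaging), and the sign of the feature difference is an assumption; only the order of operations is meant to be illustrative.

```python
def duplicate_up(features, r):
    """Toy up-feature operator: repeat every feature r times."""
    return [list(f) for f in features for _ in range(r)]


def mean_down(features, r):
    """Toy down-feature operator: average every r consecutive features."""
    out = []
    for start in range(0, len(features), r):
        group = features[start:start + r]
        out.append([sum(channel) / len(group) for channel in zip(*group)])
    return out


def up_down_up(features, up, down):
    """Expand, project back, and use the mismatch to correct the expansion."""
    first = up(features)   # N -> rN
    back = down(first)     # rN -> N
    diff = [[b - a for a, b in zip(f, g)] for f, g in zip(features, back)]
    correction = up(diff)  # N -> rN
    return [[a + b for a, b in zip(f, c)] for f, c in zip(first, correction)]
```

With these toy operators, `down(up(F))` reproduces `F`, so the correction term is zero; with learned operators the difference is generally non-zero and acts as a self-correction signal.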
Disentangled refinement. Li et al. [39] proposed a network model for disentanglement refinement, Dis-PU, which divides upsampling into two steps, a feature expansion unit and a spatial refinement unit. The feature expansion unit first expands the features through regular expansion operations and generates rough point sets through a set of MLPs. The spatial refiner is used to further fine-tune the spatial position of each point in the coarse point set and generate a high-quality dense point set Q with uniform distribution. Coarse but dense point clouds and associated features are fed into local and global refinement units. The two outputs generated by the two refinement units are added to obtain the refined feature map. Finally, residual learning is used to regress the offset Q at each point.
This upsampling strategy allows each sub-network to better focus on its specific sub-goal, while complementing each other in the upsampling task.
Meta upscale. The previous methods need to predefine the upsampling factor, such as training different upsampling modules for different factors, which is inefficient and limits the application of the model. Ye et al. [40] proposed a model that can be sampled at any scale, Meta-PU. Its backbone is based on a graph convolution network, which consists of several residual graph convolution blocks. It dynamically adjusts the weight of the residual graph convolution block by learning the meta-subnetwork. Then meta-convolution uses these weights to extract features, adaptively customize the scale factor, and jointly train multi-scale factors under the same model. PU-EVA [31] decouples the upsampling rate from the network structure, and adopts an approximate solution based on edge vectors to generate new points by encoding neighboring connectivity, enabling arbitrary upsampling rates in one-shot training.
Others. The above several methods achieve the purpose of upsampling by extending the features of the feature space, and other upsampling methods exist.
Wang et al. [41] proposed an interpolation-based point cloud upsampling model. Firstly, the dense point cloud is obtained using an interpolation algorithm based on dynamic point expansion, and then the coordinates of the insertion points are adjusted through two network models. Interpolation algorithms often result in some side effects, such as computational complexity, noise amplification, and blurred results. Therefore, the current trend is to replace interpolation-based methods with learnable upsampling layers.
PUGeo-Net [15] achieves point cloud upsampling through a purely geometric sampling method. It represents the 3D surface as a 2D parameter space, and then samples from the parameter space, and finally uses the learned Jacobian matrix and normal displacement to remap the amplified 2D parameter samples to the 3D surface. Considering that computing and learning global parameters consume huge computational resources, the researchers simplified the approach to a local parameterization problem for each point.
This method takes into account the geometric features of the input shape. However, their method requires additional supervision in the form of normals, which many point clouds do not have, such as those produced by LiDAR sensors.

Point Set Generation
The point set generation component is the last step of the upsampling model, and reconstructs 3D point coordinates from the expanded features. Compared with the feature extraction and feature expansion components, this component is simpler in structure and usually consists of one or more MLP layers, as in PU-Net [5], MPU [16], or PUI-Net [35]. Although the component is simple in structure, there is still room for improvement. Through an edge distance regression component, EC-Net [12] learns the perturbation of the position of the generated point cloud relative to the original point cloud to obtain distance features. The distance features and the expanded features are concatenated and fed into the point set generation component, and the point coordinates are obtained through two MLPs. This helps to supplement missing edge points on the surface of regular objects. PU-GAN [13] applies farthest point sampling after one layer of MLP, which can further improve the uniformity of the point set distribution. PU-EVA [31] obtains the regression displacement error through the learned neighborhood features, adds it to the point coordinates obtained by the MLP layer, and calculates the final coordinates of the output points.

Loss Function
The loss function is used to guide model optimization, resulting in higher quality point clouds.
Reconstruction loss. The reconstruction loss constrains the geometry of the generated points so that they lie on the underlying target surface. Commonly used reconstruction loss functions are the Chamfer distance, the earth mover's distance, and the Hausdorff distance, which measure the similarity between two point clouds. Their definitions are given in Equations (1)-(3).
Uniform loss. To make the point cloud distribution more uniform, PU-GAN [13] proposes a uniform loss. The uniform loss assumes that the neighboring points are hexagonal, and the specific definition is given in Equation (6).
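To make the structure of the uniform loss concrete, the sketch below evaluates it for point subsets that have already been extracted (seed selection by farthest point sampling and the ball query are omitted). It follows the PU-GAN formulation as we understand it, a per-subset product of an imbalance term on the point count and a clutter term on nearest-neighbor distances; the function names and the way `n_hat` and `d_hat` are passed in are assumptions of this example.

```python
import math


def imbalance(subset, n_hat):
    """Chi-squared style deviation of the subset size from its expected size."""
    return (len(subset) - n_hat) ** 2 / n_hat


def clutter(subset, d_hat):
    """Chi-squared style deviation of nearest-neighbor distances from d_hat."""
    total = 0.0
    for k, p in enumerate(subset):
        d = min(math.dist(p, q) for i, q in enumerate(subset) if i != k)
        total += (d - d_hat) ** 2 / d_hat
    return total


def uniform_loss(subsets, n_hat, d_hat):
    """Sum over subsets of imbalance times clutter (each subset needs >= 2 points)."""
    return sum(imbalance(s, n_hat) * clutter(s, d_hat) for s in subsets)
```

The imbalance term captures non-uniformity across regions, whereas the clutter term captures non-uniformity inside a region.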
Repulsion loss. The points generated by feature expansion are often located near the original points. To solve this problem, a repulsion loss is proposed in PU-Net [5], which is defined as follows:
$$ L_{rep} = \sum_{i=1}^{\hat{N}} \sum_{i' \in K(i)} \eta(\|x_{i'} - x_i\|)\, w(\|x_{i'} - x_i\|) \tag{7} $$
where $\hat{N}$ is the number of output points, $K(i)$ is the index set of the k-nearest neighbors of point $x_i$, and $\|\cdot\|$ is the L2-norm. $\eta(r) = -r$ is called the repulsion term, which is a decreasing function that penalizes $x_i$ if it is located too close to other points in $K(i)$. To penalize $x_i$ only when it is too close to its neighboring points, PU-Net adds two restrictions: (i) only points $x_{i'}$ in the k-nearest neighborhood of $x_i$ are considered, and (ii) a fast-decaying weight function $w(r) = e^{-r^2/h^2}$ is used in the repulsion loss.
Adversarial loss. In recent years, GANs [42] have received extensive attention due to their powerful learning ability. A GAN consists of a generator and a discriminator. The discriminator takes the generator output and the ground truth as input, and distinguishes whether each input is the generator output. The generator and discriminator are optimized alternately during GAN training. There are currently many models that use adversarial learning to assist in training upsampling models, such as PU-GAN [13], AR-GCN [20], PUSA-GAN [43], and CM-Net [44]. Usually, a least-squares loss [45] is used as the adversarial loss:
$$ L_G = \frac{1}{2} \left[ D(S_1) - 1 \right]^2 \tag{8} $$
$$ L_D = \frac{1}{2} \left[ D(S_1)^2 + \left( D(S_2) - 1 \right)^2 \right] \tag{9} $$
where $D(S_1)$ is the confidence value predicted by the discriminator for the generator output $S_1$, and $D(S_2)$ is the corresponding value for the ground truth $S_2$. During network training, the generator aims to generate $S_1$ to fool the discriminator by minimizing $L_G$, while the discriminator aims to minimize $L_D$ to learn to distinguish $S_1$ from $S_2$.
Researchers usually combine multiple loss functions using weights to form a compound loss to train the model.
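The repulsion and least-squares adversarial losses can be sketched as plain functions. This is an illustration under stated assumptions rather than any model's training code: the repulsion term is taken as −r with a Gaussian-like weight exp(−r²/h²), the nearest neighbors are found by brute force, and the discriminator outputs are passed in as plain numbers. All function names are hypothetical.

```python
import math


def repulsion_loss(points, k, h):
    """Sum of -r * exp(-r^2 / h^2) over each point's k nearest neighbors."""
    total = 0.0
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        for r in dists[:k]:
            total += -r * math.exp(-(r * r) / (h * h))
    return total


def lsgan_losses(d_fake, d_real):
    """Least-squares generator and discriminator losses for one sample pair.

    d_fake: discriminator confidence for the generator output
    d_real: discriminator confidence for the ground truth
    """
    loss_g = 0.5 * (d_fake - 1.0) ** 2
    loss_d = 0.5 * (d_fake ** 2 + (d_real - 1.0) ** 2)
    return loss_g, loss_d


def compound_loss(terms, weights):
    """Weighted sum of individual loss values."""
    return sum(w * t for w, t in zip(weights, terms))
```

In practice these quantities would be computed on tensors inside an automatic differentiation framework so that gradients can flow back to the generator.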

Unsupervised Upsampling
Existing deep learning-based upsampling methods mainly focus on supervised learning. However, because it is difficult to collect point clouds of the same object with different resolutions, the low-resolution point clouds in the training set are often obtained by downsampling the real point clouds. Therefore, a trained point cloud upsampling model inevitably learns the reverse process of downsampling. To learn upsampling without introducing manual downsampling priors, researchers have increasingly focused on unsupervised upsampling models. We briefly introduce several existing unsupervised point cloud upsampling models.
To learn the global and local structure of a point cloud simultaneously, Liu et al. [46] proposed a new autoencoder, the local-to-global autoencoder (L2G-AE). A benefit of its local-to-global reconstruction design is that L2G-AE can be applied to unsupervised point cloud upsampling. This was the first method to use deep neural networks for unsupervised upsampling. Unlike PU-Net and EC-Net, which upsample the input in a supervised manner, L2G-AE obtains the local reconstruction result and downsamples it to the target resolution. L2G-AE is less effective than these two networks in some categories because it is trained without supervision and has no access to ground truth.
Although L2G-AE can perform unsupervised upsampling by reconstructing overlapping local areas, it focuses on capturing global shape information through local-to-global reconstruction. This limits the network's ability to capture the inherent upsampling pattern and to generate a high-quality upsampled point set. To address these shortcomings of L2G-AE, Liu et al. [47] proposed a new self-supervised point cloud upsampling model, SPU-Net. Its framework includes two main parts: point feature extraction and coarse-to-fine point feature expansion. In point feature extraction, a self-attention module is combined with a graph convolutional network so that context information within and between local regions is captured at the same time. In point feature expansion, a hierarchical and learnable folding strategy is introduced to generate an upsampled point set from a learnable two-dimensional grid. To further refine noisy points in the generated point set, the authors propose a new self-projection optimization, which is combined with the reconstruction loss and the uniform loss into a joint loss to promote self-supervised point cloud upsampling. SPU-Net does not require supervision from 3D ground truth dense point clouds and can repeatedly upsample from downsampled patches; it is therefore not limited by paired training data and can retain the original data distribution.

Other Methods of Point Cloud Upsampling Models
In addition to the mainstream upsampling models mentioned above, other techniques can further improve point cloud upsampling models.
Zhang et al. [48] proposed a point cloud upsampling model that uses the entire object model as the input and can learn potential features in the point cloud belonging to different object categories. They studied the effects of random downsampling and curvature-based downsampling, in addition to upsampling at different magnifications. Similarly, this model also has some limitations. It cannot effectively process defective point cloud data because it learns the entire object's features, which limits its application to low-resolution inputs.
Naik et al. [49] proposed a network structure that can learn point cloud normals and color features. The network is constructed as a variant of PU-Net, and additional point features are used as model inputs together with the coordinates. Although the network's upsampling performance is not outstanding, adding the normals and colors of the point cloud to the model for training can retain features other than shape, which is a research direction with high potential.
Wang et al. [50] proposed a sequential point cloud upsampling framework to generate fine-grained and temporally consistent upsampling results for dynamic point cloud sequences. They extract features from multiple low-resolution point clouds (such as the previous, current, and subsequent frames) and fuse these features to perform the upsampling operation. The model can capture multi-scale information of dynamic sequences and improve the upsampling effect. However, it also has significant limitations: it requires a continuous point cloud sequence as input and consumes a large quantity of computing resources.

Algorithm Comparison and Analysis
Because there is currently no recognized benchmark dataset, researchers typically collect 3D models from existing public datasets for algorithm training and testing. There is also no consensus on which evaluation metrics to use, which makes it difficult to compare different models. We selected the dataset provided by PU-GAN [13], which is relatively widely used. The dataset contains 147 3D models, of which 120 were randomly selected for training and the remaining 27 were used for testing. For EAR, we used the released demo code to generate the results. For the deep learning-based upsampling models, we used their public code and retrained the networks on the dataset provided by PU-GAN. All experiments were conducted on an NVIDIA 2080Ti GPU. We chose CD, HD, P2F, and uniformity as evaluation metrics to perform a simple comparative analysis among the algorithms; for all of these metrics, smaller values are better. The quantitative comparison results are shown in Table 3.
For uniformity, the results of PU-Net are worse than those of EAR, which is due to its simple structure. Although the PointNet-based feature extraction method can combine global and local features, it does not perform well, and the simple multi-branch feature expansion component causes the generated points to be too similar to the input. In addition, the point set generation component is little more than a set of MLPs. These factors lead to the poor uniformity of the point clouds generated by PU-Net. MPU introduces multi-step progressive upsampling and uses dynamic graph convolution to extract point cloud features, which greatly improves the performance of the algorithm. PU-GAN further improves the uniformity of the generated point clouds by introducing a uniform loss and an adversarial loss. However, GANs are difficult to train, and designing a suitable discriminator is challenging. Currently, the best performer is PU-EVA, which interpolates new points using geometric information of the target object and obtains the best results on the uniformity metric.
The three metrics P2F, CD, and HD were used to evaluate the difference between the generated point cloud and the original point cloud. PU-GAN achieves the best performance on P2F, whereas PU-GCN achieves the best performance on CD and HD. The excellent performance of PU-GAN on P2F benefits from the up-and-down sampling structure and the adversarial training strategy, which makes the generated points closer to the surface of the object. NodeShuffle adopted by PU-GCN makes the generated point cloud closer to the original point cloud. In particular, for Dis-PU, although the results are not outstanding, the proposed disentangled refinement framework has great potential, and further research on this basis may achieve better results.
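To make the HD metric concrete, the following is a minimal NumPy sketch of a symmetric Hausdorff distance between a generated and a reference point cloud. It assumes unsquared Euclidean distances and takes the larger of the two directed distances, which may differ in detail from Equation (3) and from the evaluation scripts used for Table 3; the function name is ours. P2F is omitted because it requires the ground truth mesh.

```python
import numpy as np

def hausdorff_distance(s1: np.ndarray, s2: np.ndarray) -> float:
    """Symmetric Hausdorff distance between point sets of shape (N, 3) and (M, 3)."""
    # Pairwise Euclidean distances via broadcasting, shape (N, M).
    diff = s1[:, None, :] - s2[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Directed distance: the worst-case nearest-neighbor distance.
    s1_to_s2 = dist.min(axis=1).max()
    s2_to_s1 = dist.min(axis=0).max()
    return float(max(s1_to_s2, s2_to_s1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    generated = rng.random((128, 3))  # stand-in for an upsampled result
    reference = rng.random((128, 3))  # stand-in for the ground truth points
    print(hausdorff_distance(generated, reference))
```

Because HD is driven by the single worst point, it is far more sensitive to outliers than CD, which averages over all points; this is one reason the two metrics can rank models differently.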
In terms of model size, PU-Net uses the most parameters but does not perform well on the evaluation metrics. MPU achieves better performance by introducing dynamic graph convolution and multi-step upsampling, and greatly reduces the number of parameters. PU-GAN introduces an adversarial loss to improve performance, but also uses more parameters. PU-GCN uses a feature extraction block, Inception DenseGCN, and an upsampling module, NodeShuffle, both based on graph convolution; it uses the fewest parameters while obtaining the best results on CD and HD. This demonstrates the superiority of graph convolution in point cloud upsampling tasks.
Although it is not as good as the models mentioned above in experimental results, SPU-Net, which is an unsupervised learning algorithm, still has merits. SPU-Net is not constrained by supervised information, does not need to obtain the label information of the dataset, and can directly obtain the characteristics of the data from the data itself and then complete the upsampling task.

Conclusions and Future Work
In this paper, we conduct an extensive survey of point cloud upsampling algorithms. We mainly introduce the algorithms based on optimization and those based on deep learning. Although some achievements have been made in point cloud upsampling, there are still many unsolved problems, and point cloud upsampling algorithms need further research and improvement. Future research can focus on the following aspects:
(2) Loss function. In addition to designing a good network structure, improving the loss function can also improve the performance of the algorithm. The loss function establishes constraints between the low-resolution point cloud and the high-resolution point cloud, and optimizes the upsampling process according to these constraints. Commonly used loss functions include CD, EMD, and the uniform loss, which are often weighted and combined into a joint loss function in practical applications. For point cloud upsampling, exploring the potential relationship between low-resolution and high-resolution point clouds and seeking a more accurate and effective loss function is a promising research direction. For example, current point cloud upsampling algorithms have difficulty filling large holes, so exploring suitable inpainting loss functions that constrain the generated point clouds to fill holes is worthwhile.
(3) Dataset. At present, there is no universally recognized benchmark dataset. The datasets used by researchers for training and testing differ widely, which hinders the comparison between models and their subsequent improvement. Although very difficult, it is important to propose a high-quality benchmark dataset.
(4) Evaluation metrics. Evaluation metrics are one of the most basic components of machine learning. If performance cannot be accurately measured, it will be difficult for researchers to verify improvements. Point cloud upsampling currently faces this problem: there are no unified, widely adopted evaluation metrics, so more accurate metrics for evaluating upsampling quality are urgently needed.
(5) Unsupervised upsampling. As mentioned in Section 4.2, it is difficult to collect point clouds of the same object at different resolutions, and the low-resolution point clouds in the training set are often obtained by downsampling the real point clouds. Supervised learning may therefore learn the inverse process of downsampling. Consequently, unsupervised upsampling of point clouds is a promising research direction.
(6) Applications. Point cloud upsampling can assist other point cloud deep learning tasks. For example, SAUM [51] uses a point cloud upsampling module to achieve point cloud completion, and HPCR [52] uses point cloud upsampling to improve the point cloud reconstruction effect. GeoNet [53] learns geodesic-aware representations and achieves better results by integration with PU-Net. PointPWC-Net [54] uses an upsampling method to effectively process 3D point cloud data and estimate the scene flow from the 3D point cloud. DUP-Net [55] uses an upsampling network to add points and restore surface smoothness in order to defend against adversarial point cloud attacks. Varriale et al. [56] applied point cloud upsampling to cultural heritage analysis, which reduced hardware equipment costs and improved data accuracy. Applying point cloud upsampling to more specific scenes, such as target tracking, scene rendering, video surveillance, and 3D reconstruction, will attract increasing research attention.