Article

Robust Semi-Supervised Point Cloud Registration via Latent GMM-Based Correspondence

1 School of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen 518000, China
2 Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
3 Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, UK
4 Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
5 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China
6 Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(18), 4493; https://doi.org/10.3390/rs15184493
Submission received: 24 July 2023 / Revised: 24 August 2023 / Accepted: 25 August 2023 / Published: 12 September 2023

Abstract

Because point clouds are often corrupted by significant noise and related by large transformations, aligning two point clouds with deep neural networks remains challenging. This paper presents a semi-supervised point cloud registration (PCR) method for accurately estimating point correspondences and handling large transformations using limited prior datasets. Firstly, a modified autoencoder is introduced as the feature extraction module to extract distinctive and robust features for the downstream registration task. Unlike optimization-based pairwise PCR strategies, the proposed method treats the two point clouds as two realizations of a Gaussian mixture model (GMM), which we call the latent GMM. Based on this assumption, the two point clouds can be regarded as two probability distributions, so the PCR problem can be approached by minimizing the KL divergence between these two distributions. Then, the correspondence between the point clouds and the latent GMM components is estimated using the augmented regression network. Finally, the GMM parameters are updated from the correspondence, and the transformation matrix is computed with the weighted singular value decomposition (SVD) method. Extensive experiments conducted on both synthetic and real-world data validate the superior performance of the proposed method compared to state-of-the-art registration methods in terms of accuracy, robustness, and generalization.

1. Introduction

Point cloud registration (PCR) is a critical task in computer vision [1,2], robotic perception [3,4], medical image analysis [5], computer-integrated intervention [6,7], and aircraft measurement [8]. The task is to estimate the rigid transformation matrix that aligns two point clouds. Solving this problem involves two steps: obtaining the optimal mapping from each point in the source point cloud to its corresponding point in the target point cloud, and then computing the optimal transformation matrix based on the point correspondence. Various methods use optimization or deep neural networks to find robust and accurate point correspondences. In this paper, we focus on using learned embeddings to find the optimal point-to-GMM correspondence matrix; we then use the correspondence matrix and the latent GMM components to estimate the rigid transformation.
In our previous PCR methods [9,10,11,12], the registration problem was cast into a maximum likelihood estimation problem via the hybrid mixture model (HMM), where positional and orientational features are incorporated to estimate the correspondence by Bayes’ theorem. Like other optimization-based PCR methods, our previous methods were highly sensitive to initialization and failed to handle large transformations well. In real-world applications, point clouds are often related by large transformations and corrupted by noise, so making a registration algorithm converge under these conditions remains a challenge [13,14]. In recent years, several deep learning architectures operating on point clouds have demonstrated remarkable performance in various 3D vision tasks, including classification, segmentation, and detection. Various deep learning-based architectures for PCR have also been widely studied. Most existing point cloud registration methods focus on descriptor extraction [15,16] or supervised learning [17,18,19,20], which rely heavily on large annotated datasets for training. Consequently, it is essential to introduce semi-supervised or unsupervised learning approaches to reduce the dependency of point cloud registration networks on such datasets.
This paper presents a semi-supervised method for solving the PCR problem by minimizing the Kullback-Leibler divergence (KL-divergence) between two probability distributions generated by a shared latent GMM. The proposed architecture consists of three main modules: the feature extraction module, the latent GMM module, and the transformation estimation module. The feature extraction module utilizes a modified autoencoder module to learn how to extract distinctive and robust pose-attentive features from point clouds. The latent GMM module estimates the correspondences between point clouds and latent GMM components and subsequently updates the parameters of the latent GMM components. The transformation estimation module employs the weighted singular value decomposition (SVD) to compute the rigid transformation between two point clouds. The overview of the proposed method is shown in Figure 1. The main contributions can be summarized as follows.
(1)
A new semi-supervised method for point cloud registration is proposed. The point cloud-based autoencoder is modified to serve as the feature extraction module, which is conducive to extracting distinctive and pose-attentive features from the limited data.
(2)
An augmented regression network is proposed for generating the correspondence between the point cloud and latent GMM components.
The rest of this paper is organized as follows: Section 1 summarizes the related point cloud registration methods and reviews the traditional GMM-based PCR method. Section 2 establishes the formulation of the probabilistic point cloud registration task and illustrates the architecture of the proposed method in detail. Section 3 presents and discusses the experimental results. Section 4 analyzes the advantages of the proposed method compared to other methods based on the experimental results. Section 5 concludes this paper.

1.1. Related Work

1.1.1. ICP Method and Its Variants

The iterative closest point (ICP) method [21] and its variants [22,23,24] are the most popular optimization-based registration methods in the field. These methods conduct one-to-one correspondence estimation and transformation updating iteratively. Because ICP adopts a hard alignment strategy, this method depends on a good initial transformation and easily falls into local optima. Moreover, these methods are vulnerable to outliers and noise. Numerous variants of the ICP method have been proposed to address the aforementioned limitations. Agamennoni et al. [25] proposed an improvement, where each point in one point cloud is associated with a set of points in the other point cloud, and these associations are weighted to form a probability distribution. To accelerate the convergence of ICP, Low [26] introduced the linear approximation method to solve the point-to-plane error metric. Chetverikov et al. [27] proposed the trimmed ICP (TrICP) to improve the robustness, where the least trimmed squares (LTS) approach was consistently used in all phases of the operation. Yang et al. [28] introduced the branch-and-bound theory, which searches the entire SE(3) space to obtain the globally optimal solution to PCR.

1.1.2. Probabilistic Registration Methods

The probabilistic registration method is another popular kind of optimization-based method for solving PCR. In this category, one-to-many correspondence confidences are represented by probabilities, and PCR is treated as a maximum likelihood estimation problem. GMMs and HMMs are commonly employed to estimate the correspondence between point clouds. Subsequently, the expectation maximization (EM) algorithm is utilized to iteratively update the parameters of the mixture model and estimate the rigid transformation. The coherent point drift (CPD) method, proposed by Myronenko et al. [29], treats one point cloud as a probability distribution generated from the other, which represents a GMM. The expectation conditional maximization for point registration (ECMPR) method, proposed by Horaud et al. [30], replaces the traditional M-step with a sequence of conditional maximization steps and introduces a general covariance matrix for PCR with anisotropic covariance. Evangelidis et al. [31] proposed the joint registration of multiple point clouds (JRMPC) method, which considers multiple point clouds as realizations of a latent GMM and transforms PCR into a clustering problem. Compared with GMM-based methods, HMM-based methods [9,10,11,12] use the von Mises–Fisher (VMF) distribution to model orientational data. Ravikumar et al. [32] combined VMF and Student’s t distributions to model the orientation and position of the point cloud. Min et al. [9] proposed a rigid HMM-based registration method for computer-assisted orthopedic surgery, where one point cloud was considered as an HMM composed of GMM and VMF distributions. This framework was then extended to rigid and nonrigid PCR with anisotropic noise [10,11]. Considering that the estimation of normal vectors may introduce errors in the preoperative and intraoperative stages, we recently proposed the reliable HMM [12], where curvatures were introduced as indexes to evaluate the reliability of normal vectors. Due to the soft alignment strategy, probabilistic methods achieve better robustness to noise and outliers. However, they cannot handle PCR tasks with large transformations.

1.1.3. End-to-End Learning-Based Registration Methods

The key idea of end-to-end learning-based registration methods is to input two point clouds into a neural network and estimate the transformation matrix by optimization or regression. PointNetLK [33] used PointNet [34] to extract global features from the input point clouds and estimated the rigid transformation matrix by minimizing the differences between the features. FMR [35] further enhanced PointNetLK by incorporating an autoencoder framework and a Chamfer distance loss, thereby reducing the reliance on labels. DCP [36] employed DGCNN [37] to extract features from the input point clouds and utilized a Transformer [38] to estimate the correspondence matrix between the two point clouds; the rigid transformation was then computed using SVD. However, both FMR and DCP extract feature differences or compute correspondences on a point-to-point basis, which makes them infeasible when the point cloud is sparse and contains noise and outliers, as such elements corrupt the point-to-point relationship. Recently, DeepGMR [39] proposed the first end-to-end learning-based probabilistic registration that integrates neural networks into GMM-based registration. However, its neural network only extracts global features with PointNet, and the combined features only contain the features of each isolated point; the topological features of the point clouds are ignored. Hence, the efficiency and generalization of the feature extraction module decrease when the prior labeled dataset is insufficient.

1.2. Recap of Traditional GMM-Based Point Cloud Registration

Given the source point cloud $\mathbf{X} \in \mathbb{R}^{3 \times N}$ and the target point cloud $\mathbf{Y} \in \mathbb{R}^{3 \times M}$, ideally, the source point cloud $\mathbf{X}$ can be obtained by applying a rotation $\mathbf{R}$ and a translation $\mathbf{t}$ to the target point cloud $\mathbf{Y}$. The objective of rigid PCR is to estimate the rigid transformation $(\mathbf{R}, \mathbf{t})$ that aligns $\mathbf{Y}$ to $\mathbf{X}$, as shown below:
$\hat{\mathbf{R}}, \hat{\mathbf{t}} = \arg\min_{\mathbf{R}, \mathbf{t}} \left\| \mathbf{X} - (\mathbf{R}\mathbf{Y} + \mathbf{t}) \right\|^2$    (1)
where $\hat{\mathbf{R}} \in SO(3)$ and $\hat{\mathbf{t}} \in \mathbb{R}^3$ denote the optimal rotation matrix and translation vector.
We first review a traditional GMM-based method, CPD, for solving the PCR problem, where the iterative EM algorithm is adopted. In CPD, the points in $\mathbf{Y}$ are treated as GMM centroids, while the points in $\mathbf{X}$ are considered to be generated by the GMM. Then, given the $m$th target point $\mathbf{y}_m$, the probability density function (PDF) of the $n$th source point $\mathbf{x}_n$ can be computed as follows:
$p(\mathbf{x}_n \mid z_n = m; \theta) = \frac{1}{(2\pi\sigma^2)^{3/2}} \exp\left(-\frac{\|\mathbf{x}_n - \mathbf{y}_m\|^2}{2\sigma^2}\right)$    (2)
where $1 \le n \le N$, $1 \le m \le M$, and $z_n = m$ denotes the correspondence between the $n$th source point and the $m$th target point. Therefore, given the model parameters, the PDF of the $n$th point can be described as follows:
$p(\mathbf{x}_n \mid \theta) = \sum_{m=1}^{M+1} P(z_n = m)\, p(\mathbf{x}_n \mid z_n = m; \theta)$    (3)
where $m = M + 1$ represents the outliers. Moreover, the additional uniform distribution $p(\mathbf{x}_n \mid z_n = M+1) = \frac{1}{N}$ is introduced to account for outliers. $P(z_n = m) = \frac{1}{N}$ is assumed for the residual target points $m = 1, \ldots, M$. The GMM centroids are represented by a set of parameters $\theta = \{\mathbf{R}, \mathbf{t}, \sigma^2\}$ and estimated by minimizing the negative log-likelihood function as follows:
$Q(\theta) = -\sum_{n=1}^{N} \log \sum_{m=1}^{M+1} P(z_n = m)\, p(\mathbf{x}_n \mid z_n = m)$    (4)
Then the above optimization problem can be solved by the EM framework. In the E-step, given the initialized or previous parameters, the correspondence between the $n$th source point $\mathbf{x}_n$ and the $m$th target point $\mathbf{y}_m$ can be estimated by Bayes’ theorem as follows:
$p(z_n = m \mid \mathbf{x}_n; \theta^{old}) = \frac{P(z_n = m)\, p(\mathbf{x}_n \mid z_n = m; \theta^{old})}{p(\mathbf{x}_n \mid \theta^{old})}$    (5)
Then the minimization of the negative log-likelihood function (4) can be rewritten as follows:
$Q(\theta) = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} p_{mn} \left\| \mathbf{x}_n - (\mathbf{R}\mathbf{y}_m + \mathbf{t}) \right\|^2 + \frac{3 N_p}{2} \log \sigma^2$    (6)
In the M-step, the new posteriors and the previous model parameters are used to update the transformation $(\mathbf{R}, \mathbf{t})$ and the new model parameters by minimizing the objective function in (6). The E-step and M-step then alternate until convergence.
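As a concrete reference for this recap, the following is a minimal NumPy sketch of a CPD-style E-step corresponding to (2)-(5); the outlier weighting is simplified here and the function name is illustrative, not taken from any existing implementation.

```python
import numpy as np

def e_step(X, Y, R, t, sigma2, w_outlier=1e-2):
    """One E-step: posterior correspondences p(z_n = m | x_n) under the current parameters."""
    N = X.shape[0]
    TY = Y @ R.T + t                                           # transformed GMM centroids, (M, 3)
    d2 = ((X[:, None, :] - TY[None, :, :]) ** 2).sum(-1)       # squared distances, (N, M)
    gauss = np.exp(-d2 / (2.0 * sigma2)) / (2.0 * np.pi * sigma2) ** 1.5
    uniform = w_outlier / N                                    # simplified uniform outlier term
    denom = gauss.sum(axis=1, keepdims=True) + uniform
    return gauss / denom                                       # posteriors, (N, M)
```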

2. Methods

2.1. Problem Statement

In this paper, two point clouds are assumed to be two probability distributions generated by the same latent GMM. Therefore, the PCR problem can be solved by minimizing the KL divergence between the two probability distributions. The latent source and target GMM distributions are denoted as $\Theta = \{\pi, \mu, \sigma^2\}$ and $\bar{\Theta} = \{\bar{\pi}, \bar{\mu}, \bar{\sigma}^2\}$, respectively, where $\pi$, $\mu$, and $\sigma^2$ denote the mixing coefficients, means, and covariances of the GMM components. Then, according to the definition of the KL divergence [40], (1) can be transformed as follows:
$\hat{\mathbf{R}}, \hat{\mathbf{t}} = \arg\min_{\mathbf{R}, \mathbf{t}} KL\big((\mathbf{R}\bar{\Theta} + \mathbf{t}) \,\|\, \Theta\big) = \arg\max_{\mathbf{R}, \mathbf{t}} \int_{Y} p(\mathbf{y} \mid \mathbf{R}\bar{\Theta} + \mathbf{t}) \ln p(\mathbf{y} \mid \Theta)\, d\mathbf{y}$    (7)
For the target point cloud, the PDF of the j-th Gaussian distribution can be written as follows:
$p(\mathbf{y}_m \mid \bar{\Theta}_j) = \frac{\alpha_j}{2\pi\bar{\sigma}_j^2} \exp\left(-\frac{1}{2\bar{\sigma}_j^2} \left\| \mathbf{y}_m - \bar{\mu}_j \right\|^2\right)$    (8)
where $1 \le j \le J$ and $\alpha_j$, with $\sum_{j=1}^{J} \alpha_j = 1$, denotes the weight of each Gaussian component. Minimizing the KL divergence between $\mathbf{R}\bar{\Theta} + \mathbf{t}$ and $\Theta$ is then converted into maximizing the expected log-likelihood of data that follow the distribution $\mathbf{R}\bar{\Theta} + \mathbf{t}$ under $\Theta$. Under the assumption that the transformed target point cloud $\mathbf{R}\mathbf{Y} + \mathbf{t} = \{\mathbf{R}\mathbf{y}_m + \mathbf{t}\}_{m=1}^{M}$ is independently and identically sampled from the probability distribution $p(\mathbf{y} \mid \mathbf{R}\bar{\Theta} + \mathbf{t})$, (7) can be expanded as follows by the law of large numbers:
$\hat{\mathbf{R}}, \hat{\mathbf{t}} = \arg\max_{\mathbf{R}, \mathbf{t}} \frac{1}{M} \sum_{m=1}^{M} \ln p(\mathbf{R}\mathbf{y}_m + \mathbf{t} \mid \Theta) = \arg\max_{\mathbf{R}, \mathbf{t}} \sum_{m=1}^{M} \ln \sum_{j=1}^{J} p(\mathbf{R}\mathbf{y}_m + \mathbf{t} \mid \Theta_j)$    (9)
In this paper, we first introduce the semi-supervised method into the optimization of (9). The detailed components and the training framework are presented in the next section.

2.2. Feature Extraction Module

The aim of the feature extraction module is to generate distinctive and efficient features for both the source and target point clouds. A modified autoencoder is used as the feature extraction module; it is inspired by the dynamic graph CNN (DGCNN) [37] but has a modified encoder and three fully connected layers as the decoder. The proposed module extracts local and topological features to represent both point clouds, which differs from the feature extraction module (PointNet) used in DeepGMR [39]. Compared to PointNet++ [41], DGCNN constructs a dynamic graph for each point cloud and uses the EdgeConv operation to compute features, allowing the network to adapt better to geometric structures and capture more effective feature representations. Therefore, we chose the modified DGCNN as the encoder of the proposed feature extraction module. Following [42,43], the principle of designing the feature extraction module is to minimize the learning space and generate features containing the pose difference. Hence, we normalize the two point clouds by subtracting the means of $\mathbf{X}$ and $\mathbf{Y}$ before feeding them into the network. The encoder generates an augmented latent representation, and the desired output of the decoder is a reconstructed point cloud, reflecting the rigid transformation difference between the two point clouds.
Figure 2 shows the architecture of the proposed feature extraction module inspired by DGCNN. The feature extraction module is shared between the two point clouds. Hence, we denote $Num = \max(M, N)$ to fix the number of points at the input and output of the autoencoder. When $N \ne M$, the point cloud with fewer points is upsampled to satisfy this assumption. For the target point cloud, $\mathbf{Y} = \{\mathbf{y}_m \in \mathbb{R}^p\}_{m=1,\ldots,M}$ lies in 3D space or an arbitrary feature space. For each point $\mathbf{y}_m$, a $k$-nearest-neighbor graph is constructed, in which the $k$ nearest points $\{\mathbf{y}_{m_i}\}_{i=1,\ldots,k}$ are used to compute the edge features $(\mathbf{y}_m, \mathbf{y}_m - \mathbf{y}_{m_i})$.
The EdgeConv operation contains an edge function and a vertex-wise aggregation. The shared multi-layer perceptron (MLP) is used as the edge function, and the max-pooling is used as the vertex-wise aggregation. We concatenate edge features of three layers and use an MLP to generate the common feature. Then the max-pooling and average-pooling are both applied to the common feature to augment the global feature.
The encoder outputs two global representations with a dimension of 1024. Then we concatenate two global representations as augmented representations for the source or target point cloud. The decoder contains three fully connected layers with dimensions of 2048, 1024, and Num × 3, with the activation being LeakyReLU. The feature extraction module can be trained in an unsupervised manner. This training framework can help the encoder generate the pose-attentive features from the source or target point clouds, which can improve the success of training an efficient feature extractor for PCR when prior datasets are limited.
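The following PyTorch sketch illustrates the structure described above: an EdgeConv-based encoder whose global feature is augmented by concatenating max- and average-pooling, followed by a three-layer fully connected decoder. The intermediate channel widths are illustrative assumptions rather than the exact values used in our implementation.

```python
import torch
import torch.nn as nn

def knn_edge_features(x, k=10):
    # x: (B, N, C) -> edge features (B, N, k, 2C) of the form (y_m, y_m - y_mi)
    idx = torch.cdist(x, x).topk(k + 1, largest=False).indices[:, :, 1:]        # (B, N, k), skip the point itself
    neighbors = torch.gather(
        x.unsqueeze(1).expand(-1, x.size(1), -1, -1), 2,
        idx.unsqueeze(-1).expand(-1, -1, -1, x.size(-1)))                       # (B, N, k, C)
    center = x.unsqueeze(2).expand_as(neighbors)
    return torch.cat([center, center - neighbors], dim=-1)

class EdgeConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=10):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_ch, out_ch), nn.LeakyReLU(0.2))   # shared edge MLP
    def forward(self, x):
        return self.mlp(knn_edge_features(x, self.k)).max(dim=2).values             # vertex-wise max aggregation

class AutoEncoder(nn.Module):
    def __init__(self, num_points, k=10):
        super().__init__()
        self.ec1, self.ec2, self.ec3 = EdgeConv(3, 64, k), EdgeConv(64, 128, k), EdgeConv(128, 256, k)
        self.fuse = nn.Sequential(nn.Linear(64 + 128 + 256, 1024), nn.LeakyReLU(0.2))
        self.decoder = nn.Sequential(                      # three FC layers: 2048, 1024, Num x 3
            nn.Linear(2048, 2048), nn.LeakyReLU(0.2),
            nn.Linear(2048, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, num_points * 3))
    def forward(self, x):                                  # x: (B, Num, 3), zero-centred point cloud
        f1 = self.ec1(x)
        f2 = self.ec2(f1)
        f3 = self.ec3(f2)
        local = self.fuse(torch.cat([f1, f2, f3], dim=-1))                           # per-point features, (B, Num, 1024)
        glob = torch.cat([local.max(dim=1).values, local.mean(dim=1)], dim=-1)       # augmented global feature, (B, 2048)
        recon = self.decoder(glob).view(x.size(0), -1, 3)                            # reconstructed point cloud
        return local, glob, recon
```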

2.3. Latent GMM Module

The principle of designing the latent GMM module is to replace the original E-step in the EM algorithm by utilizing the features of the source and target point clouds. The latent GMM module contains the augmented regression network and the parameter estimation module. The augmented regression network is a four-layer MLP with output dimensions of 1024, 256, 128, and $J$, which generates the point-to-GMM correspondence, where $J$ denotes the number of Gaussian components. The augmented global features with a dimension of 2048 are obtained by concatenating the max-pooled and average-pooled features. The local features with a dimension of $Num \times 1024$ are generated by the MLP in the modified encoder. Then, we can estimate the correspondence $\bar{\gamma} \in \mathbb{R}^{Num \times J}$ between the target point cloud and the target latent GMM, where $\sum_{j=1}^{J} \bar{\gamma}_{ij} = 1$ for $1 \le i \le Num$. Similarly, $\gamma \in \mathbb{R}^{Num \times J}$ denotes the correspondence between the source point cloud and the source latent GMM. According to [44], the effective number of points assigned to the $j$th Gaussian component and the mixing coefficient for the $j$th component can be computed as follows:
$\gamma_j = \sum_{i=1}^{Num} \gamma_{ij}, \quad \pi_j = \frac{\gamma_j}{Num}, \qquad \bar{\gamma}_j = \sum_{i=1}^{Num} \bar{\gamma}_{ij}, \quad \bar{\pi}_j = \frac{\bar{\gamma}_j}{Num}$    (10)
Then the derivatives of the objective function in (9) with respect to the means $\mu_j$ of the latent Gaussian components are set to zero:
$\sum_{m=1}^{M} \frac{p(\mathbf{R}\mathbf{y}_m + \mathbf{t} \mid \Theta_j)}{\sum_{j=1}^{J} p(\mathbf{R}\mathbf{y}_m + \mathbf{t} \mid \Theta_j)} \cdot \frac{\mathbf{R}\mathbf{y}_m + \mathbf{t} - \mu_j}{\sigma_j^2} = 0$    (11)
We replace the posterior probabilities with the correspondence matrix $\gamma$ estimated by the neural network. Ideally, the transformed target point cloud $\mathbf{R}\mathbf{Y} + \mathbf{t}$ is equivalent to the source point cloud $\mathbf{X}$. Then the means $\mu_j$ of the latent source GMM can be computed as follows:
$\mu_j = \frac{1}{\bar{\gamma}_j} \sum_{i=1}^{Num} \bar{\gamma}_{ij} (\mathbf{R}\mathbf{y}_i + \mathbf{t}) \approx \frac{1}{\bar{\gamma}_j} \sum_{i=1}^{Num} \bar{\gamma}_{ij}\, \mathbf{x}_i$    (12)
Then, by setting the derivatives of the objective function in (9) with respect to the covariances $\sigma_j^2$ of the latent source GMM to zero, we can obtain:
$\sigma_j^2 = \frac{1}{\bar{\gamma}_j} \sum_{i=1}^{Num} \bar{\gamma}_{ij} (\mathbf{R}\mathbf{y}_i + \mathbf{t} - \mu_j)(\mathbf{R}\mathbf{y}_i + \mathbf{t} - \mu_j)^T \approx \frac{1}{\bar{\gamma}_j} \sum_{i=1}^{Num} \bar{\gamma}_{ij} (\mathbf{x}_i - \mu_j)(\mathbf{x}_i - \mu_j)^T$    (13)
Similarly, the means $\bar{\mu}_j$ and the covariances $\bar{\sigma}_j^2$ of the latent target GMM can be computed as follows:
$\bar{\mu}_j = \frac{1}{\gamma_j} \sum_{i=1}^{Num} \gamma_{ij}\, \mathbf{y}_i, \qquad \bar{\sigma}_j^2 = \frac{1}{\gamma_j} \sum_{i=1}^{Num} \gamma_{ij} (\mathbf{y}_i - \bar{\mu}_j)(\mathbf{y}_i - \bar{\mu}_j)^T$    (14)
After the above derivation, the correspondence between point clouds and latent GMMs, as well as the parameters of the latent GMMs, can be directly estimated through the proposed neural network without the need for multiple optimization iterations.
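A minimal PyTorch sketch of the closed-form updates in (10) and (12)-(14), given the correspondence matrix predicted by the regression network; the scalar (isotropic) covariance used here is a simplifying assumption of this sketch.

```python
import torch

def latent_gmm_params(gamma, pts):
    # gamma: (B, Num, J) soft point-to-GMM correspondences (rows sum to 1); pts: (B, Num, 3) points
    Nj = gamma.sum(dim=1).clamp_min(1e-8)                                  # effective points per component, (B, J)
    pi = Nj / gamma.size(1)                                                # mixing coefficients, Eq. (10)
    mu = torch.einsum('bnj,bnd->bjd', gamma, pts) / Nj.unsqueeze(-1)       # component means, Eqs. (12)/(14)
    diff = pts.unsqueeze(2) - mu.unsqueeze(1)                              # (B, Num, J, 3)
    sigma2 = (gamma * diff.pow(2).sum(-1)).sum(dim=1) / (3.0 * Nj)         # isotropic variances, Eqs. (13)/(14)
    return pi, mu, sigma2
```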

2.4. Weighted SVD Module

Given the correspondence matrix and GMM components, we can compute the lower bound over (9) as follows:
$\hat{\mathbf{R}}, \hat{\mathbf{t}} = \arg\max_{\mathbf{R}, \mathbf{t}} \sum_{i=1}^{Num} \sum_{j=1}^{J} \bar{\gamma}_{ij} \ln p(\mathbf{R}\mathbf{y}_i + \mathbf{t} \mid \Theta_j) = \arg\max_{\mathbf{R}, \mathbf{t}} \sum_{i=1}^{Num} \sum_{j=1}^{J} \bar{\gamma}_{ij} \ln \left[ \frac{\alpha_j}{2\pi\sigma_j^2} \exp\left(-\frac{1}{2\sigma_j^2} \left\| \mathbf{R}\mathbf{y}_i + \mathbf{t} - \mu_j \right\|^2\right) \right] = \arg\min_{\mathbf{R}, \mathbf{t}} \sum_{i=1}^{Num} \sum_{j=1}^{J} \frac{\bar{\gamma}_{ij}}{\sigma_j^2} \left\| \mathbf{R}\mathbf{y}_i + \mathbf{t} - \mu_j \right\|^2$    (15)
We assume the point cloud transformation can be expressed as a transformation of the GMM components, where each GMM component varies with the transformation. This simplification reduces the complexity of the optimization and the neural networks. Then the objective function in (15) can be written as follows:
$\hat{\mathbf{R}}, \hat{\mathbf{t}} = \arg\min_{\mathbf{R}, \mathbf{t}} \sum_{j=1}^{J} \frac{\bar{\gamma}_j}{\sigma_j^2} \left\| \mathbf{R}\bar{\mu}_j + \mathbf{t} - \mu_j \right\|^2$    (16)
The weighted SVD method is used to compute the transformation matrix in closed form. The centers of the two sets of latent GMM components are computed as follows:
$\mu_c = \sum_{j=1}^{J} \pi_j \mu_j, \qquad \bar{\mu}_c = \sum_{j=1}^{J} \bar{\pi}_j \bar{\mu}_j$    (17)
Then the cross-covariance matrix is defined as follows:
$\mathbf{H} = \sum_{j=1}^{J} \frac{\pi_j}{\bar{\sigma}_j^2} (\mu_j - \mu_c)(\bar{\mu}_j - \bar{\mu}_c)^T$    (18)
The rigid transformation can be computed according to SVD:
$\hat{\mathbf{R}} = \mathbf{U}\, \mathrm{diag}\big(1, 1, \det(\mathbf{U}\mathbf{V}^T)\big)\, \mathbf{V}^T, \qquad \hat{\mathbf{t}} = \mu_c - \hat{\mathbf{R}} \bar{\mu}_c$    (19)
where $\mathbf{U}$ and $\mathbf{V}$ are computed from the SVD of $\mathbf{H} = \mathbf{U}\mathbf{S}\mathbf{V}^T$.
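A minimal PyTorch sketch of the closed-form solution in (17)-(19); the per-component weights w stand in for the $\pi_j / \sigma_j^2$ factors appearing in (16)-(18), and the function name is illustrative.

```python
import torch

def weighted_svd(mu_src, mu_tgt, pi_src, pi_tgt, w):
    # mu_src, mu_tgt: (J, 3) latent source/target GMM means; pi_src, pi_tgt: (J,) mixing weights;
    # w: (J,) per-component weights, e.g. pi_j / sigma_j^2.
    mu_c = (pi_src.unsqueeze(-1) * mu_src).sum(0)                          # weighted centre of source components, Eq. (17)
    mu_bar_c = (pi_tgt.unsqueeze(-1) * mu_tgt).sum(0)                      # weighted centre of target components, Eq. (17)
    H = ((mu_src - mu_c) * w.unsqueeze(-1)).t() @ (mu_tgt - mu_bar_c)      # weighted cross-covariance, Eq. (18)
    U, _, Vt = torch.linalg.svd(H)
    d = torch.det(U @ Vt)                                                  # reflection correction term
    R = U @ torch.diag(torch.stack([torch.ones_like(d), torch.ones_like(d), d])) @ Vt   # Eq. (19)
    t = mu_c - R @ mu_bar_c                                                # Eq. (19)
    return R, t
```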

2.5. Loss Definition

The primary goal of the training process is to ensure that the modified autoencoder module and the augmented correspondence regression effectively extract distinctive features and establish accurate correspondences. Our proposed method employs two loss functions, specifically Chamfer loss and transformation loss, as part of a semi-supervised approach.

2.5.1. Chamfer Loss

The objective of the Chamfer loss function is to train the modified autoencoder module in an unsupervised manner. The Chamfer distance is utilized to quantify the dissimilarity between the input point clouds $\mathbf{X}, \mathbf{Y}$ and the reconstructed point clouds $\mathbf{X}_{rec}, \mathbf{Y}_{rec}$. Specifically, the Chamfer loss between $\mathbf{X}$ and $\mathbf{X}_{rec}$ can be written as follows:
$\chi_{CD}^{src} = \frac{1}{N} \sum_{p \in \mathbf{X}} \min_{q \in \mathbf{X}_{rec}} \left\| p - q \right\|_2^2 + \frac{1}{N} \sum_{q \in \mathbf{X}_{rec}} \min_{p \in \mathbf{X}} \left\| p - q \right\|_2^2$    (20)
where $p$ and $q$ indicate points in $\mathbf{X}$ and $\mathbf{X}_{rec}$, respectively.
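A minimal PyTorch sketch of the Chamfer loss in (20), assuming the input and reconstructed clouds are provided as batched tensors:

```python
import torch

def chamfer_loss(p, q):
    # p: (B, N, 3) input point cloud, q: (B, N, 3) reconstructed point cloud
    d = torch.cdist(p, q).pow(2)                               # pairwise squared distances, (B, N, N)
    forward = d.min(dim=2).values.mean(dim=1)                  # each input point to its nearest reconstructed point
    backward = d.min(dim=1).values.mean(dim=1)                 # each reconstructed point to its nearest input point
    return (forward + backward).mean()
```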

2.5.2. Transformation Loss

The objective of PCR is to estimate the transformation matrix that aligns the target point cloud $\mathbf{Y}$ with the source point cloud $\mathbf{X}$. Therefore, the transformation loss is employed to minimize the discrepancy between the ground-truth transformation and the estimated transformation:
$\chi_R = \left\| \hat{\mathbf{R}}^T \mathbf{R}_g - \mathbf{I} \right\|_2^2, \qquad \chi_t = \left\| \hat{\mathbf{t}} - \mathbf{t}_g \right\|_2^2$    (21)
where the ground-truth transformation is composed of $\mathbf{R}_g$ and $\mathbf{t}_g$.

2.5.3. Total Loss

The total loss function used for training the autoencoder module and the augmented feature regression module involves the combination of the Chamfer loss and the transformation loss:
$\chi_{total} = \alpha \big(\chi_{CD}^{src} + \chi_{CD}^{tgt}\big) + \beta \big(\chi_t + \chi_R\big)$    (22)
where $\alpha$ and $\beta$ are scalar parameters, and $\chi_{CD}^{tgt}$ denotes the Chamfer loss between $\mathbf{Y}$ and $\mathbf{Y}_{rec}$.
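A minimal PyTorch sketch of the transformation loss (21) and the total loss (22); the default values of alpha and beta below are placeholders, not the settings used in our experiments.

```python
import torch

def transformation_loss(R_est, t_est, R_gt, t_gt):
    # R_est, R_gt: (B, 3, 3) rotations; t_est, t_gt: (B, 3) translations
    I = torch.eye(3, device=R_est.device)
    loss_R = ((R_est.transpose(-1, -2) @ R_gt - I) ** 2).sum(dim=(-1, -2)).mean()   # Eq. (21)
    loss_t = ((t_est - t_gt) ** 2).sum(dim=-1).mean()                               # Eq. (21)
    return loss_R, loss_t

def total_loss(cd_src, cd_tgt, loss_R, loss_t, alpha=1.0, beta=1.0):
    return alpha * (cd_src + cd_tgt) + beta * (loss_t + loss_R)                     # Eq. (22)
```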

3. Results

In this section, we train and evaluate our method using the synthetic ModelNet40 dataset [45] and the real-world 7Scene dataset [46]. First, we present the evaluation metrics and provide implementation details. Next, we compare the accuracy and robustness of our method with state-of-the-art approaches, including ICP [21], CPD [29], FGR [47], DeepGMR [39], PointNetLK [33], DCP [36], and FMR [35].

3.1. Evaluation Metrics

The registration performance is evaluated by the rotational and translational errors as follows:
$\varepsilon_R = \arccos\left(\frac{\mathrm{tr}(\hat{\mathbf{R}}^T \mathbf{R}_g) - 1}{2}\right), \qquad \varepsilon_t = \left\| \hat{\mathbf{t}} - \mathbf{t}_g \right\|_2$    (23)
where the rotational and translational errors are measured in degrees and meters, respectively. The root mean squared error (RMSE) and the mean absolute error (MAE) of the registration results are reported for all the methods mentioned above.
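A minimal NumPy sketch of the error metrics in (23):

```python
import numpy as np

def registration_errors(R_est, t_est, R_gt, t_gt):
    # R_est, R_gt: (3, 3) rotation matrices; t_est, t_gt: (3,) translation vectors
    cos_angle = np.clip((np.trace(R_est.T @ R_gt) - 1.0) / 2.0, -1.0, 1.0)
    err_R = np.degrees(np.arccos(cos_angle))                   # rotational error in degrees
    err_t = np.linalg.norm(t_est - t_gt)                       # translational error in metres
    return err_R, err_t
```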

3.2. Implementation Details

We trained DCP, PointNetLK, DeepGMR, FMR, and our method on the synthetic ModelNet40 dataset. The feature extraction module utilized the modified DGCNN as the encoder, with $k$ set to 10 in the dynamic KNN graph. In the augmented regression network, the number of Gaussian components was set to 20. The learning-based methods mentioned above were run using their official implementations. The optimization-based PCR methods (CPD, ICP, FGR) were run with the Probreg library [48] and the Open3D implementation [49]. For all experiments, the proposed method was trained for 250 epochs, and Adam [50] was used as the optimizer of the neural network. The learning rate was initially set to 0.001 and decreased by a factor of 10 at epochs 75, 150, and 200. All experiments were conducted in PyTorch and trained on two NVIDIA Titan XP GPUs.
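A minimal PyTorch sketch of this training schedule; `model` is a placeholder standing in for the full registration network, and the body of the training step is omitted.

```python
import torch

model = torch.nn.Linear(3, 3)   # placeholder for the full registration network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[75, 150, 200], gamma=0.1)

for epoch in range(250):
    # ... one pass over the training set: forward pass, total loss, backward pass, optimizer.step()
    scheduler.step()             # decay the learning rate by 10x at epochs 75, 150, and 200
```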

3.3. Registration Results in the ModelNet40 Dataset

In this experiment, we evaluated the performance of the registration methods on the ModelNet40 dataset, which is composed of 3D CAD models from 40 different categories. To assess the generalization ability of the various methods, we divided the dataset into training and testing sets, ensuring an equal distribution of categories. The learning-based methods were trained on the first 20 categories and evaluated on the last 20 categories; the optimization-based methods were also evaluated on the last 20 categories. In total, 5088 pairs of point clouds were used to train the network, while 1266 pairs of point clouds were used for testing. During the experiment, each model’s surface was randomly sampled to obtain 1024 points, which served as the source point cloud. The point cloud was then normalized and adjusted to fit within a unit box at the origin $[0, 1]^3$. The target point cloud was generated by randomly applying rigid transformations to the source point cloud. Throughout the training and testing processes, a rotation matrix was randomly generated using three Euler rotation angles in the range of $[0^{\circ}, 45^{\circ}]$, and the translation vector ranged from 0 to 0.5 m on each axis.
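A minimal NumPy/SciPy sketch of the pair generation described above; the Euler-angle convention ('zyx') and the exact box normalization used here are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def make_pair(src, rng=None):
    # src: (N, 3) points sampled from a CAD model surface
    rng = rng or np.random.default_rng()
    src = (src - src.min(0)) / (src.max(0) - src.min(0)).max()      # fit the cloud into the unit box [0, 1]^3
    angles = rng.uniform(0.0, 45.0, size=3)                         # Euler angles in [0, 45] degrees
    R = Rotation.from_euler('zyx', angles, degrees=True).as_matrix()
    t = rng.uniform(0.0, 0.5, size=3)                               # translation in [0, 0.5] m per axis
    tgt = src @ R.T + t                                             # randomly transformed target cloud
    return src, tgt, R, t
```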

3.3.1. Robustness to Noise

The results of various algorithms on clean data are presented in Table 1. The comparison results demonstrate that our algorithm achieves state-of-the-art performance on the clean dataset. In particular, our algorithm achieves the highest accuracy in terms of translation. In the absence of noise, DCP can accurately compute the point-to-point correspondence between two point clouds, thus obtaining more accurate rotation estimation results. After multiple iterations, CPD can also obtain accurate point-to-point correspondences in noise-free experiments, so that it can iteratively optimize and estimate the rotation matrix more accurately. In addition, CPD uses an iterative optimization method with well-defined mathematical constraints, which gives it better generalization for unknown scenes.
To assess the noise robustness of the proposed method, experiments were conducted using a dataset containing noise. We adopted the model trained on the clean dataset and tested it on four levels of noise. The noise was sampled from $\mathcal{N}(0, \sigma^2)$ and clipped to $[-0.05, 0.05]$ on each axis, where $\sigma^2$ ranges from 0.01 to 0.04. Note that the Gaussian noise is injected independently into each point of the target point cloud, which eliminates the point-to-point correspondence. Figure 3 shows the comparison results under different levels of noise. Compared with PointNetLK, FMR, and DeepGMR, the proposed method achieves more robust performance on the noisy dataset, even when the noise is very large ($\sigma^2 = 0.04$). DCP is more sensitive to noise because the noise breaks the one-to-one correspondence. As an extension of the learning-based probabilistic registration methods, the proposed method aligns more accurately than DeepGMR due to the efficiency of the latent feature and the augmented regression network. The qualitative results on the noisy dataset are presented in Figure 4, where the target point cloud is corrupted by Gaussian noise with $\sigma^2 = 0.04$.
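A minimal NumPy sketch of the noise model described above, treating $\sigma^2$ as the variance and clipping each axis to $[-0.05, 0.05]$:

```python
import numpy as np

def add_clipped_noise(points, sigma2, rng=None):
    # points: (N, 3) target point cloud; sigma2: noise variance (0.01 to 0.04 in these experiments)
    rng = rng or np.random.default_rng()
    noise = np.clip(rng.normal(0.0, np.sqrt(sigma2), size=points.shape), -0.05, 0.05)
    return points + noise                      # independent per-point noise breaks exact correspondences
```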

3.3.2. Robustness to Sparse Data

To demonstrate that the proposed method has significant robustness to sparse data, we downsampled the clean source point cloud at different density levels and evaluated the registration performance against other methods. Note that the density level of the source point cloud ranges from 50 % to 100 % , while the point number of the target point cloud is still 1024. Figure 5 summarizes the comparison results on different density levels. Based on the analysis presented in Figure 5, it is observed that the proposed method exhibits robustness to sparse data, even when the density level of the source point cloud is as low as 50%. Notably, DCP struggles to handle the registration when only 90% of the source point cloud is available. Additionally, both FMR and our method demonstrate superior performance compared to other methods. This finding supports the effectiveness of the semi-supervised learning approach in extracting features that capture the pose differences. The qualitative results are shown in Figure 6, where the density levels of the source point cloud are 50% and 70%.
To better compare the robustness of different algorithms to sparse point clouds, the source and target point clouds were both downsampled to 512 points. In these experiments, the variance $\sigma^2$ of the noise applied to the target point cloud was 0.04. The experimental results are shown in Table 2. When the point cloud becomes sparse, the feature extraction modules of PointNetLK and DeepGMR have difficulty extracting effective point cloud features, which leads to poorer registration accuracy compared to FMR and the proposed method. Our algorithm achieves accurate registration even with sparse point clouds by incorporating an autoencoder-based feature extraction module and augmented point cloud global features.

3.3.3. Robustness to Different Transformation

To test the robustness of each method to different transformations, the three rotation angles and the range of translation vectors were expanded to $[0^{\circ}, 60^{\circ}]$ and $[-0.5, 0.5]$ m, respectively. In these experiments, the variance of the Gaussian noise applied to the target point cloud was 0.01 ($\sigma^2 = 0.01$). The comparison results are presented in Table 3. As the spatial transformation range increases, the accuracy of the registration algorithms decreases, especially for the learning-based registration algorithms. This is because different rotational transformations make it difficult for the feature extraction module to accurately extract point cloud features in a new space. Based on the findings displayed in Table 3, it can be concluded that the proposed algorithm effectively tackles registration challenges in this scenario. We believe there are two reasons for this advantage. Firstly, as a pre-processing step, point cloud normalization can effectively reduce the learning space of the neural network, thus ensuring the effectiveness of the feature extraction module. In addition, the autoencoder-based feature extraction module enhances the neural network’s capability to extract point cloud features across various learning domains, and the augmented point cloud features are used to describe the point cloud.

3.3.4. Robustness to Partial Data

In order to evaluate the robustness of each method to partial overlap, a portion of each target point cloud was randomly removed. Consequently, in each registration experiment, only 90% of two point clouds exhibited overlap. The registration task involving the partial target point cloud and the full source point cloud is referred to as partial-to-full registration. It is important to note that these experiments were conducted using the model that was previously trained, without any retraining specifically for this registration scenario. Table 4 summarizes the error results of each algorithm in the partial-to-full registration experiment. For FMR and PointNetLK, their test models are obtained based on full-to-full registration. In the partial-to-full registration testing scenario, the absence of point cloud features derived from missing point cloud structures results in errors when calculating the difference between the features of the two point clouds. In contrast, registration methods based on the Gaussian mixture model demonstrate more robustness in this particular registration scenario. The proposed method can accurately estimate the rotation transformation of point clouds in comparison to DeepGMR, owing to the advantages of the feature extraction module and the latent GMM module. The proposed method also achieves similar accuracy to other state-of-the-art methods in estimating translation transformation. Figure 7 qualitatively demonstrates the registration results of CPD, DeepGMR, and the proposed algorithm in this registration scenario. In Figure 7, the point numbers of the source and target point clouds are 1024 and 921, respectively. According to Figure 7, CPD has poor accuracy in predicting the translation vector, and DeepGMR has poor accuracy in predicting the rotation matrix. The proposed method achieves better prediction accuracy in both rotation and translation.

3.4. Registration Results in the 7Scene Dataset

7Scene is a real-world indoor dataset containing seven scenes recorded by an RGB-D camera. In this experiment, we demonstrated the generalization ability and accuracy of our algorithm in handling real-world scenes generated by different sensors. We adopted the model trained on the clean ModelNet40 dataset and tested it on the real-world 7Scene dataset; in total, 355 real-world scenes were used to validate the trained model. The initial rotation and translation were the same as in Section 3.3. In real-world applications, point clouds obtained by LiDAR or RGB-D sensors are often sparse. To test the performance on sparse data, we randomly sampled the source point cloud at different levels; the number of sampled source points ranges from 1 k to 10 k.

3.4.1. Robustness to Sparse Data

Table 5 provides a summary of the comparison results of the various methods on the real-world dataset. In the experiment described in Table 5, both point clouds consist of 1 k points, the sparsest setting among the experiments conducted on the real-world dataset. The proposed method demonstrates superior performance compared to both classic optimization-based methods and more recent learning-based methods. This observation suggests that the autoencoder-based feature extraction module can extract distinctive and efficient features to represent the point cloud after training. Furthermore, compared with the point-to-point alignment strategies (ICP, DCP) and the feature correspondence strategies (FGR, PointNetLK, FMR), our point-to-GMM method achieves more accurate performance. Because it is difficult for the feature extraction module to obtain accurate representations of sparse point clouds in large scenes, there are errors in the point-to-GMM correspondences; therefore, the FMR algorithm, which directly compares the global feature differences between the two point clouds, achieved a smaller MAE. The qualitative registration results of two scenes are shown in Figure 8, where the number of points is 1 k. According to Figure 8, the ICP, PointNetLK, and FMR algorithms fail to converge in both scenarios, rendering these registration algorithms ineffective. Compared to the other algorithms, our algorithm exhibits lower rotation and translation errors in these two scenarios.
Figure 9 presents the registration results of different density levels in the real-world dataset. The results of the semi-supervised methods (FMR, ours) outperform supervised methods (PointNetLK, DeepGMR) in terms of rotational errors. This implies that the semi-supervised method can extract rotation-attentive features in two point clouds, effectively reflecting the rotation difference in the transformation estimation. Furthermore, the GMM-based methods (DeepGMR, ours) achieve more accurate translational results than other methods. We can observe that our algorithm obtains better accuracy and robustness under different numbers of input points.

3.4.2. Robustness to Partial Overlap

Similar to the experiments in Section 3.3.4, we conducted partial-to-full registration experiments on the 7Scene dataset. The point numbers of two point clouds were 5 k and 4.5 k, respectively. Table 6 summarizes the results of different registration methods. The proposed method shows better accuracy than other methods. As the point cloud density increases, the registration accuracy of DCP correspondingly increases, thanks to the use of transformer-based point-to-point correspondence estimation. However, this requires significant computational memory and time consumption. The qualitative comparison results are shown in Figure 10. In Figure 10, the point numbers of two point clouds are 5 k and 4.5 k, respectively. The DeepGMR algorithm exhibits the highest rotation error in this point cloud scene compared to CPD and our method. While the CPD algorithm has better registration accuracy than the DeepGMR algorithm in this scene, it still has rotation errors. In contrast, our algorithm achieves the best registration accuracy. Therefore, our algorithm demonstrates better robustness to overlap data.

3.5. Ablation Study

Feature extraction is the key module for PCR. Several ablation studies were conducted to discuss the advantages of the feature extraction modules in different methods. Table 7 shows quantitative results on the noisy ModelNet40 dataset for the different methods. The first row shows the results of the original DeepGMR. In the Ours-V1 experiment, we replaced PointNet with the modified DGCNN as the feature extraction module. As shown in the second row, this method improves the rotational accuracy but fails to improve the translational accuracy. The Ours-V1 method achieves better results than the original DeepGMR in terms of rotational errors, which highlights the advantage of our modified DGCNN encoder. In the Ours-V2 experiment, we added the decoder to encourage the modified encoder to extract distinctive and efficient features that contain pose-attentive information. As shown in the third row, adding the decoder improves the efficiency of the feature extraction module, which drastically improves the registration performance. Note that the proposed method with the modified autoencoder as the feature extraction module achieves better performance in terms of both rotational and translational accuracy.

3.6. Runtime Analysis

Table 8 compares the average runtime per registration for the different methods. In these experiments, the spatial transformation and noise applied to the point clouds were the same. In the experiment on ModelNet40, the point numbers of the source and target point clouds were 1024. Although the proposed method adopts a more complicated neural network, it does not consume much more runtime than DeepGMR in each registration experiment. Moreover, the proposed algorithm has some advantages in terms of runtime efficiency compared to algorithms such as PointNetLK and FMR. To test the influence of different sparsity levels on the runtime efficiency of the registration algorithms, we conducted three comparative experiments on the 7Scene dataset, in which the point numbers were 1 k, 5 k, and 10 k, respectively. Due to the more complex point cloud structures in real datasets, more runtime is required for each registration. Although the proposed method is slightly more time-consuming than other learning-based methods, considering the improvement in registration accuracy, we believe the slight increase in runtime is reasonable.
Due to the introduction of an autoencoder as the feature extraction module and augmented point cloud features as input to the regression network, our method does increase memory consumption. However, since the proposed method does not require computing correspondences between points, it does not result in significant memory usage. During the training process, even with a point cloud containing up to 5 k points, the GPU memory usage is still below 10 GB, whereas the DCP algorithm already exceeds 12 GB. In scenarios involving large-scale point cloud registration, we suggest downsampling or extracting key points from the point cloud and using them as input to the registration network.

4. Discussion

This paper introduces a semi-supervised framework for solving the probabilistic PCR problem. This framework incorporates modified feature extraction and augmented GMM component estimation modules. These modules are particularly suitable for registering point clouds that are sparse and corrupted by significant noise. Extensive experimental results demonstrate that our algorithm achieves superior accuracy and robustness to both noise and sparsity when compared with state-of-the-art methods. The benefits of incorporating semi-supervised learning and a correspondence strategy are evident when comparing the results of the aforementioned learning-based PCR methods in various experimental settings. It is important to emphasize that the registration performance significantly benefits from both the semi-supervised learning framework and the augmented point-to-GMM correspondence module under various experimental conditions. To better evaluate the superiority of our algorithm compared to other learning-based PCR methods, we provide a comparison of the learning framework and correspondence strategy.

4.1. Supervised Learning or Semi-Supervised Learning

Among the above-mentioned PCR methods, FMR and our method adopt a semi-supervised learning framework, while the other methods adopt a supervised learning framework. According to the comparison results in Section 3.4, the PCR methods with a semi-supervised learning framework achieve more accurate registration results. Moreover, FMR and our method have better robustness to sparse data and noise. These experimental results show that the semi-supervised learning framework generalizes better from synthetic data to real-world data, which is valuable when prior datasets are scarce. We can conclude that the semi-supervised learning framework can extract distinctive, efficient, and pose-attentive features from a limited dataset. These features help to estimate the correspondence matrix or feature matrix and thus achieve better final registration performance.

4.2. Correspondence Strategy

Among the learning-based PCR methods in the experiments, there are three different correspondence strategies. PointNetLK and FMR solve the PCR by minimizing the difference between the features of two point clouds. DCP adopts the transformer module to estimate the point-to-point correspondence. DeepGMR and our method both estimate the point-to-GMM correspondence to align two input point clouds. According to the comparison results in Figure 3, point-to-point correspondence may be broken by noise. Thus, DCP is sensitive to noise and unable to handle registration tasks with large noise. Compared with PointNetLK and FMR, DeepGMR and our method achieve more robust registration performance under different levels of noise. Moreover, given the latent GMM module with augmented features, our method has better robustness to different noise and sparse data. It can be concluded that accurate and robust correspondence is beneficial for PCR and applicable to handling registration tasks with noise and sparse data.

5. Conclusions

In this paper, we propose a semi-supervised method to solve the PCR problem. The proposed algorithm reduces the dependence on labeled datasets and is robust to high levels of noise. The proposed algorithm uses a modified DGCNN to extract features of point clouds and adopts the autoencoder framework for semi-supervised training, which improves the efficiency and distinctiveness of the feature extraction module. Moreover, the augmented regression network is adopted to estimate the correspondences between point clouds and GMM components. Utilizing the point-to-GMM correspondence, the GMM parameters and rigid transformation are computed alternately. Comparison results on the ModelNet40 dataset validate that our algorithm is more robust to different levels of noise and sparsity. Experiments on the 7Scene dataset demonstrate the better generalization ability of our algorithm. In the ablation study, the advantages of different feature extraction modules are discussed. Considering the challenges of labeling and the high production cost associated with medical point cloud datasets, there are very few public datasets that can be used to train deep learning models. In the future, given the good generalization performance and accuracy of the proposed algorithm, we will extend this approach to medical applications to address these problems.

Author Contributions

Conceptualization, Z.Z., E.L. and Z.M.; Methodology, Z.Z.; Software, Z.Z.; Validation, Z.Z.; Formal analysis, A.Z.; Supervision, M.Q.-H.M.; Project administration, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wei, H.; Qiao, Z.; Liu, Z.; Suo, C.; Yin, P.; Shen, Y.; Li, H.; Wang, H. End-to-End 3D Point Cloud Learning for Registration Task Using Virtual Correspondences. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2020; pp. 2678–2683. [Google Scholar] [CrossRef]
  2. Li, L.; Yang, M.; Wang, C.; Wang, B. Robust Point Set Registration Using Signature Quadratic Form Distance. IEEE Trans. Cybern. 2020, 50, 2097–2109. [Google Scholar] [CrossRef] [PubMed]
  3. Hitchcox, T.; Forbes, J.R. A Point Cloud Registration Pipeline using Gaussian Process Regression for Bathymetric SLAM. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2020; pp. 4615–4622. [Google Scholar] [CrossRef]
  4. Lyu, E.; Liu, T.; Wang, J.; Song, S.; Meng, M.Q.H. Motion Planning of Manipulator by Points-Guided Sampling Network. IEEE Trans. Autom. Sci. Eng. 2022, 20, 821–831. [Google Scholar] [CrossRef]
  5. Hansen, L.; Heinrich, M.P. Deep learning based geometric registration for medical images: How accurate can we get without visual features? In Proceedings of the Information Processing in Medical Imaging: 27th International Conference, IPMI 2021, Virtual Event, 28–30 June 2021; Proceedings 27. Springer: Berlin/Heidelberg, Germany, 2021; pp. 18–30. [Google Scholar]
  6. Fu, Y.; Lei, Y.; Wang, T.; Patel, P.; Jani, A.B.; Mao, H.; Curran, W.J.; Liu, T.; Yang, X. Biomechanically constrained non-rigid MR-TRUS prostate registration using deep learning based 3D point cloud matching. Med. Image Anal. 2021, 67, 101845. [Google Scholar]
  7. Baum, Z.M.; Hu, Y.; Barratt, D.C. Real-time multimodal image registration with partial intraoperative point-set data. Med. Image Anal. 2021, 74, 102231. [Google Scholar] [PubMed]
  8. Si, H.; Qiu, J.; Li, Y. A review of point cloud registration algorithms for laser scanners: Applications in large-scale aircraft measurement. Appl. Sci. 2022, 12, 10247. [Google Scholar]
  9. Min, Z.; Wang, J.; Meng, M.Q.H. Robust Generalized Point Cloud Registration With Orientational Data Based on Expectation Maximization. IEEE Trans. Autom. Sci. Eng. 2020, 17, 207–221. [Google Scholar] [CrossRef]
  10. Min, Z.; Wang, J.; Pan, J.; Meng, M.Q.H. Generalized 3-D Point Set Registration With Hybrid Mixture Models for Computer-Assisted Orthopedic Surgery: From Isotropic to Anisotropic Positional Error. IEEE Trans. Autom. Sci. Eng. 2021, 18, 1679–1691. [Google Scholar] [CrossRef]
  11. Min, Z.; Zhu, D.; Ren, H.; Meng, M.Q.H. Feature-Guided Nonrigid 3-D Point Set Registration Framework for Image-Guided Liver Surgery: From Isotropic Positional Noise to Anisotropic Positional Noise. IEEE Trans. Autom. Sci. Eng. 2021, 18, 471–483. [Google Scholar] [CrossRef]
  12. Zhang, Z.; Min, Z.; Zhang, A.; Wang, J.; Song, S.; Meng, M.Q.H. Reliable Hybrid Mixture Model for Generalized Point Set Registration. IEEE Trans. Instrum. Meas. 2021, 70, 2516110. [Google Scholar] [CrossRef]
  13. Žagar, B.L.; Yurtsever, E.; Peters, A.; Knoll, A.C. Point Cloud Registration With Object-Centric Alignment. IEEE Access 2022, 10, 76586–76595. [Google Scholar] [CrossRef]
  14. Zováthi, Ö.; Nagy, B.; Benedek, C. Point cloud registration and change detection in urban environment using an onboard Lidar sensor and MLS reference data. Int. J. Appl. Earth Obs. Geoinf. 2022, 110, 102767. [Google Scholar]
  15. You, B.; Chen, H.; Li, J.; Li, C.; Chen, H. Fast point cloud registration algorithm based on 3DNPFH descriptor. Photonics 2022, 9, 414. [Google Scholar]
  16. Qin, Z.; Yu, H.; Wang, C.; Guo, Y.; Peng, Y.; Xu, K. Geometric Transformer for Fast and Robust Point Cloud Registration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11133–11142. [Google Scholar] [CrossRef]
  17. Zheng, Y.; Li, Y.; Yang, S.; Lu, H. Global-PBNet: A Novel Point Cloud Registration for Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22312–22319. [Google Scholar] [CrossRef]
  18. Chen, Z.; Sun, K.; Yang, F.; Tao, W. SC2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 13211–13221. [Google Scholar] [CrossRef]
  19. Wu, Y.; Yao, Q.; Fan, X.; Gong, M.; Ma, W.; Miao, Q. PANet: A Point-Attention Based Multi-Scale Feature Fusion Network for Point Cloud Registration. IEEE Trans. Instrum. Meas. 2023, 72, 2512913. [Google Scholar] [CrossRef]
  20. Chen, Z.; Yang, F.; Tao, W. Detarnet: Decoupling translation and rotation by siamese network for point cloud registration. Proc. AAAI Conf. Artif. Intell. 2022, 36, 401–409. [Google Scholar]
  21. Besl, P.; McKay, N.D. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256. [Google Scholar] [CrossRef]
  22. Segal, A.; Haehnel, D.; Thrun, S. Generalized-icp. Proc. Robot. Sci. Syst. 2009, 2, 435. [Google Scholar]
  23. Bouaziz, S.; Tagliasacchi, A.; Pauly, M. Sparse iterative closest point. Proc. Comput. Graph. Forum 2013, 32, 113–123. [Google Scholar]
  24. Rusinkiewicz, S. A symmetric objective function for ICP. ACM Trans. Graph. 2019, 38, 1–7. [Google Scholar]
  25. Agamennoni, G.; Fontana, S.; Siegwart, R.Y.; Sorrenti, D.G. Point Clouds Registration with Probabilistic Data Association. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 4092–4098. [Google Scholar] [CrossRef]
  26. Low, K.L. Linear Least-Squares Optimization for Point-to-Plane ICP Surface Registration; University of North Carolina: Chapel Hill, NC, USA, 2004; Volume 4, pp. 1–3. [Google Scholar]
  27. Chetverikov, D.; Stepanov, D.; Krsek, P. Robust Euclidean alignment of 3D point sets: The trimmed iterative closest point algorithm. Image Vis. Comput. 2005, 23, 299–309. [Google Scholar]
  28. Yang, J.; Li, H.; Jia, Y. Go-ICP: Solving 3D Registration Efficiently and Globally Optimally. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 1457–1464. [Google Scholar] [CrossRef]
  29. Myronenko, A.; Song, X. Point Set Registration: Coherent Point Drift. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 2262–2275. [Google Scholar] [CrossRef]
  30. Horaud, R.; Forbes, F.; Yguel, M.; Dewaele, G.; Zhang, J. Rigid and Articulated Point Registration with Expectation Conditional Maximization. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 587–602. [Google Scholar] [CrossRef] [PubMed]
  31. Evangelidis, G.D.; Kounades-Bastian, D.; Horaud, R.; Psarakis, E.Z. A generative model for the joint registration of multiple point sets. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 109–122. [Google Scholar]
  32. Ravikumar, N.; Gooya, A.; Frangi, A.F.; Taylor, Z.A. Generalised coherent point drift for group-wise registration of multi-dimensional point sets. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Quebec City, QC, Canada, 10–14 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 309–316. [Google Scholar]
  33. Aoki, Y.; Goforth, H.; Srivatsan, R.A.; Lucey, S. PointNetLK: Robust & Efficient Point Cloud Registration Using PointNet. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 7156–7165. [Google Scholar] [CrossRef]
  34. Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar] [CrossRef]
  35. Huang, X.; Mei, G.; Zhang, J. Feature-Metric Registration: A Fast Semi-Supervised Approach for Robust Point Cloud Registration Without Correspondences. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11363–11371. [Google Scholar] [CrossRef]
  36. Wang, Y.; Solomon, J. Deep Closest Point: Learning Representations for Point Cloud Registration. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3522–3531. [Google Scholar] [CrossRef]
  37. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef]
  38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  39. Yuan, W.; Eckart, B.; Kim, K.; Jampani, V.; Fox, D.; Kautz, J. Deepgmr: Learning latent gaussian mixture models for registration. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 733–750. [Google Scholar]
  40. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
  41. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  42. Gao, G.; Lauri, M.; Hu, X.; Zhang, J.; Frintrop, S. CloudAAE: Learning 6D Object Pose Regression with On-line Data Synthesis on Point Clouds. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11081–11087. [Google Scholar] [CrossRef]
  43. Liu, X.; Han, Z.; Wen, X.; Liu, Y.S.; Zwicker, M. L2g auto-encoder: Understanding point clouds by local-to-global reconstruction with hierarchical self-attention. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 989–997. [Google Scholar]
  44. Svensén, M.; Bishop, C.M. Pattern recognition and machine learning. Technometrics 2007, 49, 366. [Google Scholar]
  45. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar] [CrossRef]
  46. Zeng, A.; Song, S.; Nießner, M.; Fisher, M.; Xiao, J.; Funkhouser, T. 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 199–208. [Google Scholar] [CrossRef]
  47. Zhou, Q.Y.; Park, J.; Koltun, V. Fast global registration. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 766–782. [Google Scholar]
  48. Kenta-Tanaka et al. Probreg. Available online: https://probreg.readthedocs.io/en/latest/ (accessed on 23 July 2023).
  49. Zhou, Q.Y.; Park, J.; Koltun, V. Open3D: A Modern Library for 3D Data Processing. arXiv 2018, arXiv:1801.09847. [Google Scholar] [CrossRef]
  50. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
Figure 1. The overview of the proposed PCR method. The equations denoted by the green dashed line are used to estimate the latent source GMM component, while the equations denoted by the orange dashed line in the lower right corner are used in the weighted SVD module.
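For readers who want to relate the weighted SVD module in Figure 1 to a concrete computation, the following minimal sketch recovers a rigid transform from weighted correspondences (for example, between latent GMM means). The function and variable names are illustrative and do not reproduce the authors' implementation.

```python
import numpy as np

def weighted_svd_transform(src, tgt, weights):
    """Estimate a rigid transform (R, t) aligning src to tgt.

    src, tgt : (K, 3) arrays of corresponding points (e.g., latent GMM means).
    weights  : (K,) non-negative weights (e.g., mixture weights).
    """
    w = weights / np.sum(weights)
    # Weighted centroids of the two point sets.
    mu_src = np.sum(w[:, None] * src, axis=0)
    mu_tgt = np.sum(w[:, None] * tgt, axis=0)
    # Weighted cross-covariance of the centred point sets.
    H = (src - mu_src).T @ (w[:, None] * (tgt - mu_tgt))
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard keeps R a proper rotation (det(R) = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_tgt - R @ mu_src
    return R, t
```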
Figure 2. The architecture of the proposed autoencoder module.
Figure 3. Comparison results on the noisy dataset. Even when the noise is relatively large (σ² = 0.04), the proposed method achieves accurate and robust performance.
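The noise level σ² = 0.04 quoted in Figure 3 (and Figure 4 below) can be reproduced under the usual assumption of zero-mean Gaussian perturbations added independently to each coordinate; the sketch below is illustrative and not the authors' data pipeline.

```python
import numpy as np

def add_gaussian_noise(points, sigma2=0.04, rng=None):
    """Corrupt an (N, 3) point cloud with zero-mean Gaussian noise of variance sigma2."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=points.shape)
    return points + noise
```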
Figure 4. Qualitative registration results on the noisy ModelNet40 dataset (σ² = 0.04). The figures in three rows show the registration results for three classes of objects: person, stool, and toilet. The source and target point clouds are shown in green and blue, respectively.
Figure 5. Comparison results at different density levels. The results confirm that our algorithm remains accurate and robust in the sparse PCR experiments.
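The density levels compared in Figure 5 can be produced by thinning the input clouds; a minimal sketch, assuming uniform random subsampling without replacement (illustrative only, not the authors' preprocessing code):

```python
import numpy as np

def downsample(points, n_points, rng=None):
    """Randomly subsample an (N, 3) point cloud to n_points without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(points.shape[0], size=n_points, replace=False)
    return points[idx]
```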
Figure 6. Qualitative registration results on the noisy ModelNet40 dataset (σ² = 0.04). The figures in two rows show the registration results for two classes of objects: wardrobe and table. The results show that our algorithm is more robust on the sparse data used.
Figure 7. Qualitative registration results on partial data. The figures show the registration results of CPD, DeepGMR, and ours.
Figure 8. Qualitative registration results on the real-world dataset. The figures in two rows show the registration results for two classes of scenes: kitchen and office.
Figure 9. Comparison results at different density levels on the 7Scene dataset. The numbers of points in the source and target point clouds range from 1 k to 10 k. Our method maintains accuracy and robustness in all registration experiments.
Figure 10. Qualitative registration results on partial data in the 7Scene dataset. The figures show the registration results of CPD, DeepGMR, and ours.
Table 1. ModelNet40: Comparison results of the unseen categories without Gaussian noise. The best results are displayed in bold, while the second-best results are underlined.
Method        Rot. RMSE (°)    Rot. MAE (°)    Trans. RMSE (m)    Trans. MAE (m)
ICP           30.351408        23.879347       0.291203           0.250332
FGR           25.922915        22.460184       0.006555           0.004499
CPD           4.786544         0.762930        0.003145           0.000309
PointNetLK    9.047126         1.736442        0.039545           0.006354
DeepGMR       4.902193         2.541245        0.003074           0.002126
DCP           2.682713         1.802736        0.005024           0.003697
FMR           5.939945         1.234902        0.020327           0.003700
Ours          3.064052         1.609843        0.001325           0.000938
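For reference, rotation and translation RMSE/MAE of the kind reported in Tables 1–6 can be computed as in the sketch below. Rotation error is taken here as the per-axis Euler-angle difference in degrees, which is one common convention in learning-based PCR evaluations; the authors' exact error definition is not restated in the tables, so treat this as an assumption.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def registration_errors(R_pred, t_pred, R_gt, t_gt):
    """Rotation (deg) and translation errors for a batch of estimates.

    R_pred, R_gt : (B, 3, 3) rotation matrices; t_pred, t_gt : (B, 3) translations.
    Rotation error is measured as the difference of Euler angles in degrees;
    other papers instead report the relative-rotation angle.
    """
    e_pred = Rotation.from_matrix(R_pred).as_euler('xyz', degrees=True)
    e_gt = Rotation.from_matrix(R_gt).as_euler('xyz', degrees=True)
    rot_err = e_pred - e_gt
    trans_err = t_pred - t_gt
    return {
        'rot_rmse': np.sqrt(np.mean(rot_err ** 2)),
        'rot_mae': np.mean(np.abs(rot_err)),
        'trans_rmse': np.sqrt(np.mean(trans_err ** 2)),
        'trans_mae': np.mean(np.abs(trans_err)),
    }
```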
Table 2. ModelNet40: Comparison results of the test on ModelNet40. Both point clouds contain 512 points. The best results are displayed in bold, while the second-best results are underlined.
Method        Rot. RMSE (°)    Rot. MAE (°)    Trans. RMSE (m)    Trans. MAE (m)
ICP           39.799759        36.863148       0.288150           0.250225
FGR           16.473726        5.050144        0.040978           0.015393
CPD           9.001674         4.457861        0.003620           0.028238
PointNetLK    19.243651        4.851967        0.029123           0.005638
DeepGMR       4.770106         2.563812        0.003002           0.002057
DCP           38.305710        21.316710       0.009296           0.006652
FMR           10.119906        3.569691        0.019505           0.007553
Ours          3.373327         1.880049        0.001526           0.001117
Table 3. ModelNet40: Comparison results of the test on the expanded transformation. The best results are displayed in bold, while the second-best results are underlined.
Method        Rot. RMSE (°)    Rot. MAE (°)    Trans. RMSE (m)    Trans. MAE (m)
ICP           48.183731        44.349880       0.288150           0.250225
FGR           26.298521        10.552604       0.051524           0.022009
CPD           19.917654        9.409087        0.041613           0.030905
PointNetLK    38.747494        16.434245       0.047938           0.016704
DeepGMR       9.328154         4.062377        0.003980           0.002622
DCP           11.718525        5.886914        0.003053           0.002211
FMR           26.887462        12.422495       0.039190           0.018913
Ours          6.220909         3.203493        0.001616           0.001138
Table 4. ModelNet40: Comparison results of the test on partial data. The two point clouds contain 1024 and 921 points, respectively.
Method        Rot. RMSE (°)    Rot. MAE (°)    Trans. RMSE (m)    Trans. MAE (m)
ICP           25.922915        22.460186       0.494428           0.410775
FGR           27.229246        10.612548       0.053987           0.027252
CPD           11.553748        5.624317        0.042879           0.032712
PointNetLK    28.036747        11.787026       0.066898           0.040418
DeepGMR       24.422211        11.974214       0.041303           0.036290
DCP           31.406713        18.313848       0.044668           0.038792
FMR           28.328537        14.110557       0.045046           0.029118
Ours          8.340002         4.352821        0.043925           0.038332
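One common way to produce partial overlaps such as those in Table 4 is to keep only the points closest to a random view direction; with a keep ratio of 0.9 this turns 1024 points into 921, consistent with the caption, although the authors' exact cropping procedure is not restated here. A hedged sketch:

```python
import numpy as np

def crop_partial(points, keep_ratio=0.9, rng=None):
    """Keep the keep_ratio fraction of an (N, 3) cloud closest to a random view direction."""
    rng = np.random.default_rng() if rng is None else rng
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    # Project onto the random direction and keep the points with the largest projection.
    proj = points @ direction
    k = int(keep_ratio * points.shape[0])
    idx = np.argsort(-proj)[:k]
    return points[idx]
```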
Table 5. 7Scene: Comparison results of the test on real-world scenes. The best results are displayed in bold, while the second-best results are underlined.
Method        Rot. RMSE (°)    Rot. MAE (°)    Trans. RMSE (m)    Trans. MAE (m)
ICP           25.621710        22.087355       0.285877           0.246103
FGR           32.466103        17.400328       0.026984           0.012873
CPD           13.185246        7.772848        0.037270           0.028007
PointNetLK    15.288235        2.812165        0.013135           0.002467
DeepGMR       13.488777        6.892267        0.002648           0.001648
DCP           5.859264         3.314787        0.005533           0.003899
FMR           6.939286         1.400840        0.008716           0.001800
Ours          5.295327         3.909664        0.001243           0.000864
Table 6. 7Scene: Comparison results of the test on partial real-world scenes.
Method        Rot. RMSE (°)    Rot. MAE (°)    Trans. RMSE (m)    Trans. MAE (m)
ICP           25.922913        22.460184       0.356067           0.292270
FGR           16.824322        9.779890        0.033003           0.021518
CPD           17.643698        10.207256       0.047480           0.034908
PointNetLK    13.717697        6.552353        0.034506           0.021542
DeepGMR       28.637363        17.125318       0.025591           0.021899
DCP           9.964090         7.042327        0.026280           0.022408
FMR           43.034859        30.941980       0.054910           0.039119
Ours          7.464927         5.561217        0.025730           0.021929
Table 7. Comparison results of the ablation study on the noisy ModelNet40 dataset ( σ 2 = 0.01 ).
Method        Rot. RMSE (°)    Rot. MAE (°)    Trans. RMSE (m)    Trans. MAE (m)
DeepGMR       6.416541         2.774974        0.003053           0.002131
Ours-V1       3.146087         1.535921        0.003167           0.002299
Ours-V2       3.103110         1.627020        0.001301           0.000921
Table 8. Comparison of registration runtime results for different algorithms.
Registration Runtime (ms)
Method        ModelNet40 (1024 pts)    7Scene (1 k)    7Scene (5 k)    7Scene (10 k)
ICP           1.16                     486.48          506.50          535.04
FGR           26.80                    507.22          587.40          896.00
CPD           6.36                     490.74          493.65          509.09
PointNetLK    64.32                    156.96          261.15          454.83
DeepGMR       3.72                     485.04          480.81          494.96
DCP           11.95                    493.92          1828.99         2431.65
FMR           22.23                    479.49          479.18          478.65
Ours          9.20                     486.61          535.36          646.58
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

