Data-Weighted Multivariate Generalized Gaussian Mixture Model: Application to Point Cloud Robust Registration

In this paper, a weighted multivariate generalized Gaussian mixture model combined with stochastic optimization is proposed for point cloud registration. The mixture model parameters of the target scene and the scene to be registered are updated iteratively by the fixed point method under the framework of the EM algorithm, and the number of components is determined based on the minimum message length criterion (MML). The KL divergence between these two mixture models is utilized as the loss function for stochastic optimization to find the optimal parameters of the transformation model. The self-built point clouds are used to evaluate the performance of the proposed algorithm on rigid registration. Experiments demonstrate that the algorithm dramatically reduces the impact of noise and outliers and effectively extracts the key features of the data-intensive regions.


Introduction
The purpose of point cloud registration is to extract the key points or features corresponding to the target point set and the point set to be registered as well as find the transformation mapping relationship between two point sets [1][2][3][4][5].This task involving image processing, data analysis and computer vision has essential applications in many practical scenarios.
For instance, point cloud registration is essential for driverless technology.Indeed, various hardware sensors, such as lidars, short-wave radars and depth-of-field cameras, could be mounted on the crew-less vehicle and point cloud registration technology can be used to fuse data collected from multiple sensors [6][7][8] to provide fundamental functions such as scene stitching, vehicle positioning [9], and typical scene recognition and matching for vehicle control strategies.For example, the authors in [10] proposed a framework used for unmanned vehicles based on end-to-end point cloud registration deep networks.They obtained the corresponding relationship by learned matching probabilities (LMP) among a group of candidate points related to static characteristics instead of using existing points.Point cloud registration is also applied in medical imaging [11,12].Indeed, to facilitate the diagnosis of the disease, several medical images from different instruments, such as Positron Emission Tomography (PET), Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), need to be combined [13].For example, authors in [14] improved the popular iterative Closest Point (ICP) algorithm by combining 3D scale-invariant feature transform to register 3D free-form closed surfaces (human skull models).In another work, the authors in [15] used the Gaussian mixture model (GMM) with a semi-supervised EM algorithm and geometric constraints to achieve retinal image registration.Moreover, 3D reconstruction also makes extensive use of point cloud registration technology [16,17].For large buildings, for example, general scanning equipment cannot complete the whole scanning process at one time because the scanning range is limited.It requires scanning multiple parts and then splicing point clouds together [18].In other cases, the objects to be observed may be dynamic or have complex surface characteristics.The accuracy of these features plays a crucial role in modelling and analysis; repeated scanning to build a fusion model can improve details [19].
Considering the point set's acquisition perspective, noise and outliers generated in the acquisition process, as well as the deformation and missing parts of the point set caused by other factors, point cloud registration is a challenging task [1].Various methods are proposed to enhance the robustness and accuracy of point cloud registration.In terms of pairwise registration, considering only two point sets, there are three main registration categories: distance-based methods including ICP [20], Graph Matching (GM) [21], filterbased methods and probability-based methods [22].
However, most point-to-point methods are prone to fall into local optima, especially if there are some similar point structure blocks in the point set.To improve this situation, registration based on mixed models (most are GMM-based) has proven effective [22][23][24][25].The core idea is to model and describe the probability distribution of the point set using a parameterized mixed model and find the closest response of the mixture model to determine the corresponding relationship between point sets.These models perform well even if the sampling rate of two point clouds is not the same.
Nonetheless, there are two evident deficiencies when using the GMM.First, the GMM cannot effectively describe certain non-Gaussian distributions, such as the typical peaktrailing distributions in signal processing [26][27][28][29].In the point cloud, intuitively speaking, space with dense data will carry more information.These high-density areas may represent the crucial feature structures in the point cloud, yet the GMM cannot effectively fit these high-density blocks, and its results tend to be relatively average.Secondly, GMM is easy to be disturbed by noise.Different noise levels will result in divergent response parameters in the mixture model, which could compromise the final registration accuracy [30].
The goal of this paper is to propose a point cloud registration method based on the weighted multivariate generalized Gaussian mixture model (WMGGMM) that we develop in this paper to address the difficulties above.The generalized Gaussian distribution (GGD) belongs to the family of elliptic distributions.Due to the addition of a shape parameter, it has a strong ability to describe various data distributions, particularly considering the data peak [31,32].Its special cases contain Gaussian and Laplace distributions, and therefore it is widely used in feature extraction [33][34][35] and texture retrieval [36][37][38].The mixture model of GGD has also been applied in several applications such as image processing and segmentation [39][40][41] and human movement recognition [42,43].
We show that the generalized Gaussian mixture model (GGMM) is an alternative worthy choice for point cloud registration when real-time analysis is not required.Although most parameters of GGMM have no closed solutions, its offers high registration accuracy and robustness.In addition, we introduce weights to reinforce the ability to pay attention to dense areas and reduce the influence of noise and outliers on the parameter estimation process.After obtaining the GGMM models for the target scene and the scene to be registered, the approximate Kullback-Leibler divergence (KLD) is computed to measure the models' difference.This will be used as a loss function to find the optimal registration parameters through the stochastic optimization algorithm.
The paper is organized as follows.Section 2 will tackle the WMGGMM parameter estimation.This section will also give the complete learning algorithm.Section 3 presents some experimental results conducted on a synthetic data set to verify the algorithm's performance and robustness.It is also devoted to our approach to obtain the optimal registration parameters using stochastic optimization, and the final registration effect performed on rigid transformation is presented.Conclusions and future works are finally reported in Section 4.

Weighted Multivariate Generalized Gaussian Mixture Model
In the majority of existing works, the feature independency assumption is used to simplify modeling high-dimensional data, which results in a distribution which is a product of one-dimensional generalized Gaussians [41,42].Unlike these works, we use here the PDF as defined in [31]: where Γ(•) denotes the Gamma function; x, µ ∈ R d and µ are the mean vectors; Σ is a d × d real symmetrical positive definite matrix called the scatter matrix; and m and β > 0 are the scale and shape parameters of the MGGD.The shape parameter controls the peak's sharpness and tail extension in the probability density function.It is worth noting that when β = 1, the MGGD becomes the multivariate Gaussian distribution.If β < 1, the distribution will have a sharper peak and a larger tail.However, when β tends to infinity, MGGD is close to a multivariate uniform distribution [31].

Data Weighting
Recent research has shown that data weighting could improve modeling capabilities (see, for example, [44] and references therein).Here, we follow these works via an extension to GGMM-based modeling.As proposed in [44], for example, each sample obtains a corresponding weight w greater than 0. If we consider the likelihood for a specific sample x, we can write it as p(x; µ, Σ, β, m) w .This is not a probability distribution because the integral is not equal to one.However, we notice that p(x; µ, Σ, β, m) w ∝ p(x; µ, Σ, β, mw −1/β ), and thus we can then obtain the PDF with weight as a parameter: where θ = {µ, Σ, β, m}.It can be seen that individual weight directly influences m in the PDF here.Still, in parameter estimation, the weights will simultaneously affect the scale and shape in the final mixture.Having the new distribution in Equation ( 2) in hand, we can obtain the K-component mixture: where Θ = {π 1 , ..., π K , θ 1 , ..., θ K } denotes the parameter set of the model, and (π 1 , ..., π K ) are the mixing coefficients which satisfy π k > 0 and ) are the parameters of the k th component.Let X = {x 1 , ..., x N } represent the whole data set and W = {w 1 , ..., w N } is the weight set, then the log-likelihood function is given by:

MGGD with Fixed Weights
For maximum likelihood estimation, the missing variables Z = {z 1 , ..., z N } are introduced.If x i is generated by the kth component, then z i = k.We assume that the weights are already known by prior knowledge, so the expected complete data log-likelihood (Q function) can be written as: where E P [•] is the expectation with respect to the distribution P and Θ = represents items only related to Θ. Subsequently, the optimal parameters Θ * can be obtained by the EM algorithm.In the expectation step, the posteriors are updated with: By taking the derivatives of the complete data log-likelihood with respect to the parameters and making the results equal to zero, we can obtain the parameter update formulas: After replacing m k in Equation ( 10) with the previous result in Equation (8), we can find that Σ k is the independent form m k .If we let we have: Furthermore, it is worth noting that µ k and Σ k do not have closed solutions; they are both solved through the fixed point (FP) method.According to Banach's fixed point theorem [45], if (S, d) is a non-empty complete metric space with a contraction mapping T : S → S, then there exists a unique fixed point S * in S. The authors in [31] proved the convergence of Σ k and explained the existence and uniqueness of the fixed point, based on the fact that β ∈ (0, 1] and Σ is a positive definite real symmetric matrix.This is also consistent with our assumptions about β.We introduce the Frobenius norm defined in Equation (12) to measure the difference between S n and S n−1 in the fixed point iteration.The process stops when the approximate solution satisfies the preset precision.Furthermore, to ensure the convergence of the FP equation, the value range of weights should also be between zero and one.
Then parameter β can be estimated using Newton-Raphson iterations [42][43][44]: where ξ is the learning rate used to prevent the oscillation and overflow in the iterative process, and it is usually around 0.1.If necessary, the method of exponential decay can be applied to make the convergence more stable.f (β k ) are given as follows: where Ψ(•) is the digamma function.

Weights Considered as Random Variables
Above, we have derived the WMGGMM parameter updating formulas with fixed weights.However, it is pointed out in [44] that the Bayesian formalism is more inclined to treat parameters as random variables and update the posterior of parameters by combining the parameters prior with the observed sample.Under this framework, the limitation of insufficient prior knowledge on the accuracy of weights is reduced, and the inference process of weights will also be more explanatory.As mentioned before, the generalized Gaussian distribution belongs to the family of elliptic distributions, and thus we select the same prior distribution, i.e., Gamma distribution, as in [44].At this point, the prior and posterior of the parameters have the same form.The advantage is that when we make a new observation, we do not have to re-calculate the whole process but only directly obtain the posterior distribution through the parameters, which undoubtedly simplifies the updating process of weight parameters.Then, the posterior will become the prior in the next calculation, Therefore, we can obtain: where G(w; a, b) is the Gamma distribution, and φ = {a, b} are the parameters of the prior distribution of w.The mean and variance of random variable w are given by: Due to the addition of prior parameters, the log-likelihood of the complete data becomes the following form: where Φ = {φ 1 , ..., φ N } and φ i = {a i , b i }.The posterior distribution factorizes on i as follows: Each of these product terms can be expressed as two-factor expressions: According to the above formula, the expectation step in the EM algorithm is divided into two parts (E-Z step and E-W step).In the E-Z step, the marginal posterior distribution of z i is obtained by integrating over w i : where p(x; µ, Σ, β, m, a, b) is given as: In step E-W, according to the property of conjugate distribution, we can obtain: Thus, the updating formulas of prior parameters can be obtained: This explains the outliers shielding feature of the weighted algorithm.As an outlier is by definition far from the center of all components, it has a low posterior weight wik for each component and then a low mean posterior probability wi .By expanding Equation (19), we can obtain a result similar to the Q function with fixed weights: Therefore, w i will be replaced with wik in all the parameter updates formulas of the mixture model.Since the equations are very similar, they are not repeated here.

Automatic Determination of the Number of Components
The model selection problem is tackled using the minimum message length (MML) criterion as proposed in [46]: where I C (Θ) denotes the expected complete Fisher information matrix (FIM) and D(Θ) is the dimensionality of the model.Using a similar process as in [46], we can show that where K + is the set of non-empty components and K + is the number of elements in it.Moreover, we can rewrite the formula for calculating π k in the maximization step of the EM algorithm: The threshold for minimum support is high when the number of non-empty components is large at the beginning.Some components can be removed quickly.In the process of updating components one by one, the threshold gradually approaches the situation in [46] as the number of non-empty components decreases.

Complete Algorithm
The complete steps are outlined in Algorithm 1.We use k-means to initialize π k , µ k , Σ k .The initial parameter β k is specified as 0.5, and the parameter m k is calculated according to the pre-clustering results and the formula in [31]: where x i ∈ X k is the ith sample for the kth cluster, and N k is the number of samples.We adopt the same data similarity measurement method based on the Gaussian kernel as in [44] for the initialization of weights.However, due to the constraint of the weight range, we modify it as follows: where d 2 (x i , x j ) denotes the Euclidean distance, S q i is the set containing q nearest neighbors of x i , and σ is a positive scale.The default setting for q is 20.We can then calculate the initialization of the prior parameters of the weight through Equations ( 17) and ( 18), i.e., a i = w 2 i and b i = w i .Figure 1 shows the impact of different σ on the weights; its significance is to change the degree of differentiation of the weights.When there is a small σ, the difference in the weights between dense and sparse areas becomes more pronounced.Conversely, the distribution of weights will be relatively close.In other words, a smaller σ is more beneficial at removing outliers and noise.However, it is important to note that when the σ is too small, the weight operation is equivalent to removing most of the points that are away from the clustering center in the data distribution.The actual points involved in parameter identification are reduced, leading to inadequate support of the components.Furthermore, the resulting mixed model parameters will have a large deviation from the original data distribution.Therefore, the selection of σ is a balance between model robustness and model accuracy.We set it to 25 in our case.Algorithm 1 Proposed WMGGMM algorithm with component-wise EM procedure.

Synthetic Data
First, we demonstrate the performance of the WMGGMM model through synthetic data.The method for generating data points that follow an MGGD comes from [31]: where x is a random vector that follows an MGGD with scatter matrix C = mΣ and shape parameter β, and = denotes equivalence on the distribution; x is a random vector uniformly distributed on a unit sphere and τ is a positive scalar random variable that satisfies τ 2β ∼ Γ(d/(2β), 2).Tables 1 and 2 show the parameter estimation results for two generated two-dimensional data sets from the 4-and 3-component mixture models, respectively, where ρ k = cov(X, Y)/ var(X)var(Y) represents the correlation coefficient used to measure the slope of the scatter matrix.We can see that the estimated parameters are close to the real ones.According to the correlation coefficient comparisons, the slopes of the real and estimated scatter matrices are also close.Due to the role of weight, the points essentially involved in parameter estimation are concentrated in data-intensive areas, leading to a reduced scale parameter m.However, the change in parameter β is not necessarily a decrease.Suppose that β is small in the default setting.In the big trailing part, the data points are distributed sparsely.The weight operation will remove most of these points, so the shape of data is changed, and the final estimated β will become larger instead.
Figure 2 presents the parameter estimation results at different noise levels.Parameter estimation is carried out after the proportional addition of random uniform noise in the primary distribution area ([−5.25,5.25]) of the original data.When σ is appropriately selected, the weights remove most of the noise and outliers, and the actual points involved in parameter estimation are mostly in the desired original data distribution.The final results show that the mixture models of the three different cases are very similar, which effectively proves the WMGGMM algorithm's robustness.The results of the mixture model for overlapping data are shown in Figure 3.The introduction of shape parameter β enables MGGD to improve the identification ability of data distribution peaks.Although the overall distribution of two components overlap, the WMGGMM algorithm can still accurately identify the parameters of each component when their centers do not coincide.4 displays the execution time of the WMGGMM algorithm under different conditions (the default number of components is four).The average value of five independent running times is taken as the final result, and the unit is seconds.Because most of the algorithm's parameters need to be estimated iteratively (fixed-point equations and Newton-Raphson method), the algorithm's running time is much higher than that of the standard EM algorithm.The anti-interference performance of the model is improved by sacrificing some efficiency.The time here can only be used as a relative reference since the algorithm's run time is influenced by computer performance, hardware and software acceleration, and randomness of parameter initialization.

Point Cloud Registration Using WMGGMM
We consider the approach proposed in [23], in the case of GMM, to perform WMGGMMbased registration.Two mixture models are adopted to describe the target scene and the scene to be registered.Then, the difference between these two models is applied to update the registration parameters.Nevertheless, the algorithm proposed in [23] is still a method based on point to point in essence.The data pre-clustering is not considered, but each point in the data set is used as a mean to initialize a mixture component.This approach is equivalent to converting the entire data set into a mixture model with multiple simple components.However, this method of initializing the mixture model is not suitable in the point cloud with a large data volume.The subsequent L2 divergence simplifies the derivation based on the transformation model.However, the derivative optimization process is closely related to the expression of the specific model.If the transformation model is changed, all optimization processes need to be re-derived.Therefore, we propose a method based on WMGGMM to pre-cluster the target scene and the scene to be registered and extract the key features to form the mixture models.Then, the optimal registration parameters are found by using stochastic optimization through the KLD between the mixture models.The KLD of the two statistical models is defined as follows: Unfortunately, the KLD between the mixture models has no analytical expression, so several approximations have been proposed [47][48][49].Compared with GMM, the KLD of the generalized Gaussian mixture model using the inner product of the components is too complicated due to the introduction of shape parameters.Even though a method for calculating the KLD of two MGGDs is proposed in [38], the matched bound and variational approximations are also unfeasible because the premise of this approach is that all MGGDs have a zero mean.However, the mean value of components is significant in the point cloud registration scenario.Therefore, we finally use Monte Carlo sampling to calculate the KLD between two GGMMs: where P S (x) and P M (x) are the probability density functions of the GGMM obtained from the target scene and the scene to be registered, respectively, and {x i } n i=1 denotes n samples taken from P S (x).In the above formula, the sum of the probability of samples replaces the integral, and the Monte Carlo method is the only method that can approximate the actual value of KLD when there are enough sample points.The Gibbs sampling, one of the Markov Chain Monte Carlo (MCMC) sampling techniques, is applied in this process using n = 1000 .
We assume that the point cloud transformation model is: where X 0 and X are the point sets before and after the transformation, and Ω = {ω 1 , ..., ω m } is the parameter set of the transformation model.Then, we can express the point cloud registration problem in the following form: where S is the target scene point set, M is the point set to be registered, and ggmm(•) denotes the obtained PDF of the GGMM from a data set using the WMGGMM algorithm.The complete point cloud registration algorithm using WMGGMM is shown in Algorithm 2. However, it is only a general framework and does not specify a concrete random optimization method.The stochastic optimization technique can be selected depending on needs, as long as the definition of the loss function in (38) is satisfied, and the optimization domain is Ω.The simulated annealing (SA) algorithm is used in this paper.
Algorithm 2 Point cloud registration algorithm based on WMGGMM and stochastic optimization.
Input: Scene set: S, Model set: M, Initial transformation parameter set: Ω (0) Output: Optimal transformation parameter set: Ω * Set: KLD old = 0, KLD new = 0, Ω old = Ω (0) ggmm s = GGMM(S) Gibbs sampling on ggmm s to get: When the transformation model is rigid, i.e., X = R α X 0 + t, the optimization process can be further simplified, where R α is the rotation matrix with an angle of α and t is a translation vector.The transformation of the data set is equivalent to the rigid change of the corresponding mixture model.We can obtain µ k = R α µ k + t and Σ k = R T α Σ k R α ; the shape parameter β and scale parameter m are not affected during the transformation.In other words, we only calculate the mixture models of the target scene and the scene to be registered through WMGGMM at the beginning, and the parameter estimation does not need to be repeated during optimization.The transformation models used in the experiments below are all rigid transformations and all test data are generated based on the method in the previous section.
The unlikeness in sampling rate is a common challenge in point cloud registration.In general, the original scene will have a higher sampling rate to achieve more accurate modeling.In comparison, the scene to be registered may have a lower sampling rate due to the limitations of the sampling environment and tasks.Figure 5 shows the registration results of point sets with different sampling rates.The target scene (blue) has 1200 samples, while the scene to be registered (red) has 800.Although the sampling rate varies, the two scenes' data meet similar statistical distributions, so the method based on GGMM can still effectively register the two sets with rotational and translational errors of 1.5% and 4.3%, respectively.The registration results at different noise levels are given in Figure 6.The outcomes show that our method can effectively remove the noise and outliers' interference and extract the main features of the point cloud for matching, and the algorithm has good robustness with average rotational and translational errors of 3.5% and 6.3%, respectively.In Figure 7, we designed a point cloud shaped like the Chinese character "zhi" with a symmetrical approximate center (rotational and translational errors of 6.1% and 9.3%, respectively).The distinction between it and the original graph after 180 degrees of rotation is only at one "point".Hence, it is straightforward to fall into the optimal local solution in registration optimization.However, in several experiments, only 6% fell into local optima.Compared with derivative optimization, stochastic optimization provides the ability to jump out of the local optimal solution.

Conclusions
This paper proposes a weighted multivariate generalized Gaussian mixture model and combines it with the stochastic optimization algorithm for point cloud rigid registration.This method requires enough samples in the registration scene to meet the minimum support of components.It extracts the data's mass features rather than the edge features (contour and shell), so it is suitable for substantial point clouds with a large data volume.The introduction of weights and the ability of generalized Gaussian distributions to describe peak data can effectively reduce the influence of noise and outliers and obtain the critical features of data-intensive regions.Experimental results attest that the algorithm has sufficient robustness.The stochastic optimization algorithm reduces the coupling between algorithm modules, intensifies the algorithm's expansibility, and provides a more potent global optimization capability.However, in the mixture model's parameter estimation process, almost all parameters need to be learned iteratively; therefore, some performance is sacrificed to enhance the algorithm's accuracy.In addition, this work only considered a 2D scenario.Future work will be committed to improve the parameter estimation approach to improve the algorithm's performance and extend the approach to 3D scenarios.

Figure 3 .
Figure 3.The results of parameter estimation in the case of component overlap.

Figure
Figure 4  displays the execution time of the WMGGMM algorithm under different conditions (the default number of components is four).The average value of five independent running times is taken as the final result, and the unit is seconds.Because most of the algorithm's parameters need to be estimated iteratively (fixed-point equations and Newton-Raphson method), the algorithm's running time is much higher than that of the standard EM algorithm.The anti-interference performance of the model is improved by sacrificing some efficiency.The time here can only be used as a relative reference since the algorithm's run time is influenced by computer performance, hardware and software acceleration, and randomness of parameter initialization.

Figure 5 .
Figure 5. Registration results of point clouds with different sampling rates.(a) original; (b) centralized; (c) registration.

Table 1 .
Estimation of the mixture model parameters for the first synthetic data set.

Table 2 .
Estimation of the mixture model parameters for the second synthetic data set.