Robust Cuboid Modeling from Noisy and Incomplete 3D Point Clouds Using Gaussian Mixture Model

: A cuboid is a geometric primitive characterized by six planes with spatial constraints, such as orthogonality and parallelism. These characteristics uniquely deﬁne a cuboid. Therefore, previous modeling schemes have used these characteristics as hard constraints, which narrowed the solution space for estimating the parameters of a cuboid. However, under high noise and occlusion conditions, a narrowed solution space may contain only false or no solutions, which is called an over-constraint. In this paper, we propose a robust cuboid modeling method for point clouds under high noise and occlusion conditions. The proposed method estimates the parameters of a cuboid using soft constraints, which, unlike hard constraints, do not limit the solution space. For this purpose, a cuboid is represented as a Gaussian mixture model (GMM). The point distribution of each cuboid surface owing to noise is assumed to be a Gaussian model. Because each Gaussian model is a face of a cuboid, the GMM shares the cuboid parameters and satisﬁes the spatial constraints, regardless of the occlusion. To avoid an over-constraint in the optimization, only soft constraints are employed, which is the expectation of the GMM. Subsequently, the soft constraints are maximized using analytic partial derivatives. The proposed method was evaluated using both synthetic and real data. The synthetic data were hierarchically designed to test the performance under various noise and occlusion conditions. Subsequently, we used real data, which are more dynamic than synthetic data and may not follow the Gaussian assumption. The real data are acquired by light detection and ranging-based simultaneous localization and mapping with actual boxes arbitrarily located in an indoor space. The experimental results indicated that the proposed method outperforms a previous cuboid modeling method in terms of robustness.


Introduction
Geometric primitive-based modeling is a widely used method for abstracting point clouds for applications such as scene reconstruction [1][2][3][4][5][6], rendering [7][8][9][10][11], and shape processing [12,13], as geometric primitives are lighter and easier to manipulate than raw point clouds. Moreover, these characteristics can reduce the labor needed for inspection, assessment, and management. Hence, Geometric primitive-based modeling is also useful in the fields of extraction of bridge components [14][15][16][17] for inspection purposes. Among geometric primitives, the cuboid model is the most practical for substituting point clouds acquired by simultaneous localization and mapping (SLAM). A cuboid can substitute a large portion of point clouds as it is frequently observable in man-made environments. Moreover, a cuboid model is light and easy to manipulate because it renders six spatially related planar patches simultaneously with nine parameters. However, most point clouds generated by SLAM contain many defects, such as noise and occlusion. Therefore, a cuboid modeling method must be robust to remain useful.
A traditional cuboid modeling method [4,10,[18][19][20] consists of two processes: plane detection and spatial constraint validation, which impose hard constraints to narrow the solution space, as shown in Figure 1a. In the plane detection process, first, planes are sequentially detected from point clouds, and subsequently, cuboid candidates consisting of at least two of the detected planes are searched. Specifically, plane detection extracts planes from three-dimensional (3D) point clouds. This process bounds the solution space by providing a set of planes that could form the face of a cuboid. Because the solution space of the cuboid parameters is restricted to the set of planes, an error in the detected planes propagates to the cuboid modeling results. Therefore, plane detection accuracy is important. The most well-known approaches for plane detection are random sample consensus (RANSAC) [21] and Hough transform [22]. Wu et al. [19] used an RANSAC-based plane detection approach proposed by Li et al. [9], which considers global relations, such as orientation, placement, and equality alignments, to enhance the plane detection accuracy. Wei et al. [10] used the Hough transform to detect planes. In addition, they extracted the Minimum Bounding Rectangle (MBR) from the detected planes because it closely represented the actual face of a cuboid.
In the spatial constraint validation process, the detected planes are determined to belong to a cuboid candidate based on their satisfaction of spatial relations, such as proximity, perpendicularity, and parallelism. To be more specific, because a cuboid is characterized by six planes that are orthogonal to adjacent planes, previous methods [4,10,[18][19][20] have used these characteristics as spatial constraints. Therefore, spatial constraints, such as proximity and orthogonality, are used to narrow down the solution space by excluding combinations of planes that violate the constraint. In addition, they added other spatial constraints depending on the input or detected planes. Cuboid modeling methods with a single RGB-D image as an input [4,18] utilize camera position to create constraints. Jiang and Xiao [4] used a concept called solidness, which is the proportion of a cuboid behind a 3D scene surface. Mishima et al. [18] checked convexity with the center of mass and normal of a point, which aligned with the viewpoint. Wei et al. [10] included the edges of an MBR as constraints to check proximity and orthogonality.
Because the aforementioned two processes are hard constraints that bound the solution space, the solution space of the cuboid parameters is narrowed down. However, narrowing the solution space in this manner may lead to the exclusion of the true solution owing to the following factors. First, the plane detection results could be inaccurate for use as hard constraints because of sensor noise, motion noise, and incomplete measurements owing to occlusion. In addition, because the parameters of a plane are optimized for the point clouds associated with the plane, the absence of an appropriate data association between the point clouds and faces of a cuboid also increases the error in the detected plane. Furthermore, the high occlusion of a cuboid implies a few plane combinations that can satisfy the spatial constraints. Consequently, applying the abovementioned hard constraints under high noise and occlusion may lead to an over-constraint, which results in a false or no solution in the solution space.
To address the aforementioned problems caused by hard constraints, we propose using soft constraints for cuboid modeling by adopting a Gaussian mixture model (GMM) [23]. Unlike hard constraints, which cut and bound the solution space multiple times, the proposed soft constraint strategy assigns a smoothly changing probability to the solution space, as shown in Figure 1b. Specifically, a Gaussian model describes the point cloud data (PCD) noise well, and a linear combination of such models (mixture of Gaussian models) covers all possible cases of data association, even under severe occlusion. Therefore, our strategy of conducting one-step optimization iteratively estimates the true solution while preventing possible failures due to multiple optimization steps. Finally, the optimization procedure is accurate and efficient owing to the use of an analytic gradient, which was newly derived in this study. The contributions of this study are as follows: • We introduced a GMM to estimate a cuboid model directly from point clouds to ensure model robustness against noise and occlusion by simultaneously considering noise, spatial constraints, and data association. • We derived analytic partial derivatives of the expected values of GMM with respect to cuboid parameters to achieve effective optimization. • We verified and evaluated the advantages of the proposed approach over a previous cuboid modeling method by conducting extensive experiments using synthetic and real data.
The remainder of this paper is organized as follows: The formulation of soft constraints to estimate cuboid parameters, optimization of the soft constraints, and implementation details arepresented in Section 2. Section 3 shows configurations and results of experiment. Subsequently, Section 4 discusses the validation of the robustness of the proposed method based on experiments. Finally, Section 5 concludes that the proposed method robustly models a cuboid from point clouds.

Matetials and Methods
In this section, we describe a cuboid model as a GMM based on a noise model of measured point clouds and describe the derivation of an expectation of the GMM, which is a soft constraint.

Gaussian Assumption of Point Distribution
We assumed that the point clouds from a planar object follow a Gaussian distribution. The ideal measurement of a planar object involves subsampled points on the same plane. However, real point clouds acquired by SLAM are randomly distributed around a measured planar surface owing to the sensor and motion noise, as shown in Figure 2a. Therefore, we used the Gaussian model, as shown in Figure 2b, for the point distribution after evaluating planar object's point distribution along the plane's normal vector. We performed quantitative and qualitative evaluations to verify the Gaussian distribution assumption. First of all, we segmented the point clouds of a wall from the results of the SLAM algorithm [6] that was used in the real data experiment. Then, we calculated the distance of each point from the wall. Subsequently, we visualized the distances using a Quantile-Quantile plot and histogram to qualitatively prove that it follows a Gaussian distribution as shown in Figure 3. The Quantile-Quantile plot shows distances are plotted in a straight line that represents a Gaussian distribution. In addition, the histogram has the shape of a Gaussian distribution.
For quantitative proof, we conducted normality tests [24] that consisted of a Kolmogorov-Smirnov test Limiting form (KS-Lim), Stephens Method (KS-S), Marsaglia Method (KS-M), Lilliefors test (KS-L)), Anderson-Darling (AD) test, Cramer-von Mises (CvM) test, Shapiro-Wilk (SW) test, Shapiro-Francia (SF) test, Jarque-Bera (JB) test, and D'Agostino and Pearson (DAP) test. As this code is not meant for large size of data, we subsampled the point clouds and performed the test for 10,000 iterations. As a result, the point clouds of a wall from the result of a SLAM [6] passed the test for all methods at the rate of more than 90%. According to these validations, we assumed that the point clouds from a planar object follow a Gaussian model.
The Gaussian model can be expressed as follows: where n is the normal vector of the measured plane, X is the acquired point, and p is an arbitrary point on the plane. This is the probability of point generation from the measured planar object. Because the faces of a cuboid are planar patches, Equation (1) is adopted for each face of a cuboid with cuboid parameters.

Cuboid as GMM
We assume that point clouds from planar objects follow a Gaussian distribution, as mentioned in Section 2.1. This assumption can be applied to point clouds generated from each face of a cuboid. Therefore, we can deduce that point clouds generated from a cuboid follow a GMM that consists of six Gaussian models that represent each face of a cuboid.
The log-likelihood function of a GMM is derived from Equation (1), and the basic notation used in Equation (2) is as follows: • Latent variables (z j , j = 1, 2, . . . , 6): τ 1 , τ 2 , . . . , τ 6 (Probability that a point is generated from the j-th Gaussian model.) • Size of a cuboid (width, depth, and height): Center of a cuboid: X C (3 × 1) • Euler angle of a cuboid orientation: θ θ x θ y θ z • X, Y, Z axes of the cuboid coordinate system: Here, j represents the Gaussian model of the corresponding face of the cuboid. If j is 1-3, the corresponding face is located in the positive direction in the cuboid coordinate system, and each j indicates a face perpendicular to the x-, y-, and z-axes, respectively. Additionally, the jth face is parallel to the (j + 3)th face. Therefore, the size of a cuboid and the X, Y, and Z axes of the cuboid coordinate system values used in 4-6th Gaussian model (µ 4-6 and u [4][5][6] ) are the same as those used in 1st-3rd Gaussian model(µ 1-3 and u 1-3 ).
The plane parameters of the face of a cuboid can be represented as cuboid parameters such as size(µ j ), center(X C ), and orientation(θ). Here, the center and orientation of a cuboid are represented as x, y, and z coordinates and the Euler angle based on the reference coordinate system, respectively. Consequently, the Gaussian model of each face of a cuboid can be represented based on Equation (1) by substituting n, p as cuboid parameters with the notation listed above.
(2) (± depends on j value. -when j < 4, else +) Here, n is replaced with (C(θ)u j ) because the normal vector of a face is the same as the axis of the cuboid orientation. The point where the center of the cuboid is projected onto the face is used as p, which is an arbitrary point on the plane.
If each point is labeled with the face of a cuboid where the point is measured, it would be the complete data. However, we acquired only point clouds without information on the face of the cuboid where the point originated. Therefore, we used the incomplete data log-likelihood function of a cuboid with Equation (2) as follows:

Expectation of GMM
The expectation of the GMM is calculated as the product of the conditional distribution and log-likelihood. The conditional distribution of Z i based on the current (t) estimation (T (t) j,i ) is as follows: Thus, the expectation of the log-likelihood function Q can be expressed as follows:

Optimization
The generally used optimization method for a GMM is the expectation-maximization method [25]. We maximized the expectation, derived in Section 2.3, using the gradientascending approach. The analytic partial derivatives of the expectation with respect to the cuboid parameters were used to calculate the gradient and determine the step size. Furthermore, the backtracking line search method was implemented to adjust the step size while updating the parameters.

Gradient Ascending
First, we update the latent variables. Before calculating the partial derivatives of Equation (5), the terms related to the latent variables are clustered as follows for brevity: Equation (6) has the same form as the maximum likelihood estimation for a multinomial. Hence, we can optimize Equation (6) by substituting the result of the maximum likelihood estimation for a multinomial.
After collecting the terms related to the size, similar to the case of latent variables, the size parameter is updated as follows: The center parameter is similarly updated.
Here, the product of (C(θ)u j ) is a scalar. Therefore, it can be multiplied to either side of a matrix. Thus, the expression is modified as follows: Consequently, substituting Equation (10) into Equation (9) yields Thus, we obtain X (t+1) C by passing the term related to X C to the right-hand side.
Finally, the same procedure is repeated for the orientation parameter. The rotation matrix of the Euler angle is a non-linear equation. Thus, the result of the linearized rotation matrix in Equation (A4), which is derived in Appendix A, is applied to Q(K|K (t) ). Here, 2 ) has a 1 × 1 dimension, i.e., it is a scalar and can be multiplied to either side of a matrix. However, ∆X × i must be multiplied to the right-hand side of ∆X i to match the dimensions and apply the distributive law. Therefore, Equation (13) can be written as follows: By arranging Equation (14) in the same manner as Equation (11), δφ is obtained as follows: Before multiplying the inverse of the coefficient of δφ on both sides, we must check whether it is invertible. According to Van den Bos [26], a real symmetric matrix is positive definite if and only if its eigenvalues are positive. D i is a diagonal real matrix with elements that represent membership probabilities. Therefore, D i is positive definite. Furthermore, C(θ (t) )D i C(θ (t) ) is positive definite because the rotation matrix is nonsingular. However, a skew-symmetric matrix is singular; therefore, the coefficient of δφ is the summation of many positive semi-definite matrices. Despite the limitation of positive semi-definition, the number of points is sufficiently large to make the coefficient positive definite, which is invertible. Eventually, we obtain δφ and update the rotation matrix from Equation (A4), which is convertible to the Euler angle.

Backtracking Line Search
Although we updated the parameters with analytic derivatives, there is a possibility of oscillation during iterations because we computed the step size using the Newton method, in which the derivative becomes zero. Therefore, we implemented a backtracking line search method.
Backtracking line search is a strategy for parameter updating. It first moves the parameters to the maximum step size and subsequently moves them again by reducing the step size at a certain ratio if the updated value is not sufficiently enhanced. The implemented backtracking line search procedure is shown in Figure 4. First, the parameters are updated with the maximum step size, as colored in red. In this study, the derived result presented in Section 2.4.1 is the maximum number of steps. Subsequently, the method compares the updated soft constraint values with the previous ones. If the updated soft constraint values are lower than the previous ones, the step size is reduced by half, as shown in blue. This procedure is repeated until the updated values exceed the previous values.

Implementation Details
In this section, the implementation details of the proposed cuboid modeling method are described in Algorithm 1.
The inputs of Algorithm 1 are segmented point clouds with only location information comprising a single cuboid. Subsequently, the initial guesses of the cuboid parameters are acquired from the results of the minimum bounding box [27] of the input point clouds. Moreover, we set the initial values of the latent variables equal. In short, latent variables are set to 1 6 because the total probability is equally divided into six. Finally, we calculate the expectation of the GMM based on the initial parameters.
After the initialization step, we iteratively update the parameters from the calculations presented in Section 2.4.1 and compare the updated expectation with the previous expectation. Based on the comparison results, we choose to update the parameters or perform a backtracking line search method to adjust the step size, as described in Section 2.4.2. This optimization process is repeated until the parameters or expectation converge, or the count reaches the maximum iteration number. Specifically, we consider convergence when two norm of the step size or the change in expectation is less than the current two norm of parameters or expectation multiplied by e −8 . In addition, the maximum iteration number is set to 100.
We developed Algorithm 1 in MATLAB environment. The operation of our code is confirmed in MATLAB 2020b and 2021b on Windows 7 and 10. In addition, many toolboxes were used in this code, such as 'Computer Vision Toolbox 9.3', 'LiDAR Toolbox 1.0', 'Navigation Toolbox 1.2', 'Robotics system Toolbox 3.2', 'ROS Toolbox 1.2', and 'UAV Toolbox 1.0'.
As Algorithm 1 is implemented with optimization process, we conducted algorithm complexity analysis, WHICH is one of the important aspect of optimization. The computational complexity of the proposed algorithm is O(N p × log N p ). Here, N p stands for the number of points. Specifically, the computing expectation and updating the parameters have O(N p ) as there is only one nested for loop. However, the minimum bounding box [27] code used in initial guess performs a convex hull extraction that has computational complexity of O(N p × log N p ).

Results
We set up experiments to validate the robustness of the proposed method compared with that of the traditional method, which is based on accumulated hard constraints. We used 500 synthetic data points of various sizes and shapes, and 354 real data points for verification. Synthetic data were designed to evaluate the robustness of the proposed method by computing the error between the results and ground truth. The real data acquired by the SLAM algorithm are considered to have higher noise and occlusion than the synthetic data. Therefore, we tested whether the proposed method is sufficiently robust to model a cuboid using real data. Because the real data do not contain the ground truth, we used the distance between a point cloud and the model as a metric to evaluate the results. The details of this metric are described in Section 3.3.

Comparison Implementation
We used the method of Wei et al. [10] for comparison because of its following features: First, the Hough transform is implemented for plane detection. The Hough transform is an extensively used approach because it addresses with missing data and is robust to noise. Secondly, the accuracy of the detected plane was increased by fitting the MBR. Finally, the plane fitting error and supporting point clouds of the detected planes are considered to improve the orientation and size parameters of the cuboid model. Because these features increase the robustness of the traditional method, the robustness of the proposed method can be verified by comparing it with the method proposed by Wei et al. [10]. Here, the method of Wei et al. [10] was implemented using the source code provided by the author, and the parameters of the code were unchanged from those in the given source.
Wei et al. [10] modeled multiple cuboids from point clouds that may consist of several cuboids. However, in some cases, they unintentionally modeled multiple cuboids from the point clouds of a single cuboid during the experiments ( Figure 5). Modeling multiple cuboids from point clouds consisting of one cuboid can be considered incorrect. However, to exclude the effect of prior information on the number of cuboids, the cuboid with the best performance is selected and compared with multiple cuboids. The specifics of the cuboid selection are described in Sections 3.2 and 3.3.

Synthetic Data
For synthetic data generation, we considered the characteristics of real data such as noise and occlusion. Specifically, we generated all synthetic data to have noise that follows the Gaussian assumption introduced in Section 2.1. In addition, occlusion was introduced by excluding the point clouds of the occluded planes from the point clouds of the cuboid. In addition to noise and occlusion, the density of point clouds was considered because it varies with the SLAM algorithm and the sensor used.
Synthetic data were hierarchically generated, as shown in Figure 6a. First, 500 cuboid data are divided equally into five cases depending on the number of generated faces, because there are no point clouds on the occluded faces. Second, 100 cuboid data of each case are also divided into five types depending on the characteristics related to noise, such as the standard deviation and noise ratio. Here, the standard deviation and noise ratio were arbitrarily taken within ranges of 0.02-0.04 m and 50-100%, respectively. The standard deviation is a parameter of the Gaussian model, and the noise ratio represents the proportion of point clouds following the Gaussian model among all the point clouds of the cuboid. Each cuboid has unique parameters and point density. To thoroughly evaluate the proposed algorithm, cuboids with 2-6 faces, i.e., the minimum to the maximum number of faces, were created to model a cuboid, even though the measured point clouds generally have a maximum of five faces. Moreover, we included two spatial combinations of reconstructed faces when generating cuboids with 2-4 faces, as shown in Figure 6b.
When Wei et al. [10] generated multiple cuboids, the cost of each cuboid was calculated using the following equation, and the cuboid with the lowest cost was selected.
Here, ∆Angle(1 × 1), ∆Size(3 × 1), and ∆Center(1 × 1) represent the orientation, size, and center error, respectively. Specifically, size error is the absolute difference between the corresponding edges of ground truth. The center error is a two-norm distance between the modeling result and the ground truth. The orientation error is computed as the minimum angle required to transform the obtained orientation to the ground truth orientation. Robustness is evaluated by the number of correctly modeled cuboids and narrowness of the error range. We counted correctly modeled cuboids by defining an incorrect cuboid model consisting of false and no solutions. The case in which the cuboid parameter estimation fails is determined as no solution, and that in which the estimated parameters have large differences from the ground truth is considered as a false solution. A cuboid model is considered a false solution if any error in the cuboid parameters of size, angle, and center exceeds the threshold. Among the cuboid parameters, we evaluated the size results as the volume, which is the product of each size. This is because it is difficult to achieve consistency in the individual size error results because the error in each size varies depending on the occlusion of the plane of the corresponding axis.  Table 1 reports the cuboid modeling results based on the number of false and no solutions. The threshold was calculated based on the interquartile range of the entire cuboid error generated using the proposed method and that of Wei et al. [10] from the synthetic data. The volume, center, and angle error thresholds were set at 1.5, 1.5, and 5 interquartile ranges, respectively. The units and calculated threshold values are listed in the second row of the table. According to Table 1, in most cases, the proposed method is more robust than the method of Wei et al. [10]. First, there is no case in which the proposed method fails to obtain the cuboid parameters, whereas the method proposed by Wei et al. [10] cannot estimate the cuboid parameters for 50 data in two faces case, which reflects severe occlusion. Second, both the methods accurately estimate the angle of the cuboid; however, the proposed method outperformed the method of Wei et al. [10] in estimating the volume and center for all faces.
The average error between the result parameters and the ground truth, and the number of false and no solutions are listed in Table 2. The average error was calculated using the results of the correctly modeled cuboid. The center and orientation errors were calculated in a manner similar to the selection of the cuboid from multiple results. The volume error was calculated as a percentage, which is the volume of the results divided by the volume of the ground truth. In addition, the units of error are represented in parentheses. The values in bold font indicate better performance, and the hyphen indicates that there is no correctly modeled cuboid.    The proposed method outperformed the method of Wei et al. [10] under most conditions, as shown in Table 2. This suggests that the proposed method can model a cuboid under various noise and occlusion conditions with a lower average error. In some cases, the method of Wei et al. [10] shows better performance than the proposed method, for example, the volume and center error for type 2 in two faces and volume error for type 3 in three faces. However, the proposed method estimates 10 and 5 more cuboids correctly for type 2 in two faces and type 3 in three faces, respectively, compared to the method of Wei et al. [10]. Therefore, it is necessary to match the data used to calculate the average error to accurately compare the error range.
To accurately evaluate the robustness from the perspective of the error range, the average error and standard deviation of the data obtained using both methods are listed in Table 3. In the case of two faces, the data are insufficient to calculate the average and standard deviation because Wei et al. [10] succeeded with only one cuboid. Therefore, the results for the cases of 3-6 faces are provided. The number of cuboids correctly modeled by each method is reported in the used data row.  Table 3 lists the mean and standard deviation values of the angle, volume, and center errors for both methods. Based on Table 3, the proposed method achieves smaller mean error and standard deviation for all parameters of the cuboid in most cases. Specifically, the proposed method is more robust than the previously studied method because it achieves a lower and more consistent error for various cases. These results are shown as boxplots in Figure 7 for an easy comparison. Here, the proposed method whose results are colored in green outperforms the previous method whose results are colored in red.   Table 3. Symbols representing method of Wei et al. [10] and proposed method are colored in red and green, respectively. Mean errors of two methods are denoted as blue lines with circle markers, respectively, to visualize results reported in Table 3.

Real Data
We verified that the proposed method is robust under various noise and occlusion conditions, as discussed in Section 3.2 by comparing its modeling results with the ground truth. Although synthetic data are designed to cover various cases of point clouds, conditions for real data are more difficult. Point clouds measured by the SLAM algorithm are more dynamic than synthetic data. Therefore, real data offers more challenging conditions for the proposed method.
We designed an environment and sensor system for data acquisition. The method of Lee et al. [6] was implemented for the SLAM algorithm to scan the environment, and the sensor system shown in Figure 8a was used to measure the surroundings. We set up a space in which 78 boxes are arbitrarily located, as shown in Figure 8b. The space was scanned five times, and all boxes were randomly rearranged for each scan.
Robustness was evaluated by the number of correctly modeled cuboids and the error range in the synthetic data. However, because there is no ground truth for real data, we used metrics such as P, R, and F, which were used by Wei et al. [10], to evaluate the results instead of the error range. Here, P denotes the number of uniformly sampled point clouds of the cuboid model within a certain distance from the measured point clouds. R represents the number of measured point clouds within a certain distance from the uniformly sampled point clouds of the cuboid model. F represents the harmonic mean of P and R. However, these metrics may be unsuitable for comparing the performance of the cuboid model because the point clouds measured in this study were highly occluded. A practical example of this phenomenon is shown in Figure 9.  Although the results of Wei et al. [10], colored in red cuboid, do not cover most point clouds, higher scores are obtained in P and F metrics, and the proposed method yields higher score only in the R metric.
Therefore, we developed a metric based on the concept of R score, and the distance between the measured point clouds and uniformly sampled point clouds of the cuboid model. The concept of the R score focuses on the ratio of the number of points located at a certain distance to the total number of points. By contrast, our metric focuses on the distance that makes a certain ratio of points inliers. Specifically, we uniformly sampled points from the cuboid model using an open-source software [28]. Subsequently, we determined the closest distance from the measured point to the sampled point for each measured point. Then, the closest distances were sorted in ascending order. According to this order, k% of points are located near the surface of cuboid modeling results under the k-th percentile distance. Therefore, we obtained the 75th, 80th, and 85th percentile distances to obtain the threshold distances in that 75%, 80%, and 85% of points are located near the surface of the cuboid.
Moreover, we also tested the performance of the proposed method without the Backtracking Line Search on real data to validate the benefits of this step. We noted the performance of the proposed method without the Backtracking Line Search as 'w/o BTLS' on Tables 4 and 5 and Figure 10 to see the enhancement of the robustness.
The robustness was compared based on the number of correctly modeled cuboids and the mean and standard deviation of the results, which were in a manner similar to the synthetic data. The results of the false and no solutions are listed in Table 4. The method of Wei et al. [10] fails to model the correct cuboids for 218 of the total 354 real data, which is approximately 61% of the total cuboid. This percentage is in the middle of the failure rates of the two and three-face cases of the synthetic data, which are 99% and 36%, respectively. Therefore, the condition of real data is difficult. Moreover, this interpretation is supported by the fact that the proposed method yielded two no-solution results for the first time in the experiments. The proposed method without the backtracking line search failed to model the correct cuboids for 47 of the 354 real data. On the other hand, The proposed method with the backtracking line search failed to model the correct cuboids for only 6 of the 354 real data. These real data results show not only a significant improvement over those of the previous method but also the benefit of the backtracking line search. Therefore, it can be concluded that the proposed method is the most robust among other methods under high noise and occlusion conditions.  Figure 10. Boxplots of results with succeeded data using triple methods summarized in Table 5. The method of Wei et al. [10], the proposed method without Backtracking Line Search step, and the proposed method are colored in red, blue, and green, respectively. Mean distances of triple methods are denoted as blue lines and circle markers, respectively, to visualize the results of Table 5.
A robustness evaluation based on the distance metric is summarized in Table 5. This table shows that 75, 80, and 85% of the points are distributed at distances within the corresponding values in the table from the surface of the cuboid model. Therefore, we consider that a result with a lower value indicates better performance because the points are located closer to the surface of the cuboid model. Consequently, based on Table 5, the proposed method outperforms other methods in most cases. Specifically, the points are closer to the model surface of the proposed method on average. Moreover, the proposed method achieved consistent results with a lower standard deviation than previous methods. These results are also visualized as boxplots in Figure 10 for easy comparison. Here, the proposed method whose results are colored in green shows an improved performance compared to the previous method whose results are colored in red.

Discussion
We verified the robustness of the proposed method through experiments designed to include various tough conditions using synthetic and real data. This was realized from the perspectives of the number of correctly modeled cuboids and the error range. The error range was represented by the mean and standard deviation of the error metric. The difference between the results and the ground truth was used as the error metric for synthetic data, and the distance between the surface of the cuboid model and the point clouds was used as the error metric for real data. The experimental results validated that the proposed method is more robust than a previous method from both perspectives.
In spite of improving the robustness of the proposed method, the limitations of the proposed method are also clear due to the design of the proposed method. Specifically, the proposed method is designed to model a cuboid one by one. Therefore, an appropriate point cloud segment process might be needed to utilize the proposed method as a module of another framework.

Conclusions
In this study, we established a robust cuboid modeling method for 3D point clouds with high noise and occlusion. We introduced the GMM concept to directly formulate soft constraints from the point clouds. Specifically, a soft constraint was formulated as an expectation of the GMM. Subsequently, we estimated cuboid parameters using the expectation-maximization scheme, which maximizes the soft constraints. In addition, for accuracy and efficiency, we derived analytic partial derivatives of the soft constraints to calculate the step size during optimization. Thereafter, the parameters of the cuboid model were updated using the backtracking line search method. Lastly, we verified the robustness of the proposed method through experiments designed to include various tough conditions using synthetic and real data.
On the other hand, the limitations of the proposed method still exist as discussed in Section 4. Therefore, to overcome the limitations, there are two future works to consider to enhance the applicability of the proposed method. The first future work is extending the single cuboid modeling to multiple cuboid modeling. If the extension of the proposed method to multiple cuboids is possible, then the proposed method will be more useful in the field of Geometric primitive-based modeling.
The second future work is finding alternative noise model, such as symmetric alpha stable distribution that is the extension of GMM. If another noise model is validated to be suitable for the proposed model, then the proposed method will be utilized with the appropriate model depending on the SLAM algorithm or the sensor system.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Linearized Rotation Matrix
In this Appendix, we derive the linearized rotation matrix introduced by Barfoot et al. [29] according to robotics field expressions. This is because the quaternion operations (left-hand and right-hand compounds) used by Barfoot et al. [29] are dissimilar to those in the robotics field, which this study follows. The linearized rotation matrix substitutes the rotation matrix in Equation (13) to update the orientation parameters of the cuboid, as discussed in Section 2.4.1.
The notation and operations of the quaternion are rewritten as follows.
The rotation matrix, C, and its derivative are restated as follows by replacing the corresponding notation and operations of Barfoot et al. [29].
The rotation matrix that rotates ϕ about the reference axis, a, is as follows.
Therefore, the derivatives of the rotation matrix with respect to the Euler angle of the ZYX sequence according to Equation (A2) become Therefore, the first-order Tayler-series expansion of the rotation matrix is as follows: Before applying the derived results, we verify Equation (A4), similar to the method proposed by Barfoot et al. [29]. C(θ + δθ)C(θ + δθ) ≈ (1 + δφ × )C(θ)((1 + δφ × )C(θ)) = 1 − δφ × δφ × (A5) A rotation matrix multiplied by its own transpose produces a 3 × 3 identity matrix and second-order term, which is negated by linearization. This proves that the rotation matrix derived in the robot coordinate system is reasonably linearized. Consequently, we substitute Equation (A4) into Equation (13) to optimize the orientation parameters of a cuboid.