Model Identification Using Weighted Implicit Shape Representation and Panoramic View

In this paper, we propose a 3 dimensional (3D) model identification method based on weighted implicit shape representation (WISR) and panoramic view. The WISR is used for 3D shape normalization. The 3D shape normalization method normalizes a 3D model by scaling, translation, and rotation with respect to the scale factor, center, and principal axes. The major advantage of the WISR is reduction of the influences caused by shape deformation and partial removal. The well-known scale-invariant feature transform descriptors are extracted from the panoramic view of the 3D model for feature matching. The panoramic view is a range image obtained by projecting a 3D model to the surface of a cylinder which is parallel to a principal axis determined by the 3D shape normalization. Because of using only one range image, the proposed method can provide small size of features and fast matching speed. The precision of the identification is 92% with 1200 models that consist of 24 deformed versions of 50 classes. The average feature size and matching time are 4.1 KB and 1.9 s.


Introduction
Development of 3 dimensional (3D) printing technology has led to the explosive growth of 3D models recently.Hence the 3D printing services are increasing rapidly [1,2].However, copyright infringement of 3D models has become an issue for 3D printing ecosystem of product distribution websites, 3D scanning and design-sharing [3,4].To prevent the copyrighted 3D models from distributing and using illegally, the identification of 3D models remains.
2 dimensional (2D) view-based 3D model identification has a high discriminative property for 3D model representation [5][6][7][8][9][10][11].Generally, a 2D view image is a range image obtained from a viewpoint located on a 3D model's bounding sphere.The identification is implemented by matching the features extracted from the range images.However, the existent approaches suffer from big size of features and slow matching speed.To overcome these problems, we propose an approach using only one range image, which means a panoramic view is used for identification.The panoramic view bridges the gaps between the range images rendered from multiple views.It is obtained by projecting a 3D model onto the surface of a cylinder, which is parallel to a principal axis determined by 3D shape normalization.The purpose of the 3D shape normalization is to normalize 3D models into a canonical coordinate frame to guarantee a unique representation [12][13][14].Nevertheless, how to determine the principal axes is the keypoint.The most common method is principal component analysis (PCA).However, it is not preferable when 3D models have unobvious orientations or undergo large deformations.If the shape normalization cannot determine the principal axes of a query model as similar as those of original model in database, the identification needs many more range images to match them.Implicit shape representation (ISR) was described in [13] for normalizing 3D articulated models.However, it has some limitations when some parts of a 3D model are removed or when a 3D model undergoes a large deformation.In this paper, the 3D shape normalization uses a weighted ISR (WISR) to reduce the influence caused by shape deformation and partial removal.It estimates the number of clusters based on rate distortion theory [15].It also shows the most representative part for one viewpoint of the six degree of freedom.
After the shape normalization, the model is wrapped by the cylinder to generate a range image.The range image is used for providing features of the model.The feature used in our approach is the scale invariant feature transform (SIFT) descriptor [16].The SIFT is generally used to extract geometric transformation invariant local features from images [17,18].It detects interest points called keypoints and assigns orientation information to each keypoint based on local gradient directions.With the SIFT descriptor, object recognition approaches can achieve high performance in feature matching.In this paper, the 3D models are identified by matching the SIFT descriptors of the query model with those in database.In the section of experimental results, we show the comparisons between the precision of identification of the proposed method and those of other methods.

Related Work
Several researches have been conducted to group 3D models into corresponding categories by matching the features and comparing the similarities of the models [5][6][7][8][9][10][11]14,[19][20][21][22].The matching and comparison are implemented on a huge dataset containing various models of different poses and shapes.The models are mainly classified into two types: rigid models and non-rigid (deformable) models.Early works are rigid model-based approaches.In this paper, we focus on the identification of non-rigid models.
3D model identification methods are classified into two categories: view-based and model-based methods [6,7].Model based methods include geometric moment [19], volumetric descriptor [20], surface distribution [21], surface geometry [22].However, the geometry and topology based methods are generally computationally cost and are fragile to 3D model removal.View-based methods have a high discriminative property for 3D model representation [5][6][7][8][9][10][11].A 2D view image is a range image obtained from a viewpoint located on a 3D model's bounding sphere.After the range image is obtained, image processing technologies are applied to the range image for extracting features.To be invariant against geometrical transformation, researchers proposed shape normalization methods to preprocess the 3D models before extracting the features.
Several view-based methods have been proposed.In Ref. [9], authors proposed a view based 3D model retrieval method using the SIFT and a bag-of-features approach.The bag-of-features was inspired by the bag-of-words in the text retrieval approach, which classifies documents by histograms of words in the text.The method extracted SIFT features from the range images of the model viewed from dozens of viewpoints located uniformly around the model.The bag-of-features was composed of the SIFT features.The well-known k-means clustering algorithm was applied to the bag-of-features to classify the features and generate visual words.The visual words are integrated in to a histogram and become a feature vector of the model.However, the large number of the range images leads to large capacity of features and slow matching speed.
In [10,11], authors proposed 3D model descriptors using the panoramic views which can describe the orientation and position of the model's surface.In [10], the panoramic views were obtained by projecting the model to surfaces of cylinders parallel to three principal axes.The principal axes were obtained by using continuous PCA and normal PCA.For each cylinder, the coefficients of 2D discrete Fourier transform and 2D discrete wavelet transform were extracted to generate the 3D shape descriptors.However, these descriptors are not suitable for distinguishing the 3D models well.In [11], the exes were perpendicular to the surfaces of a dodecahedron generated around the model.Three panoramic views were obtained from each axis.The other two panoramic views were obtained from additional two axes which are orthogonal to each other and to the principal axis.Then, the SIFT features were extracted to generate the 3D model descriptors.However, because of using dozens of panoramic views, the method leads to large capacity of features and slow matching speed.
In [8], authors normalized the model with the PCA and extracted 18 views from the vertices of a bounding 32-hedron of the model.The 3D model descriptors were composed of 2D Zernike moments, 2D Krawtchouk moments and coefficients of Fourier transform.However, the PCA is fragile to partial removed and deformed models, which means the PCA can't extract the same axes from the deformed and removed models as those of original models.Hence, the different axes lead to different views and descriptors.Eventually, the method can't identify the deformed and partial removed versions of the original models.
In [13], authors proposed a shape normalization method for 3D volumetric models using ISR.The ISR is a set of minimum Euclidean distance values between the surface of the model and the voxels inside the surface.It is invariant to translation and rotation.The method computed an initial center of the model with the ISR and voxels inside the model.It also computed an initial principal axis with the PCA.Then the center and three principal axes were iteratively upgraded based on implicit intensity value and principal axis dependent weight function.Finally, the method translated, rotated and scaled the model with the final center, principal axes and a scale factor which was computed with the ISR.
In [23], the competition results of SHREC 2015 range scans based 3D shape retrieval were presented.The best performance was achieved by a SIFT based cross-domain manifold ranking method.However the precision was about 70%.In [24], the results of the SHREC 2015 Track: Non-rigid 3D shape retrieval were presented.The best performance was achieved by a method of super vector-local statistical features.However, the local statistical feature extraction and matching is time consuming.The matching time is over 50 s for 907 models [25].
In [26], authors proposed a view-based 3D model retrieval method using bipartite graph matching and multi-feature collaboration.The complement descriptors were extracted from the interior region and contour of 3D models.The employed three types of features: Zernike moments, bag of visual words descriptor and Fourier descriptor to construct bipartite graphs.However, because of using various types of features, it is time consuming.
In [27], the discriminative information of 2D projective views were learned for 3D model retrieval.The dissimilarity between discriminative ability and view's semantic is investigated by classification performance.An effective and simple measurement is used to study the discriminative ability.The discriminative information is used for view set matching with a reverse distance metric.Various features were employed to boost the retrieval method.However, each model was represented by 216 views.The feature size is too large.The querying time is 1.7 s for 330 models.It is also time consuming.
In 2015, five leading feature extraction algorithms: SIFT, speeded-up robust features, binary robust independent elementary features, binary robust invariant scalable keypoints and Fast retina keypoint, were used to generate keypoint descriptors of radiographs for classification of bone age assessment [28].After comparing the five algorithms, the SIFT was found to perform best based on precision.In 2016, a survey was presented to evaluate various object recognition methods based on local invariant features from a robotics perspective [29].The evaluation results reported that the best performing keypoint descriptor is the SIFT and it is very robust to real-world conditions.Based on the previous research results on pattern recognition and computer vision, we decide to extract the SIFT descriptors as the features of 3D models.

3D Shape Normalization Using WISR
3D shape normalization is a process of adjusting the orientation, location, and size of a given 3D model into a canonical coordinate frame.A 3D model is usually composed of a main body part and branch parts (e.g., arms and legs).To reduce the effect caused by the deformation or abscission of branch parts when determining the principal axes, we increase the weight of the main body part.The procedure of weight calculation requires three steps.The first step is automatically estimating the number of clusters based on rate distortion theory [15].With the clustering method, we can distinguish the main body part and the branch parts.However, different 3D models have different shapes and topologies that lead to different main body and branch parts.Therefore, a method of automatically estimating the number of clusters is required.The second step is performing the k-means algorithm with the estimated number and calculating the distance among cluster centers.The third step is calculating the number of points inside each cluster sphere.Generally, the number of points inside a main body part is greater than that of a branch part.Thus the weight is obtained based on the number of points.
First of all, N random points P = {p i |i = 1, . . ., N} are generated inside the surface of a model.A measure of cluster dispersion called distortion d is defined as Equation (1).It is derived from Mahalanobis distance.
where Γ is the covariance and c p i is the closest center to a given point p i .The cluster centers are obtained by using the k-means algorithm.We iteratively fit k ∈ {1, K} clusters to the points P. Therefore, there are K distortions d k corresponding to K clusters.Each d k denotes cluster dispersion of k clusters.After evaluating the distortions with 1 to K partitions, the d k is transformed as follows where m is the dimension of the points, thus m is equal to 3. The − m 2 is a transform power motivated by asymptotic reasoning.The number of clusters is set to be k = argmax k {J k }.The k is the ideal number of clusters.Then the k-means algorithm is performed to partition the P into k clusters.The points near the boundaries of clusters interfere with the relation between the points and the main body and branch parts.Therefore, we only consider the points inside a sphere with a specific radius.First we compute a distance cd between two cluster centers as follows: Then we can obtain C = k − 1 + k − 2+, . . ., +k − (k − 1) distances.The radius is defined as r = min cd a,b /2.The weight of each cluster is the number of nearest points within the radius from each center.The nearest points set of the jth cluster center c j is defined as The weight w j of cluster j is the the number of the elements in the S j .The ISR is defined as f (p) = min{||p − q||}, which is the minimum Euclidean distance from p to vertices q on the surface of the model.The w j is applied to the ISR of nearest points of cluster center j to produce WISR as follows The points nearest to the cluster center inside a main body are much more than those inside a branch.Therefore, the weight of the main body will be increased, whereas that of the branch will be decreased.To reduce the influence of surface deformation of main body, we quantize the WISR and delete some values which are less than a specified threshold.We use search-based optimization of Otsu's criterion to find 5 thresholds T = {t i |i = 1, . . ., 5}.We delete the values of f (p i ) by setting them to be 0, if they are less than t  The principal axes are calculated by singular value decomposition.The points corresponding to the existing ( ) are selected for analyzing the principal axes.The center of gravity of a model is defined as It is moved to the origin of coordinate to solve the normalization of translation.It represents the weighted average of all points in a model.It is much closer to the center of main body than conventional barycenter.To normalize a model size, a scale factor is defined as follows The principal axes are calculated by singular value decomposition.The points p corresponding to the existing f (p i ) are selected for analyzing the principal axes.The center of gravity of a model is defined as It is moved to the origin of coordinate to solve the normalization of translation.It represents the weighted average of all points in a model.It is much closer to the center of main body than conventional barycenter.To normalize a model size, a scale factor is defined as follows It is based on the volume of the model and is effective in normalizing the 3D model size.Finally, the 3D model is normalized by achieving scaling, translation, and rotation with respect to the scale factor, center, and principal axes.

Panoramic View Generation
Once the shape normalization has been done, one panoramic view will be generated.First, a cylinder is generated around a 3D model as shown in Figure 2a.Its center and axis are the center and the first principal axis of the model.Its radius is defined as R = 2•max{||o − q||}.Its height is the height of the model.We sample the axis of the cylinder with a sample rate F. Each sample point of the axis is a center of a cross section of the cylinder.For each cross section, M rays are emanated from each center to the surface of the cylinder.Thus, the degree between each ray is 2π/M.Each ray may have more than 1 intersection with the surface of the model.The distance rd ∈ [0, R] from a center to the furthest intersection of the ray is mapped to a value in the range of [0, 1] for representing one pixel in the F × M range image.After generating the panoramic view, SIFT descriptors are extracted from the panoramic view and stored as the feature of the model.Figure 2b shows the SIFT descriptors of panoramic view-based range image.

= ( )
It is based on the volume of the model and is effective in normalizing the 3D model size.Finally, the 3D model is normalized by achieving scaling, translation, and rotation with respect to the scale factor, center, and principal axes.

Panoramic View Generation
Once the shape normalization has been done, one panoramic view will be generated.First, a cylinder is generated around a 3D model as shown in Figure 2a.Its center and axis are the center and the first principal axis of the model.Its radius is defined as = 2 • max ‖ − ‖ .Its height is the height of the model.We sample the axis of the cylinder with a sample rate .Each sample point of the axis is a center of a cross section of the cylinder.For each cross section, rays are emanated from each center to the surface of the cylinder.Thus, the degree between each ray is 2π/ .Each ray may have more than 1 intersection with the surface of the model.The distance ∈ 0, from a center to the furthest intersection of the ray is mapped to a value in the range of 0,1 for representing one pixel in the × range image.After generating the panoramic view, SIFT descriptors are extracted from the panoramic view and stored as the feature of the model.Figure 2b shows the SIFT descriptors of panoramic view-based range image.The matching procedure of the SIFT descriptors uses the Euclidean distance as in [16].Suppose the SIFT descriptor of a query model is = ( , , … , ) and that of a model in database is = ( , , … , ).The distance between the two descriptor is given by A keypoint with the least distance value is defined as a matched keypoint.We match the keypoints of the query model to those of the models in the database and obtain the number of matched keypoints.Finally, we identify the model with the maximum number of matched keypoints as the original model of the query model.

Experimental Results
In this section, some experimental results about the shape normalizations are shown first.To achieve high precision of 3D model identification, how to accurately normalize the shapes of the models is of great significance in practice.Figure 3a,c are original sumotori and tortoise models.Figure 3b is deformed version of sumotori model by articulating around its joints in different ways.Figure 3d is partially removed version of tortoise model.There is a certain extent of difference between the original models and deformed and removed models.If we extract 6 range images from each view point of six degree of freedom, we can obtain 24 possible poses of a 3D model [8].We only selected the most representative range image from all possible poses of a model.Figure 4a-l show the range images which have the most representative surface of the models using PCA, ISR, and WISR, respectively.The main body and face in Figure 4b are oblique.Figure 4d shows left side of the model.Figure 4f,h show the range images were viewed obliquely from above.Both Figure 4j,l show their fronts to a view point, which means the deformed and removed models were well normalized using WISR.The matching procedure of the SIFT descriptors uses the Euclidean distance as in [16].Suppose the SIFT descriptor of a query model is VQ = (vq 1 , vq 2 , . . . ,vq n ) and that of a model in database is VD = (vd 1 , vd 2 , . . . ,vd n ).The distance D between the two descriptor is given by A keypoint with the least distance value is defined as a matched keypoint.We match the keypoints of the query model to those of the models in the database and obtain the number of matched keypoints.Finally, we identify the model with the maximum number of matched keypoints as the original model of the query model.

Experimental Results
In this section, some experimental results about the shape normalizations are shown first.To achieve high precision of 3D model identification, how to accurately normalize the shapes of the models is of great significance in practice.Figure 3a,c are original sumotori and tortoise models.Figure 3b is deformed version of sumotori model by articulating around its joints in different ways.Figure 3d is partially removed version of tortoise model.There is a certain extent of difference between the original models and deformed and removed models.If we extract 6 range images from each view point of six degree of freedom, we can obtain 24 possible poses of a 3D model [8].We only selected the most representative range image from all possible poses of a model.Figure 4a-l show the range images which have the most representative surface of the models using PCA, ISR, and WISR, respectively.The main body and face in Figure 4b are oblique.Figure 4d shows left side of the model.Figure 4f,h show the range images were viewed obliquely from above.Both Figure 4j,l show their fronts to a view point, which means the deformed and removed models were well normalized using WISR.We evaluated the proposed identification method with 1200 non-rigid 3D models in SHREC 2015 benchmark.The models consist of 24 deformed versions of 50 classes.We selected one model for each class to compose 50 query models.We also experimented with the other 3D model identification methods: combination of PCA and SIFT (PCAS) [9], that of continuous PCA, normal PCA, 2D discrete Fourier transform, and 2D discrete wavelet transform (CPCA) [10], and that of dodecahedron and SIFT (DODE) [11].Two types of experiments were performed to evaluate the performances of the methods.First one is to identify the 50 original query models.Then we   We evaluated the proposed identification method with 1200 non-rigid 3D models in SHREC 2015 benchmark.The models consist of 24 deformed versions of 50 classes.We selected one model for each class to compose 50 query models.We also experimented with the other 3D model identification methods: combination of PCA and SIFT (PCAS) [9], that of continuous PCA, normal PCA, 2D discrete Fourier transform, and 2D discrete wavelet transform (CPCA) [10], and that of dodecahedron and SIFT (DODE) [11].Two types of experiments were performed to evaluate the performances of the methods.First one is to identify the 50 original query models.Then we We evaluated the proposed identification method with 1200 non-rigid 3D models in SHREC 2015 benchmark.The models consist of 24 deformed versions of 50 classes.We selected one model for each class to compose 50 query models.We also experimented with the other 3D model identification methods: combination of PCA and SIFT (PCAS) [9], that of continuous PCA, normal PCA, 2D discrete Fourier transform, and 2D discrete wavelet transform (CPCA) [10], and that of dodecahedron and SIFT (DODE) [11].Two types of experiments were performed to evaluate the performances of the methods.First one is to identify the 50 original query models.Then we removed some parts of the models such as arms and legs.Second experiment is to identify the 50 partially removed query models.The percentage of removal ranges from 6.1% to 33.6%.The average percentage is 13.8%.We set the range of the number of clusters from 1 to 10, which means the K is set to 10.Both the sample rate F and the number of rays M are set to be 180.We performed the experiments on an IBM compatible computer with a 3.4 GHz CPU and a 4 GB random-access memory.The average feature size and matching time of the corresponding method for 1200 models are shown in Table 1.Because of using only one range image, the proposed method provides small size of feature and fast matching speed.Figure 5 shows the precision of identification for each method.Although the feature size is greatly reduced, the precision is still greater than those of the other 3 methods even with the removed versions.removed some parts of the models such as arms and legs.Second experiment is to identify the 50 partially removed query models.The percentage of removal ranges from 6.1% to 33.6%.The average percentage is 13.8%.We set the range of the number of clusters from 1 to 10, which means the is set to 10.Both the sample rate and the number of rays are set to be 180.We performed the experiments on an IBM compatible computer with a 3.4 GHz CPU and a 4 GB random-access memory.The average feature size and matching time of the corresponding method for 1200 models are shown in Table 1.Because of using only one range image, the proposed method provides small size of feature and fast matching speed.Figure 5 shows the precision of identification for each method.Although the feature size is greatly reduced, the precision is still greater than those of the other 3 methods even with the removed versions.

Conclusions
In this paper, we have proposed a 3D model identification method, which consists of WISR-based 3D shape normalization and panoramic view for feature extraction.To achieve high precision of 3D model identification with 2D view-based approach, how to accurately normalize the shapes of the models has great significance in practice.The proposed 3D shape normalization clusters random points inside a model and defines the number of nearest neighbors within a specified radius from each cluster center as the weight.The weight is applied to ISR to produce WISR for reducing the influence caused by shape deformation and partial removal.A panoramic view is generated by projecting a 3D model onto the surface of a cylinder for extracting SIFT descriptors.The average feature size and matching time are 4.1 KB and 1.9 s.The precision of identification of original models is 92% and that of removed versions is 64%.The experimental results show the performance of the 3D model identification is significantly improved.In the future work, we will optimize the identification method and increase the precision of the identification.

Conclusions
In this paper, we have proposed a 3D model identification method, which consists of WISR-based 3D shape normalization and panoramic view for feature extraction.To achieve high precision of 3D model identification with 2D view-based approach, how to accurately normalize the shapes of the models has great significance in practice.The proposed 3D shape normalization clusters random points inside a model and defines the number of nearest neighbors within a specified radius from each cluster center as the weight.The weight is applied to ISR to produce WISR for reducing the influence caused by shape deformation and partial removal.A panoramic view is generated by projecting a 3D model onto the surface of a cylinder for extracting SIFT descriptors.The average feature size and matching time are 4.1 KB and 1.9 s.The precision of identification of original models is 92% and that of removed versions is 64%.The experimental results show the performance of the 3D model identification is significantly improved.In the future work, we will optimize the identification method and increase the precision of the identification.

2 .
The ISR and WISR of a mouse model are shown in Figure 1.To illustrate their salient characteristic more clearly, the figures are shown in xy-plane.

Figure 2 .
Figure 2. The cylinder around model and its panoramic view-based range image; (a) Cylinder; (b) Panoramic view-based range image.

Figure 4 .
Figure 4.The most representative range images of shape normalized models using PCA (a-d): original and deformed sumotori model, original and partially removed tortoise model; ISR (e-h): original and deformed sumotori model, original and partially removed tortoise model; and WISR (i-l): original and deformed sumotori model, original and partially removed tortoise model.

Figure 4 .
Figure 4.The most representative range images of shape normalized models using PCA (a-d): original and deformed sumotori model, original and partially removed tortoise model; ISR (e-h): original and deformed sumotori model, original and partially removed tortoise model; and WISR (i-l): original and deformed sumotori model, original and partially removed tortoise model.

Figure 4 .
Figure 4.The most representative range images of shape normalized models using PCA (a-d): original and deformed sumotori model, original and partially removed tortoise model; ISR (e-h): original and deformed sumotori model, original and partially removed tortoise model; and WISR (i-l): original and deformed sumotori model, original and partially removed tortoise model.

Table 1 .
Average feature size and matching time; PCAS: combination of PCA and SIFT; CPCA: combination of continuous PCA, normal PCA, 2D discrete Fourier transform, and 2D discrete wavelet transform; DODE: combination of dodecahedron and SIFT.

Table 1 .
Average feature size and matching time; PCAS: combination of PCA and SIFT; CPCA: combination of continuous PCA, normal PCA, 2D discrete Fourier transform, and 2D discrete wavelet transform; DODE: combination of dodecahedron and SIFT.