A Machine Learning Method Based on 3D Local Surface Representation for Recognizing the Inscriptions on Ancient Stele

: It is challenging to extract reliefs from ancient steles due to their rough surfaces, which contain relief-like noise such as dents and scratches. In this paper, we propose a method to segment relief region from 3D scanned ancient stele by exploiting local surface characteristics. For each surface point, four points that are apart from the reference point along the direction of the principal curvatures of the point are identiﬁed. The spin images of the reference point and the four relative points are concatenated to provide additional local surface information of the reference point. A random forest model is trained with the local surface features and, thereafter, used to classify 3D surface point as relief or non-relief. To effectively distinguish relief from the degraded surface region containing relief-like noise, the model is trained using three-class labels consisting of relief, background, and degraded surface region. The initial three-class result obtained from the model is reﬁned using the k -nearest neighbors algorithm, and, ﬁnally, the degraded region is re-labeled to background region. Experimental results show that the proposed method performed better than the state-of-the-art, SVM-based method with a margin of 0.68%, 3.53%, 2.25%, and 2.36%, in accuracy, precision, F1 score, and SIRI, respectively. When compared with the height- and curvature-based methods, the proposed method outperforms these existing methods with accuracy, precision, F1 score, and SIRI gains of over 4%, 20%, 11%, and 12%, respectively.


Introduction
Stone monuments are important artifacts that provide artistic, cultural, political, or historical background to events during the time they were created [1]. Due to the long exposure of these stones to wind, heat, rain, animal activities, and other factors in the environment where they are located, their surfaces degrade over time [2,3]. As a result, the contents (or information) which are sometimes in the form of ancient characters or drawings carved on the surfaces fade before archaeologists excavate them. One of the problems that archaeologists encounter after collecting these stones is how to accurately recover the information on the external surface of the monuments.
In the past, the stone rubbing method was commonly used to recover inscriptions on monuments. This method extracts important features from the surface of the stone by rubbing rendering materials over paper placed on top of the stone monuments [4]. The quality of the relief extracted through this method depends on the professional experience of the epigrapher. If the technique is applied carelessly, the cultural monuments which have undergone abrasion can be further damaged or contaminated. To prevent damages and avoid labor costs for repairs, non-physical methods for extracting relief from monuments have received considerable attention lately in the field of computer graphics and computer vision [5][6][7][8][9][10]. The decorrelation stretch (DStretch), an image processing plug-in made by Defrasne [5], was applied to enhance images of faint reliefs carved on rocks [11]. DStretch involves the application of the Karhunen-Loeve transformation to a 2D colored image of the surface of a rock to obtain a 3 × 3 transformation matrix. The product of the obtained transformation matrix and the image that contains faint reliefs on rock produced a new image with enhanced reliefs. However, the performance of this functional software is not reliable in extracting reliefs when the surfaces of the monuments contain severely damaged regions. In another study, a feature extraction method, known as the Hough transform, was used for the representation of surface features by applying the feature curve approximation function to a family of curves [9]. This transformation was demonstrated to be effective in the recognition of decorative patterns from the surface of defective or severely degraded archaeological objects. Recently, Romanengo et al. [10] extended the application of Hough transform for surface representation by defining new set of rules and conditions for selecting suitable curve approximation that guarantees reliable recognition of decorative surface points. Furthermore, new curve models were analyzed and added to the existing family of curves to increase the accuracy of identifying the characteristic curves.
With the introduction of sophisticated 3D scanners, it is now realistic to extract reliefs from 3D scanned surfaces of stone monuments. The 3D scanned monument is represented as surface mesh with triangular faces. Several 3D data processing techniques can thereafter be applied to the 3D mesh to gain meaningful insights about the ancient monuments, such as changing the angle of views at various scales without optical illusions in the material pattern or discoloration [6,7,[12][13][14][15]. Scanning defects in the form of outliers, noise, ghost geometry, and holes are usually introduced to the 3D mesh surface during the scanning process. A set of tools based on point modeling techniques was presented to address these problems [16]. The point modeling techniques mainly involve data structures, i.e., the moving least square optimization and point relaxation functions. Another post-processing method based on normalized vector field was introduced and applied in operations such as surface smoothing, details enhancement, and artificial coloring of the 3D scanned surface [6].
Considering a smooth base normal, a relief extraction technique that estimates the height distribution of reliefs and background by integrating relative heights between adjacent vertices was presented by Zatzarinni et al. [8]. The normals of the base surface were first estimated to determine the height of the surface points and a Gaussian mixture model was applied to the height distribution to obtain a threshold that was used to decide if a surface point is a relief or background. Degraded regions were identified and removed using a filter that relies on the size and shape of the identified regions. This technique was demonstrated to be reliable in applications such as detail exaggeration and dampening, cut and paste of relief, and drawing of prominent curves. Recently, a digital 3D stone rubbing approach which also estimates the heights of mesh vertices using a weighted least square method was presented by Pan et al. [15]. The estimated heights were transformed subsequently to grey values using a nonlinear transformation function. The transformed heights were segmented into black, white, or grey colors based on the relationship between the individual height and the mean and standard deviation of the heights of neighboring vertices. In this manner, a digital stone rubbing similar to the traditional rubbing was determined from the 3D mesh.
Following the successful application of the Frangi filter in the detection of blood vesselness [17], a modified Frangi filter was developed to extract relief details from the surface of 3D mesh [7,18]. Considering the canal-like shape of the reliefs, curvatures are intuitive and effective measures for detecting reliefs. At a position corresponding to the relief, its minimum and maximum curvatures, κ min and κ max , would be close to zero and a large value, respectively. The earlier modification of the Frangi filter is based on the computation of curvature to obtain carvings. This is achieved by clustering the surface regions and using a measure of saliency to rank the detected carvings on weathered plasters [7]. In addition, the authors presented a flexible and interactive tool with simple visualization for modifying reliefs. While this method is not suitable for extracting reliefs from the rough surfaces of stele because the presence of degraded regions disrupt the ranking of the saliency, another variants of the Frangi filter handles this problem by applying two vesselness values of different curvature sizes and selecting an acceptable threshold to effectively remove the irregular surface regions [18].
A handful of machine learning techniques have been introduced lately to analyze prehistoric cultural heritage [19][20][21]. The features extracted from these monuments can include the textural information of the soil samples of the monuments or the digital contents of the 2D/3D images that are captured by electronic cameras and scanners. Liu et al. [22] presented a supervised machine learning method to refine bas-relief surfaces for reversed engineering. Initially, the surface of the bas-relief was segmented using a snake-based method and the background region was estimated using b-spline surface fitting curves. Although the results obtained are promising, the roughness nature of the surface of the stele affects the performance of this method. In addition, the method considered only the relief and background labels, and ignored the degraded surface region. The use of the two-class labels affects the performance of these models since the degraded surface region share geometrical features similar to the reliefs, and this interferes in the conditions set for extracting reliefs [23].
More recently, support vector machine (SVM) was applied to extract relief region from the surface of stone monument [24]. The modified curvature-based method [18] was initially used to identify relief candidate segments from mesh partitions. The relief candidates include actual reliefs and relief-like noise. From each candidate segment, 79-dimensional geometrical features were extracted, which consist of the appearance, quadratic approximation for cross section, and local extrema of the candidate segments. The geometrical information was learned by the SVM to determine whether the candidate is relief or not. While this machine learning method outperformed the height estimation [8] and curvature-based methods [7,18], the superior performance of the SVM-based method [24] is largely dependent on the local surface features that were identified on the 3D mesh and the conditions for the removal of degraded region.
In this paper, we consider the spin image (SI) as features to describe the local surface of 3D mesh. SI is widely used in computer vision and computer-aided design because of its efficient shape representation of 3D objects which was posed as surface matching problem [25]. The effectiveness of SI for describing the surface of steles had been promisingly demonstrated [26]. The main contribution of this paper is the training of a machine learning classifier based on the SI descriptions that are extracted from the 3D mesh of very rough stele. In order to effectively distinguish the relief region from the background surface, we enhance the representation of the SI description by structurally configuring multiple SIs. Considering the canal-like shape of the reliefs, the minimal curvature direction, t min follows the canal at a position within the relief, while the maximal curvature direction, t max climbs up the canal along the cross section. As a result, the locations apart with a certain distance along t min from the given relief position tend to belong to the identical relief, while the locations apart along t max would be the background. Using the principal curvature directions at a particular vertex with reference, four relative surface points are identified on the 3D mesh. We refer to the aggregation of the SIs of these five points including the reference and its four relative points as cross spin image (CSI). In the proposed approach, the CSIs are extracted as features per vertex to train a three-class label random forest (RF) model. The class labels of the surface points include the relief, background, and the degraded region. The k-nearest neighbors (k-NN) algorithm is used to refine the classification results. The degraded region is identified and re-labeled to background region. As such, the proposed method classifies the surface points as relief or non-relief region. In contrast with the SVM-based method which classifies relief using candidate segment [24], the classification of the proposed machine learning-based method is done per vertex.
The remaining sections of the paper are divided as follows: Studies relevant to the proposed method are discussed in Section 2. We present the proposed approach in Section 3. Various parameters used in the proposed method are determined and the performance of the proposed method is compared with existing techniques based on intuitive geometrical features in Section 4. Finally, we conclude our findings in Section 5.

Basic Geometrical Features
In this subsection, we discuss the two conventional geometrical feature-based relief extraction methods: height (depth)-based and curvature-based methods. The height-based method is one of the simplest and most prominent methods for relief extraction [8]. The height estimation of the new surface point is considered using a given path on the mesh surface with the surface normal vector. The height estimateĥ(v n ) at v n on the surface along the path from v 0 to v n is given aŝ where v 0 is the reference point location on the surface mesh, and the vertices along the path are {v i }, such that 0 ≤ i ≤ n, and n B (e i,i−1 ) is the base normals associated with each corresponding edge (e i,i−1 ) between two vertices. Assuming that the heights of the reliefs and background follow Gaussian distributions, respectively, the Gaussian mixture model is applied to determine the threshold that intersects the two height distributions. This implies that the accuracy of this technique relies on the selection of a good threshold value. Furthermore, depth cannot be an absolute measure for detecting reliefs because dents and cracks on the surface of the stele can also be deep, particularly in a severely weathered stele. For the curvature-based methods, the authors exploited the distribution of surface curvatures to extract reliefs from wall carvings [7,18]. Letting κ min and κ max (|κ min | ≤ |κ max |) denote the minimal and maximal curvatures, respectively. In a relief region, it is expected that |κ min | ≈ 0 and |κ max | ≥ 0. The Frangi filter is modified and applied to measure the vesselness, Υ, at each vertex within the surface mesh as [17] where A and B are κ min κ max and κ 2 min + κ 2 max , respectively, and α and β are parameters that are used to control the effect of the first and second term of the filter. From Equation (2), when κ max ≥ 0, the first term identifies regions of interest relating to the local shape of the surface. This term becomes large when the local surface is curved like a valley. The second term is related to the structure of the local surface and increases as the valley becomes steep and deep.
After the vesselness, Υ, for all the vertices has been determined, vertices with nonzero values of Υ are identified as reliefs and the zero values as background. However, the computation of the curvatures requires derivative operation, and thus is prone to be affected by noise from the scanning process and the roughness of the surfaces of the monuments. These noisy curvatures give inaccurate vesselness values because of the small marginal difference in the principal curvatures, which result in unsatisfactory reliefs.

Relief Extraction Using Machine Learning
Among the machine learning models with acceptable performance in the extraction of reliefs is the SVM-based approach [24]. The main idea of this method involves the initial extraction of relief candidate segments which includes actual reliefs and relief-like noise (cracks/dents). Then, various geometrical features are extracted from the candidate segments, and, lastly, the SVM model is trained to carefully determine if each candidate is actual relief or non-relief. The relief candidate segments are extracted from the stele using the modified curvature-based method [18]. The geometrical features extracted from the segments consist of appearance, cross-section, and local extrema information.
The size and positions of relief candidate segment are important parameters for describing the appearance of objects. While the number of vertices in the segment determines the area of the segments, the statistical values of the surface points and their corresponding normalized surface points of the relief segments provide useful information relating to the positions of the surface points that are identified within the candidate segments. To obtain the cross-section features, 2D skeleton information of the segments are derived from the xand y-coordinates of the surface points within the segments. The width and depth of the cross section and the magnitude of the curvatures are extracted as the cross-section features. For the local extrema information, the location of the maximum depth point along the skeleton of the segments is considered. This local feature relies on the depth value and the distance separation between the point of interest and the skeleton.
The numbers of features extracted for the appearance, cross-section, and local extrema are 31, 25, and 23, respectively. Altogether, these features are used to train a binary SVM classifier with a third-order polynomial kernel.

Spin Images
Many papers describe the characteristics of local surfaces of 3D objects [27][28][29][30][31][32]. Spin image representation, one of the local surface descriptors, has been widely used for face recognition and object classification. Originally, the technique was introduced to address the problems of surface matching in computer vision [25]. The generation of SIs on the surface of a 3D object can reveal the local structural properties around each surface point. Assume that oriented points are distributed in the 3D space, each point is represented as a pair of its position v and normal vector n, as shown in Figure 1a. An imaginary plane P that contains v i and has the same normal n divides the 3D space around v i into two non-overlapping subspaces: the upper subspace above the plane and the lower subspace below it. Similarly, several imaginary planes that are parallel to P and equally spaced apart from each other also divide the space into smaller non-overlapping subspaces. Each subspace can be further partitioned radially from n based on the distance to n, as shown in Figure 1b. The SI for a certain oriented point (v i , n), denoted by S(v i ), is computed by encoding the coordinates of neighboring points, v j , around v i into a 2D histogram with two indices, ρ and γ, as shown in Figure 1c. Specifically, the coordinates of the SI, ρ and γ, indicate the indices of discrete intervals defined along the radial direction from n and the direction of n, respectively. As a result, a bin of the SI represents a cylindrical space with a hole whose axis coincides with n, as shown in the left image in Figure 1c, and the bin value indicates the number of points belonging to the space. The mathematical representation of the spin image coordinate of any 3D surface point, v j , is mathematically given as The generation of SI is dependent on the bin sizes and image dimension. While the bin sizes ∆ρ and ∆γ determine the volume of each cylindrical subspace, the image dimension is the number of rows, N γ , and columns, N ρ , of the SI.

Proposed Method
We present the proposed approach of extracting relief regions from the stone monuments in this section. A simple block diagram of the proposed relief extraction method is shown in Figure 2. Mesh partitions that contain identifiable characters are cropped from the 3D mesh. For very uneven steles, the cropped mesh partition can be slanted. To correct this, principal components of vertices of the mesh partition are obtained using the principal component analysis. The mesh partition is transformed to make the eigenvector corresponding to the smallest eigenvalue point vertically upward. The transformed mesh partition is denoted by M . The generation of SI involves the creation of SI for every vertex of all the 3D mesh partitions, and the information is stored offline in an SI database (Figure 3). For a reference vertex v i , four relative surface points, v  The simple concatenation of the five SIs, however, causes the feature dimension to increase five times. In order to avoid overfitting and training time increase, the dimensions of the SIs of the four localized vertices are reduced effectively by the sum-pooling operation. The spin image information of the five vertices are merged to form a feature vector, which is referred to as the CSI of v i and is denoted by C(v i ). The CSI is used for the training of a three-class label RF model.
Classification per vertex may cause isolated misclassified vertices. The k-NN algorithm is applied to correct such noisy misclassification, which improves the performance of the proposed method as well as enhances the visualization of the classification results.

Preprocessing
As mentioned above, the roughness nature of the surface of the ancient stele hinders the reliable estimation of surface curvatures. To address this problem, the input mesh is smoothed using the Gaussian filter with the standard deviation σ before estimating the curvatures. It is notable that the smoothed mesh is exploited only for curvature calculation. The SIs are generated using the original mesh without smoothing.

CSI Feature Extraction
The modules involved in the feature extraction process are explained as follows: • Relative vertices localization: As mentioned above, to enhance the classification performance at each vertex, the representation of a single SI is augmented with the SIs of its neighboring vertices. To estimate the position of the vertices along the principal curvatures of v i , the curvature tensors and the curvature derivatives are initially determined, and then the principal curvatures whose directions are denoted by t max and t min are obtained [33]. 3D points at a distance of d CSI from v i along t max , t min , −t max , and −t min are identified. Since the points are probably not on the mesh surface, we project the 3D points on the mesh and find the vertices closest to the projections. For clarity, we illustrate in Figure 4 how v 1 i and v 2 i are searched along the maximal principal direction; the same procedures are repeated in the minimal principal direction to search for v 3 and v 4 i are determined, the SI information of these surface points and v i are retrieved from the SI database. • Sum-pooling: To reduce the dimension of S(v 1 i ), · · · , S(v 4 i ), each SI is subsampled by using a sum-pooling operation with s × s filters and stride s. That is, adjacent s × s bins of the SI are summed up to yield a single bin of the subsampled SI. Therefore, the dimensions of each SI reduces from N γ N ρ to N γ N ρ s 2 . The purpose of this operation is to reduce the size of the SIs while preserving important information before they are vectorized. The sum-pooling operation is inspired by the multi-resolution SI, which applies the concept of pyramid structure of an image to compress a high resolution SI [27]. It is notable that the sum-pooling operation is actually identical to generating SI by partitioning the same 3D space into smaller number of larger subvolumes. In our experiments, s was set to 2.
• Vectorization: This operation converts the original SI information to vector of length N γ N ρ for easy merging with the four subsampled SIs. The operation is also repeated for the four subsampled SIs before they are merged. For each of the subsampled SI, the length of the vector is N γ N ρ s 2 . • Concatenation: All five vectors obtained from the output of the vectorization stages are concatenated to a single feature vector of length N γ N ρ + 4 N γ N ρ s 2 . The concatenated array forms the CSI and it is denoted as C(v i ).

Classification
Random forest has been widely and successfully used for various classification problems. We trained a three-class label RF classifier and searched for the optimal set of hyperparameters by varying the number of trees and maximum depth during training. The search for the optimal set of parameters guarantees that the RF model with the best performance is used for the extraction of reliefs from the surface of the stele. The k-NN search algorithm is applied to refine the classification results of the trained RF. The idea of this complementary application is to re-label the surface points by considering the dominant label class of the nearest k neighbors of each vertex. Although the application of k-NN algorithm does not significantly improve the performance of the RF model, it gives a better visualization of the relief, background, and the degraded region. The final result with two classes of the relief and the background is obtained by re-labeling the degraded region class to the background class.

Experimental Results and Discussion
For our experiments, we scanned the stele "Musul-ojakbi" to obtain a 3D mesh with size of 672.9 × 987.1 × 32.4 mm. The mesh contains over five million vertices, with an average resolution and accuracy of 250 µm and 30 µm, respectively. Musul-ojakbi monument is recorded to have been created in AD 578 during the Silla Dynasty. The monument has a rough surface because it was not ground before engraving and has undergone severe weathering. As illustrated in Figure 5, the characters inscribed on the surface are very difficult to interpret by visual inspection, especially at the center and the lower right portions of the stele. For further information on the Musul-ojakbi stele, we refer the reader to the previous study [24]. To reduce the presence of the degraded region in the ground truth dataset before extracting important features, we carefully cropped fine partitions of the 3D scanned data using the MeshLab 3D processing tool [34]. The ancient stele has 169 partitions containing different characters, but we selected 70 partitions because the reliefs pattern on their surfaces can be traced and labeled as relief, background, or degraded region. The SI and CSI information was extracted per vertex for all the surface points of the 70 partitions. From the information obtained from these 70 partitions, 56 mesh partitions were used for training while 14 partitions for testing. The experiments are presented in the following subsections.

Hyperparameters Search
We determined the various parameters for the proposed relief extraction method before comparing the performance with existing methods. For clarity, RF models that were trained using the SI and CSI information are denoted as RF(SI) and RF(CSI), respectively.

SI Parameters Search
Before generating the CSI, it is important to investigate thoroughly the optimal SI parameter combination. The SI generator requires four parameters: N ρ , N γ , ∆ρ, and ∆γ. In order to limit the feature dimension, N ρ and N γ were fixed to 10, resulting in 100 bins of SI. We performed a grid search for ∆ρ and ∆γ ranging from 0.05 to 1. For fair comparison among different SI combinations, the number of trees and maximum depth for the RF models were initially fixed to 500 and 32, respectively. The F1 scores of the RF models of different SI combinations were compared, as shown by the plot in Figure 6. The plot produces a smooth and concave shape with a peak value of 0.8730 at (∆ρ, ∆γ) = (0.6, 0.5). Since the variation of F1 score is very small when both ∆ρ and ∆γ are greater than 0.2, the performance of the RF model is insensitive to the SI parameters near the optimal combination. It is noteworthy that the average relief width of the stele is about 2 mm. Since SI of ∆ρ = 0.2 and N ρ = 10 covers a circular area with a radius of 2 mm, it is expected that the SI with a higher ∆ρ contains sufficient information to represent reliefs.
Consequently, other parameter combinations near ∆ρ = 0.6 and ∆γ = 0.5 does not reduce the performance much. This SI parameter combination that yielded the highest F1 score of 0.8730 is the optimal.

RF Hyperparameters Search
Following the investigation of the optimal SI parameters, were searched for the RF hyperparameters. Several RF models with different number of trees, N t , and maximum depth, N d , were trained with the optimal SI parameter combination, i.e., ∆ρ, ∆γ, N ρ , and N γ selected as 0.6, 0.5, 10, and 10, respectively. The performances of 12 RF models are presented in Table 1. From the search results, the RF model of 1000 trees and maximum depth of 32 performed better than other models in terms of accuracy, F1 score, and recall; however, the precision was slightly lower than 0.8814. Based on these results, the number of trees and maximum depth selected for all subsequent RF models were 1000 and 32, respectively.

Effect of the Smoothing Filter
The effect of the Gaussian smoothing on the estimation of curvatures was investigated. Figure 7 illustrates the maximal principal curvature direction of a mesh partition in colors. Hue changes with respect to the azimuthal angle of the curvature direction, while saturation reduces as its elevation angle increases. When the standard deviation of the filter, σ, is 0.9 and 1.6 ( Figure 7d,e), the vertices are realigned such that surfaces with similar principal directions have the same color. At σ = 4, the principal direction of the vertices are seriously disoriented, and this standard deviation value is not recommended to be used (Figure 7g).
Since the estimation of curvatures is directly linked to the localization of the four relative points used in CSI, it is also vital to demonstrate how the variation in σ affects the extraction of reliefs. For s = 2, the size of the subsampled SI is 5 × 5, and d CSI is initially selected as 1. Figure 8 shows that, as the value of σ increases from 0, the performances of the models also increase up to the highest F1 score of 0.8828. However, after σ > 1.6, the performances of the RF models start to decline. This observation is in line with the colormap of curvature directions reported in Figure 7, in which, when σ is large, the surface becomes severely flattened and the curvature directions does not provide any useful information. In the proposed method, σ = 1.6 was used for the Gaussian filter before estimating the principal curvatures of the mesh surface.

d CSI Search
The distance of relative vertices from the reference vertex is also a factor that influences the CSI. Using the parameters of ∆ρ, ∆γ, N t , N d , and σ as determined, i.e., 0.6, 0.5, 1000, 32, and 1.6, respectively, the performances of several RF models trained using the CSI generated from several values of d CSI are illustrated in Figure 9. The plot reveals that, when the d CSI value is small, that is, close to zero, the subsampled SI of the relative vertices has similar information as that of the reference vertex. However, as d CSI increases, the subsampled SI provides additional information, thereby improving the performance of the RF models in the detection of reliefs. At d CSI > 2, the performances of the RF models consistently decrease. According to the plot, the RF model with acceptable performance is observed when d CSI is 1 with the F1 score of 0.8828. The feature importance of two RF(CSI)s with the same SI parameters (N ρ , N γ , ∆ρ, and ∆γ of 10, 10, 0.6, and 0.5, respectively) but different d CSI are compared in Figure 10. The additional information of the relative surface points in RF(CSI) with d CSI = 1 shows that the model is more descriptive than RF(CSI) of d CSI = 3. Specifically, the incremental performance is related to the subsampled SI of v 3 i and v 4 i searched along the minimal principal direction, t min . These two relative points contain important features and their uniqueness is what make the model accurately identify relief or non-relief regions. Another interpretation is that, if the reference vertex v i is in the valley, then v 3 i and v 4 i would also be in the valley (within relief).

Size of Neighboring Vertices in k-NN
Lastly, we investigated the influence of different k-neighbors in refining the classification results of RF(CSI). Table 2 presents the performance of the RF model with suitable hyperparameters and eight different k sizes. The accuracy and F1 score are highest at k = 60. Considering the case when k = 60, the accuracy and F1 score of the classification results are higher than the other cases. Although the application of the k-NN does not significantly improve the result, it provides a good visualization as the relief, background, and degraded region are well clustered, as shown in Figure 11. The second and third columns of Figure 11 were obtained using the RF models trained using SI and CSI, respectively. The fourth column was obtained after applying k-NN to the results in the third column. The k-NN fills the holes and gets rid of isolated points. Once the relief, background, and degraded region are well clustered, it is possible to change the degraded region to the background region, thereby making the classification a two-class label consisting of relief and background regions.   Figure 11. Subjective comparison of the classification results before re-labeling. The second and third columns were obtained using the RF models trained using SI and CSI, respectively. The fourth column was obtained after applying k-NN to the third column. The k-NN fills the holes and gets rid of isolated points.

SI and CSI
In the proposed method, the CSI is geometrically configured by multiple SIs to enhance the surface representation. In order to demonstrate the effectiveness of the CSI, we compared the RF models that were trained using CSI and SI information, respectively. For fair comparison, except feature vectors fed to both the RF models, the identical pipeline, i.e., three-class classifier followed by the k-NN algorithm and re-labeling, was employed. Considering that the dimension of CSI is 200, i.e., N ρ = 10 and N γ = 10, the SI of 200 bins was generated using N ρ , N γ , ∆ρ, and ∆γ as 20, 10, 0.3, and 0.5, respectively. Note that ∆ρ = 0.3 in the SI of 200 bins to make the spin image coverage area the same as that of CSI in the ρ-direction. Since N ρ in SI was doubled, ∆ρ was halved. Table 3 shows the performance of the two RF models after applying k-NN. The RF(CSI) has a better performance than the RF(SI) in all metrics. This impressive performance is attributed to the additional features (from the subsampled SI) contained in CSI, which makes the classifier understand the differences between the class labels of the surface regions. The feature importance of the two RF models are illustrated in Figure 12. There are many bins in RF(SI) with less significance compared to RF(CSI). In addition, the important feature bins in RF(SI) are also contained in the feature bins of the RF(CSI), which implies that the original local surface information is preserved in the CSI.

Two-Class and Three-Class RF(CSI)
Another way to verify the effectiveness of the proposed method is to perform a direct two-class relief extraction, i.e., the RF classifier was trained with two classes, relief and background background. The comparison of the classification results between the twoclass and three-class RF(CSI) is presented in Table 4. The performance of the three-class RF is higher than the two-class RF in all metrics except precision. Based on this result, the three-class RF(CSI) is more reliable in the extraction of reliefs when compared to the two-class RF(CSI).

Evaluation of the Proposed Method with Other Existing Methods
In order to evaluate the performance of the proposed method, we compared it with several methods including the height-based method [8], the curvature-based method [7], the modified curvature-based method [18], and the SVM trained on appearance, crosssection, and local extrema-based features [24]. The parameters used in the proposed method and existing methods are given in Table 5.  Figure 13a shows eight mesh partitions of the test data and Figure 13b illustrates the corresponding ground truth labels of the mesh partitions with three classes. The degraded region label is utilized in the SVM-based method and the proposed method, while the label is treated as background in the other methods.
As shown in Figure 13c, the height-based method produces unsatisfactory classification results because of the inconsistencies with the distribution of the heights, largely from the degraded region. Figure 13d shows the relief regions extracted using the curvaturebased method. The method yields an unreliable extraction of reliefs due to noisy curvatures without the consideration of the rough surface of the stele. Furthermore, the boundaries of the extracted relief regions are not smooth, and it is obvious that the reliefs are not accurately extracted. There is significant improvement in the modified curvature-based method when compared with results from the curvature-based method. However, the modified curvature-based method cannot distinguish degraded region from reliefs, and thus results in many false detections, as shown in Figure 13e.
The machine learning-based methods using SVM and RF are both promising and they outperformed all other methods by eliminating degraded regions (see Figure 13f,g). However, the SVM-based method misclassifies many thin and small degraded regions as reliefs, which hinders recognition of the characters. The proposed method yielded more pleasant results by suppressing those thin degraded regions and producing reliefs with smooth boundaries.  [8]; (d) the curvature-based method [7]; (e) the modified curvature-based method [18]; and (f) the SVM-based method [24]. (g) The proposed method. Figure 14 illustrates the results of the comparative methods with respect to the entire ancient stele. The stone rubbing image in Figure 14a indicates not only the reliefs but also cracks and scratches, which makes it difficult for the epigraphists to recognize the characters correctly. The results obtained using the height-based method and the curvaturebased method are presented in Figure 14b,c. The classification results are inconsistent as the methods are affected by the degraded regions. There is a significant improvement in the modified curvature-based method, but the extracted reliefs are not accurate and precise (see Figure 14d). For the proposed method, we cross-validated the CSI information by separating the 70 mesh partitions into five folds. Four folds were used to train the RF model in the proposed method and the last fold was selected as test data. This process was repeated until we were able to extract the reliefs for all 70 mesh partitions. One of the RF models trained from the fold was applied to the remaining 113 mesh partitions that were not included in the train. The initial classification result of the 169 mesh partitions were refined with the k-NN, and the result is presented in Figure 14f. The classification result for the SVM-based method in Figure 14e was obtained in the same procedure, but the SVM models were trained using the appearance, cross-section, and local extrema features extracted from the surface of the mesh partitions. From visual inspection, the classification results obtained from the machine learning-based methods involving SVM and RF models were promising when compared to other existing methods, as illustrated in Figure 14e,f. This indicates that the geometrical features extracted in the SVM-based method contained useful information for identifying reliefs and background, and this observation is the same in the proposed RF-based method involving CSI features. It is difficult to quantitatively analyze the performance of these methods on the entire stele because there is no ground truth label for the 113 mesh partitions. Figure 14. The comparison of the existing relief extraction methods and the proposed method: (a) stone rubbing; (b) the height-based method [8]; (c) the curvature-based method [7]; (d) the modified curvature-based method [18]; (e) the SVM-based method [24]; and (f) the proposed method.
The performance of the machine learning methods, based on SVM and RF (with and without k-NN), were further evaluated using a fold containing the same 14 mesh partitions selected as test data. These machine learning methods were considered for comparison since their classification results are relatively similar. The comparison of the performance of these methods is presented in Table 6. The initial classification result obtained using the proposed method without k-NN, denoted as RF(CSI), is better than the SVM-based method in all metrics except recall. To further improve the reliability of the proposed method, the initial classification results is refined with the k-NN. As such, the proposed method, RF(CSI), refined with k-NN leads the SVM-based method with an increase of approximately 0.68%, 3.53%, 0.18%, and 2.25% in accuracy, precision, recall, and F1 score, respectively.  Figure 15 depicts the objective comparison of these methods in terms of accuracy, precision, recall, F1 score, and segmented inscription recognition index (SIRI) [35]. The SIRI can evaluate the inscriptions extracted from the 3D scanning data by imitating human subjective inscription recognition. If a region slightly thinner or thicker than the actual relief segment is detected, it has little effect on the perceived quality of the inscription. In contrast, unexpected misdetection makes the perceived quality degrade significantly. Based on these observations, SIRI evaluates the quality of extracted inscriptions with different importance with respect to region.
For the height-based method, false detection reduces precision and F1 score. The curvature-based method yields a number of false detections, resulting in much lower precision and F1 score. With the modified curvature-based method, all metrics increased but results are not acceptable. The SVM-based method significantly increases in precision, F1 score, and SIRI. The proposed method achieves the highest performance in all metrics except recall. The precision value improves by 0.0353 as compared to the second best method, which also leads to 2.66% higher F1 score. It is noteworthy that the proposed method achieves a larger improvement of 3.2% in SIRI. That indicates that the proposed method produces more recognizable characters, as confirmed in Figure 14f.  Figure 15. Performance evaluation of the relief extraction methods in terms of precision, accuracy, recall, F1 score, and SIRI.

Conclusions
In this study, we proposed a machine learning-based approach for extracting reliefs from ancient stele. By exploiting the local surface description of the surface of the stele, a cross spin image is used to train a RF model. The optimal parameter combination required for generating the spin image and cross spin image were thoroughly searched. The suitable number of trees and maximum depth for the random forest model were investigated. The initial classification results obtained from the random forest model were refined using k-NN algorithm to cluster regions belonging to the same class label. The degraded regions were identified and re-labeled as background. Experimental results show that the proposed method outperformed previous methods by more than 0.68%, 3.53%, 2.25%, and 2.30% in accuracy, precision, F1 score, and SIRI, respectively. Especially, the proposed method successfully improved recognizability. With the proposed method, users with little or no experience in epigraphy can accurately identify relief patterns on rough ancient stele.
As part of future work, we would investigate further other advanced machine learning models trained on local surface descriptors, with the objective to improve the classification results, without the need to employ k-NN for refining the results.