Classification of 3D Digital Heritage

Abstract: In recent years, the use of 3D models in cultural and archaeological heritage for documentation and dissemination purposes has been increasing. The association of heterogeneous information to 3D data by means of automated segmentation and classification methods can help to characterize, describe and better interpret the object under study. Indeed, the high complexity of 3D data, along with the large diversity of heritage assets themselves, has made segmentation and classification methods active research topics. Although machine learning methods have brought great progress in this respect, few advances have been made in relation to cultural heritage 3D data. Starting from the existing literature, this paper aims to develop, explore and validate reliable and efficient automated procedures for the classification of 3D data (point clouds or polygonal mesh models) of heritage scenarios. In more detail, the proposed solution works on 2D data (“texture-based” approach) or directly on the 3D data (“geometry-based” approach) with supervised or unsupervised machine learning strategies. The method was applied and validated on four different archaeological/architectural scenarios. Experimental results demonstrate that the proposed approach is reliable and replicable and that it is effective for restoration and documentation purposes, providing metric information, e.g., on damaged areas to be restored.


Introduction
The generation of 3D data of heritage sites or monuments, whether point clouds or polygonal models, is altering the approach that cultural heritage specialists use for the analysis, interpretation, communication and valorization of such historical information. Indeed, 3D information allows one, e.g., to perform morphological measurements, quantitative analysis and information annotation, as well as to produce decay maps, while enabling easy access to and study of remote sites and structures.
The management of architectural heritage information is considered crucial for a better understanding of the heritage data as well as for the development of targeted conservation policies and actions. An efficient information management strategy should take into consideration three main concepts: segmentation, structuring of the hierarchical relationships and semantic enrichment [1]. The demand for automatic model analysis and understanding is continuously increasing. Recent years have witnessed significant progress in automatic procedures for the segmentation and classification of point clouds or meshes [2][3][4]. There are multiple studies related to the segmentation topic, mainly driven by the specific needs of the field of application (Building Information Modeling (BIM) [5], heritage documentation and preservation [6], robotics [7], autonomous driving [8], urban planning [9], etc.).
In the cultural heritage field, the identification of different components (Figure 1) in point clouds and 3D meshes is of primary importance because it can facilitate the study of monuments and their integration with heterogeneous information and attributes. However, it remains a challenging task considering the complexity and high variety that heritage case studies can have.
The research presented in this article was motivated by the need to identify and map different states of conservation or employed materials in heritage objects. To this end, we developed a method: (i) to document and retrieve historical and architectural information; (ii) to distinguish different construction techniques (e.g., types of opus); and (iii) to recognize the presence of existing restoration evidence. The retrieval of such information in historic buildings by traditional methods (e.g., manual mapping or simple visual inspection by experts) is considered a time-consuming and laborious procedure [10]. The aim of our research was not to develop a new algorithm for the classification of 3D data (point clouds or polygonal mesh models) of heritage scenarios, but to explore the applicability of supervised machine learning approaches to an almost unexplored field of application (i.e., cultural heritage), proposing a reliable and efficient pipeline that can be standardized for different case studies.

State of the Art: 2D/3D Segmentation and Classification Techniques
Both image and point cloud segmentation are fundamental tasks in various applications, such as object detection [11], medical analyses [12], license plate and vehicle recognition [13], classification of microorganisms [14], fruit recognition [15] and many more [16]. Segmentation is the process of grouping data (e.g., images, point clouds or meshes) into multiple homogeneous regions with similar properties [17] (Figure 2). These regions are homogeneous with respect to some criteria, called features, that constitute a characteristic property or set of properties which is unique, measurable and differentiable. In the case of 2D imagery, features refer to visual properties such as size, color, shape, scale patterns, etc., while, for 3D point cloud data, they typically result from specific geometric characteristics of the global or local 3D structure [18]. Typically, in 3D data, surface normals, gradients and curvature in the neighborhood of a point are used. Once 2D or 3D scenarios have been segmented, each group can be labeled with a class, giving the parts some semantics; hence, classification is often called semantic segmentation or pixel/point labeling.
Remote Sens. 2019, 11, x FOR PEER REVIEW 3 of 24

Segmentation Methods
The image segmentation topic has been widely explored [20], and current state-of-the-art techniques include edge-based [21,22] and region-based approaches [23] as well as clustering techniques [24][25][26][27] (Figure 3). Except for the model fitting approach, most point cloud segmentation methods have their roots in image segmentation. However, due to the complexity and variety of point clouds caused by irregular sampling, varying density, different types of objects, etc., point cloud segmentation and classification are more challenging and still very active research topics.
Edge-based segmentation methods have two main stages [27,28]: (i) edge detection to outline the borders of different regions; and (ii) grouping of points inside the boundaries to deliver the final segments. Edges in a given depth map are defined by the points where changes in the local surface properties exceed a given threshold. The most used local surface properties are normals, gradients, principal curvatures or higher-order derivatives. Methods based on edge-based segmentation techniques were reported by Bhanu et al. [29], Sappa and Devy [30], and Wani and Arabnia [31]. Although such methods allow a fast segmentation, they may produce inaccurate results due to noise and uneven density, situations that commonly occur in point cloud data. In 3D space, such methods often detect disconnected edges, making the identification of closed segments difficult without a filling or interpretation procedure [32].
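As an illustration of the first stage (edge detection), the following minimal Python sketch marks the cells of a depth map where the local gradient magnitude exceeds a threshold. It is not taken from any of the cited methods: the function name `detect_edges`, the forward-difference gradient and the toy depth map are assumptions made for the example only.

```python
def detect_edges(depth, threshold):
    """Mark cells whose local depth gradient magnitude exceeds `threshold`."""
    rows, cols = len(depth), len(depth[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Forward differences; border cells reuse their own value,
            # which yields a zero gradient in that direction.
            right = depth[r][c + 1] if c + 1 < cols else depth[r][c]
            down = depth[r + 1][c] if r + 1 < rows else depth[r][c]
            gx = right - depth[r][c]
            gy = down - depth[r][c]
            edges[r][c] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges


# Hypothetical depth map: a flat surface at depth 1 next to one at depth 3.
depth_map = [
    [1.0, 1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 1.0, 3.0, 3.0],
]
edge_mask = detect_edges(depth_map, threshold=0.5)
```

In this toy case only the column where the depth jumps is flagged. The second stage, grouping the points enclosed by the detected edges, is not covered by the sketch.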
Region-based methods work with region growing algorithms. In this case, the segmentation starts from one or more points (seed points) featuring specific characteristics and then grows around neighboring points with similar characteristics, such as surface orientation, curvature, etc. [27,33]. The initial algorithm was introduced by Besl et al. [34], and several variations are presented in the literature [35][36][37][38][39][40]. In general, region growing methods are more robust to noise than edge-based ones because they utilize global information [41]. However, these methods are sensitive to: (i) the location of the initial seed regions; and (ii) inaccurate estimations of the normals and curvatures of points near region boundaries.
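The region growing idea can be sketched in a few lines of Python. The example below is a simplified illustration, not the algorithm of Besl et al.: it assumes the data are arranged on a regular grid, that each cell carries a single scalar feature (here a hypothetical surface orientation value), and that a neighbor joins the region when its feature differs from the current cell by no more than a tolerance.

```python
from collections import deque


def grow_region(values, seed, tolerance):
    """Grow a region from `seed` over 4-connected cells with similar values."""
    rows, cols = len(values), len(values[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                # Similarity criterion: compare the neighbor with the cell
                # from which the region is currently growing.
                if abs(values[nr][nc] - values[r][c]) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Hypothetical per-cell orientation feature: two smooth patches.
orientation = [
    [0.0, 0.1, 0.9],
    [0.1, 0.2, 1.0],
    [0.0, 0.1, 0.9],
]
segment = grow_region(orientation, seed=(0, 0), tolerance=0.15)
```

Starting from the top-left cell, the region covers the first two columns and stops at the sharp change in the third. Choosing a seed in the third column would produce a different segment, which illustrates the sensitivity to seed location mentioned above.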
The model fitting approach is based on the observation that many man-made objects can be decomposed into geometric primitives such as planes, cylinders and spheres (Figure 4). Therefore, primitive shapes (such as cylinders, cubes, spheres, etc.) are fitted onto 3D data, and the points that best fit the mathematical representation of the fitted shape are labeled as one segment. Within the model fitting category, two widely employed algorithms are the Hough Transform (HT) [42] and the Random Sample Consensus (RANSAC) [43]. Model fitting methods are fast and robust to outliers, although a set of dedicated primitives is necessary. As this approach falls short for complex shapes or fully automated implementations, exploiting the richness of surface geometry through local descriptors provides a better solution [44]. In the architectural field, details cannot always be modeled as easily recognizable geometric shapes. Thus, while some entities can be characterized by geometric properties, others are more readily distinguished by their color content [45].
Machine learning approaches are described in detail in the next sections. It should be noted that, while clustering algorithms (an unsupervised technique) belong to segmentation methods, after applying supervised machine learning methods, the 3D data are not only segmented but also classified.
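To make the model fitting idea concrete, the sketch below fits a single plane to a small point set with a basic RANSAC loop. It is a simplified illustration under stated assumptions rather than the reference formulation in [43]: the helper names, the fixed iteration count, the distance threshold and the synthetic points are all hypothetical.

```python
import random


def plane_from_points(p1, p2, p3):
    """Return (unit normal, d) of the plane n.x + d = 0, or None if collinear."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = tuple(c / norm for c in n)
    d = -sum(n[i] * p1[i] for i in range(3))
    return n, d


def ransac_plane(points, distance_threshold, iterations, seed=0):
    """Return the indices of the points supporting the best sampled plane."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate (collinear) sample, draw again
        n, d = model
        inliers = [
            i for i, p in enumerate(points)
            if abs(sum(n[k] * p[k] for k in range(3)) + d) <= distance_threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Synthetic data: a 5 x 5 grid on the plane z = 0 followed by four outliers.
points = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
points += [(0.0, 0.0, 5.0), (1.0, 2.0, 3.0), (4.0, 1.0, -2.0), (2.0, 2.0, 7.0)]
inliers = ransac_plane(points, distance_threshold=0.01, iterations=200)
```

The points labeled as inliers form one segment; in a full pipeline the procedure would be repeated on the remaining points, possibly with other primitive types.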
Several benchmarks have been proposed in the research community, providing labeled terrestrial and airborne data on which users can test and validate their own algorithms. However, most of the available datasets provide classified natural, urban and street scenes [46][47][48][49][50][51][52]. While in those scenarios the object classes and labels are largely well defined (mainly ground, roads, trees and buildings), the identification of precise categories in the heritage field is much more complex (several classes could be used to identify and describe the same building characteristics, depending on the purpose). For these reasons, to the authors' knowledge, there is currently a lack of benchmarks of labeled images or 3D heritage data in the literature.
Machine learning (including deep learning) is a scientific discipline concerned with the design and development of Artificial Intelligence algorithms that allow computers to make decisions based on empirical and training data. Broadly, there are three types of approach with machine learning algorithms:
• A supervised approach is where semantic categories are learned from a dataset of annotated data and the trained model is used to provide a semantic classification of the entire dataset. Whereas for the aforementioned methods classification is a step that follows segmentation, with supervised machine learning methods the class labeling is defined before the model is segmented. Random forest [62], described in detail in Section 4.2, is one of the most used supervised learning algorithms for classification problems [63,64].
• An unsupervised approach is where the data are automatically partitioned into segments based on a user-provided parameterization of the algorithm. No annotations are required, but the outcome might not be aligned with the user's intention. Clustering is a type of unsupervised machine learning that aims to find homogeneous subgroups such that objects in the same group (cluster) are more similar to each other than to those in other groups. K-means is a clustering algorithm that divides observations into k clusters using features. Since the number of clusters can be set by the user, it can easily be used for classification, dividing the data into a number of clusters equal to or greater than the number of classes. The original K-means algorithm presented by MacQueen et al. [65] has since been widely applied to images and point clouds by various researchers [66][67][68][69][70].
• An interactive approach is where the user is actively involved in the segmentation/classification loop by guiding the extraction of segments via feedback signals. This requires a large effort from the user, but it can adapt and improve the segmentation result based on the user's feedback.
Feature extraction is a prerequisite for image/point cloud segmentation. Features play an important role in these problems, and their definition is one of the bottlenecks of machine learning methods [71,72]. Weinmann et al. [73] discussed the suitability of features, arguing that quality should be privileged over quantity (Figure 5). High-quality features make the models easier to interpret and improve algorithm performance with respect to both speed and accuracy. This shows a need to prioritize and find robust and relevant features to address the heterogeneity in images or point clouds.
In both 2D and 3D segmentation/classification, approaches can be combined to exploit the strengths of one method and bypass the weaknesses of others [41,74]. The success of these "hybrid methods" depends on the success of the underlying approaches.

Segmentation and Classification in Cultural Heritage
In the field of cultural heritage, processes such as segmentation and classification can be applied at different scales, from entire archaeological sites and landscapes to small artifacts.
In the literature, different solutions are presented for the classification of architectural images, using techniques such as pattern detection [75], Gabor filters and support vector machines [76], K-means algorithms [77], clustering and learning of local features [78], hierarchical sparse coding of blocks [79] or CNN deep learning [16,80].
Many experiments were also carried out on 3D data at different scales [6,81,82]. Some works aim to define a procedure for the integration of architectural 3D models within BIM [1,5,83]. In many others, the classification is conducted manually for annotation purposes (www.aioli.cloud). In the NUBES project, for example, 3D models are generated from 2D annotated images. In particular, the NUBES web platform [84] allows the display and cross-referencing of 2D mapping data on the 3D model in real time, by means of structured 2D layers, such as annotations concerning stone degradation, dating and material. Apollonio et al. [85] used 3D models and data mapping on 3D surfaces in the context of the restoration documentation of Neptune's Fountain in Bologna. Campanaro et al. [86] realized a 3D management system for heritage structures by exploiting the combination of 3D visualization and GIS analysis. The 3D model of the building was originally split into architectural sub-elements (facades) to add color information by projecting orthoimages by means of planar mapping techniques (texture mapping). Sithole [87] proposed an automatic segmentation method for detecting bricks in masonry walls, working on the point clouds and assuming that mortar channels are reasonably deep and wide. Oses et al. [76] used machine learning classifiers, support vector machines and classification trees for masonry classification. Riveiro et al. [88] suggested an algorithm for the segmentation of bricks in point clouds, built on a 2.5D approach and creating images based on the intensity attribute of LiDAR sensors. Recently, Messaoudi et al. [89] developed a correlation pipeline for the integration of the semantic, spatial and morphological dimensions of built heritage. The annotation process provides a 3D point-based representation of each 2D region.

Project's Methodology
Considering the availability and reliability of segmentation methods applied to (2D) images and the efficiency of machine learning strategies, a new methodology was developed to assist cultural heritage experts in analyzing digital 3D data. In particular, the approach presented hereafter relies on supervised and unsupervised machine learning methods for segmenting the texture information of 3D digital models. Starting from colored 3D point clouds or textured surface models, our pipeline relies on the following steps:
• Create and optimize models, orthoimages (for 2.5D geometries) and UV maps (for 3D geometries) (Figure 6a-c).
• Segment the orthoimage or the UV map following different approaches tailored to the case study (clustering, random forest) (Figure 6d-e).
• Project the 2D classification results onto the 3D object space by back-projection and collinearity model (Figure 6f).
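For textured meshes, the last step can be pictured as a lookup: each vertex carries a UV coordinate, and the class of the texel it falls in is copied to the vertex. The Python sketch below shows only this simplified lookup and is not the authors' implementation; in particular, it does not implement the collinearity-based back-projection used for orthoimages. The function name, the bottom-left UV origin and the toy label image are assumptions.

```python
def labels_from_uv(label_image, uv_coords):
    """Assign each vertex the class of the texel its UV coordinate falls in."""
    height, width = len(label_image), len(label_image[0])
    labels = []
    for u, v in uv_coords:
        # Assumption: UV origin at the bottom-left, image row 0 at the top.
        col = min(int(u * width), width - 1)
        row = min(int((1.0 - v) * height), height - 1)
        labels.append(label_image[row][col])
    return labels


# Hypothetical 4 x 4 classified UV map with three classes (0, 1, 2).
classified = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 2],
]
vertex_uv = [(0.1, 0.9), (0.8, 0.9), (0.5, 0.2), (1.0, 0.0)]
vertex_labels = labels_from_uv(classified, vertex_uv)
```

A nearest-texel lookup keeps the class labels discrete; interpolating between texels would blend class identifiers and is therefore avoided here.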

Image Preparation
The method works on the texture information of a 3D model. The texture is prepared according to the geometry and complexity of the considered 3D object:
• Planar objects (e.g., walls, Section 5.1): The object orthophoto is created, and the procedure classifies it and finally re-maps the information onto the 3D geometry.
• Regular objects (e.g., buildings or other 3D structures with a certain level of complexity, Section 5.2): Instead of creating various orthoimages from different points of view, unwrapped textures (UV maps) are generated and classified. To generate a good texture image to be classified, we followed these steps:
1. Remeshing: Beneficial to improve the quality of the mesh and to simplify the next steps.
2. Unwrapping: UV maps are generated using Blender, adjusting and optimizing seam lines and overlap (Figure 6c) to facilitate the subsequent analysis with machine learning strategies. This correction is made by instructing the UV unwrapper to cut the mesh along edges chosen in accordance with the shape of the case study [90].
3. Texture mapping: The created UV map is then textured (Figure 6d) using the original textured polygonal model (as vertex color or with an external texture). In this way, the radiometric quality is not compromised despite the remeshing phase.
• Complex objects (e.g., monuments or statues, Section 5.3): When objects are too complex for a good unwrap, the classification is done directly on the texture generated as output during the 3D modeling procedure.
When we consider color image segmentation, choosing a proper color space becomes an important issue [91]. Different color spaces present color information in different ways, which makes certain calculations more convenient and can provide a more intuitive way to identify colors. Several color representations are currently in use in color image processing. The most common is RGB, but HSV and L*a*b* are also frequently chosen color spaces [92,93]. In the RGB color space, for example, shadowed areas will most likely have very different characteristics from areas without shadows. In the HSV color space, the hue components of areas with and without shadow are more likely to be similar: the shadow will primarily influence the value or the saturation component, while the hue (indicating the primary "color" without its brightness and its dilution by white/black) should not change much. Another popular option is the L*a*b* color space, where the a* and b* channels represent the color and Euclidean distances in a*b* space better match the human perception of color. Again, ignoring the L* channel (lightness) makes the algorithm more robust to lighting differences.
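The effect of a shadow on RGB versus HSV values can be checked with a short Python sketch based on the standard `colorsys` module. The two pixel values are hypothetical, and the shadow is idealized as a uniform halving of all three RGB channels; real shadows usually alter saturation as well, so this is an illustration of the argument rather than a model of actual imagery.

```python
import colorsys


def to_hsv(rgb):
    """Convert an 8-bit RGB triplet to HSV components in the range [0, 1]."""
    r, g, b = (channel / 255.0 for channel in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


# Hypothetical pixels of the same brick surface, lit and in shadow.
lit_brick = (180, 90, 60)
shadowed_brick = (90, 45, 30)  # idealized shadow: every channel halved

h_lit, s_lit, v_lit = to_hsv(lit_brick)
h_shadow, s_shadow, v_shadow = to_hsv(shadowed_brick)

# Distance between the two pixels in RGB space (large) and in hue (near zero).
rgb_distance = sum((a - b) ** 2 for a, b in zip(lit_brick, shadowed_brick)) ** 0.5
hue_distance = abs(h_lit - h_shadow)
```

Under this idealization the two pixels are far apart in RGB space, while their hue is the same and only the value component changes, which is why a classifier working on hue is less affected by the shadow.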

Supervised Learning Classification
The 2D classification method relies on different machine learning models embedded in Weka [94], coupled with the Fiji distribution of ImageJ, an image processing software that exploits Weka as an engine for machine learning models [95]. The method combines a collection of machine learning algorithms (random tree, support vector machine, random forest, etc.) with a set of selected image features to produce pixel-based segmentations. All the available classifiers are based on a decision tree learning method. In this approach, during the training, a set of decision nodes over the values of the input features (e.g., "is feature x greater than 0.7?") is built, and the nodes are connected to each other in a tree structure.
This structure, as a whole, represents a complex decision process over the input features.The result of this decision is a value for the label that classifies the input example.During the training phase, the algorithm learns these decision nodes and connects them.
Among the different approaches, we achieved the best results in terms of accuracy with the random forest method (Section 5.1) [62]. In this approach, several decision trees are trained as an ensemble, and the mode of all their predictions is taken as the final one. This allows us to overcome some typical problems in decision tree learning, such as overfitting the training data and learning uncommon irregular patterns that may occur in the training set. The random forest procedure mitigates this behavior by randomly selecting different subsets of the training set and, for each of these subsets, a random subset of input features. A decision tree is then learned for each of these subsets of training examples and features. The intuition behind selecting random feature subsets, known as "feature bagging", is that when some features are very strong predictors for the output class, they are likely to be selected in many of the trees, causing the trees to become correlated; restricting each tree to a random subset of features reduces this correlation.
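The ensemble mechanism described above (bootstrap samples, random feature subsets, majority vote) can be sketched in plain Python. This is a deliberately reduced illustration and not the Weka implementation used in this work: each "tree" is a one-level decision stump, and the feature vectors (hue, saturation, brightness), the class names and all parameter values are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({s[f] for s in samples}):
            left = [y for s, y in zip(samples, labels) if s[f] <= threshold]
            right = [y for s, y in zip(samples, labels) if s[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def train_forest(samples, labels, n_trees, n_features, seed=0):
    """Train stumps on bootstrap samples, each with a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(samples)) for _ in samples]  # bootstrap
        feature_ids = rng.sample(range(len(samples[0])), n_features)
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx], feature_ids))
    return forest


def predict(forest, sample):
    """Return the mode of the individual stump predictions."""
    votes = [left if sample[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical annotated pixels: (hue, saturation, brightness) features.
brick = [(0.03, 0.62, 0.45), (0.04, 0.58, 0.50), (0.05, 0.65, 0.55),
         (0.035, 0.60, 0.48), (0.045, 0.55, 0.52), (0.04, 0.68, 0.47)]
mortar = [(0.10, 0.10, 0.80), (0.12, 0.15, 0.75), (0.14, 0.08, 0.85),
          (0.11, 0.12, 0.78), (0.13, 0.09, 0.82), (0.12, 0.11, 0.79)]
samples = brick + mortar
labels = ["brick"] * len(brick) + ["mortar"] * len(mortar)

forest = train_forest(samples, labels, n_trees=25, n_features=2)
brick_prediction = predict(forest, (0.02, 0.60, 0.40))
mortar_prediction = predict(forest, (0.15, 0.05, 0.90))
```

Because every stump sees a different bootstrap sample and a different pair of features, no single strongly predictive feature dominates all the members of the ensemble.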
For each case study, the random forest was trained using the manually annotated textures of the model as input. More specifically, not all the pixels of those images were manually annotated with their corresponding labels, but just some significant and well-distributed portions (e.g., see Figure 8b).
The first time the training process starts, the features of the input image are extracted and converted to a set of vectors of float values (the Weka input). This step can take some time depending on the size of the images, the number of features and the number of cores of the machine where the classification is running. The feature calculation is fully multi-threaded. The features are calculated only the first time the classifier is trained after starting the plugin or after changing any of the feature options. In the case of color (RGB) images, the hue, saturation and brightness are also part of the features.
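As a rough illustration of the conversion from pixels to float feature vectors, the sketch below builds one vector per pixel containing the RGB channels plus hue, saturation and brightness. The function names are hypothetical, and the filter-based features that the plugin can also compute are deliberately left out; only the color part mentioned in the text is shown.

```python
import colorsys

def pixel_features(rgb):
    """Return [R, G, B, hue, saturation, brightness] as floats in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return [r, g, b, h, s, v]

def image_to_feature_vectors(image):
    """Flatten a 2D grid of 8-bit RGB tuples into one feature vector per pixel."""
    return [pixel_features(px) for row in image for px in row]

# Toy 2 x 2 image with made-up colors.
toy_image = [[(255, 0, 0), (128, 128, 128)],
             [(0, 255, 0), (0, 0, 255)]]
vectors = image_to_feature_vectors(toy_image)
```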

Unsupervised Learning Classification
The unsupervised segmentation approach is performed using the k-means clustering plugin of ImageJ or Fiji [96]. The algorithm performs pixel-based segmentation of multi-band images. Each pixel in the input image is assigned to one of the clusters. Values in the output image represent the cluster number to which the original pixel is assigned. Before starting the elaboration, the operator decides the number K of classes the image will be divided into and the cluster center tolerance.
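The role of the two operator-chosen parameters (the number K of clusters and the cluster center tolerance) can be seen in a plain k-means sketch. This is a generic textbook version written for illustration, not the code of the ImageJ plugin; the function name, the random initialization and the toy pixel values are assumptions.

```python
import random

def kmeans(pixels, k, tolerance=1e-4, max_iter=100, seed=0):
    """Cluster multi-band pixels; stop when no center moves more than `tolerance`."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(pixels, k)]
    labels = [0] * len(pixels)
    for _ in range(max_iter):
        # Assignment step: each pixel goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in pixels
        ]
        # Update step: move each center to the mean of its members.
        shift = 0.0
        for c in range(k):
            members = [p for p, l in zip(pixels, labels) if l == c]
            if not members:
                continue  # keep an empty cluster's center where it is
            new = [sum(band) / len(members) for band in zip(*members)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, centers[c])))
            centers[c] = new
        if shift <= tolerance:
            break
    return labels, centers

# Toy RGB pixels: three dark and three bright (made-up values).
toy_pixels = [(10, 10, 10), (12, 11, 9), (11, 12, 10),
              (200, 190, 180), (205, 195, 185), (198, 192, 179)]
cluster_ids, cluster_centers = kmeans(toy_pixels, k=2)
```

The returned list plays the role of the output image, where each value is the cluster number of the corresponding pixel.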

Evaluation Method
To assess the performance of the classification, small sections of the entire datasets were classified and then compared quantitatively with the ground truths. More specifically, we relied on the precision, recall and F1 score calculated for each class (computed for each point by comparing the label predicted by the classifier with the manually annotated one) and on the overall accuracy, which is useful to evaluate the overall performance of the classifier.

$$\text{Precision} = \frac{T_p}{T_p + F_p} \qquad (1)$$

$$\text{Recall} = \frac{T_p}{T_p + F_n} \qquad (2)$$

$$\text{F1 score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3)$$

$$\text{Overall accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \qquad (4)$$

where, for each considered class, Tp (true positive), Tn (true negative), Fp (false positive), and Fn (false negative) come from the confusion matrix, which is commonly used to evaluate machine learning classifiers.
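The evaluation measures named above can be computed directly from two label lists. The following is a minimal sketch using the standard definitions of precision, recall and F1 score; the function names and the toy labels are assumptions, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the text.

```python
def class_metrics(y_true, y_pred, cls):
    """Precision, recall and F1 score for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def overall_accuracy(y_true, y_pred):
    """Share of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up ground truth and predictions for five points.
truth = ["brick", "brick", "mortar", "mortar", "plaster"]
pred = ["brick", "mortar", "mortar", "mortar", "plaster"]
brick_p, brick_r, brick_f1 = class_metrics(truth, pred, "brick")
accuracy = overall_accuracy(truth, pred)
```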
Once considerable levels of overall accuracy were reached (>70%), the classification was extended to the whole datasets, which were evaluated qualitatively. In this way, the model was assessed against a set of images different from the ones used during the training phase, so that its capability to generalize over unseen data could be effectively measured.

Test Objects and Classification Results
The proposed methodology was applied to and tested on various archaeological and architectural scenarios to prove its replicability and reliability with different 3D objects.
In particular, these case studies were considered:
• The Pecile's wall of Villa Adriana in Tivoli: it is a 60 m L × 9 m H wall (Figure 7a) with holes on its top meant for the beams of a roof. The digital model of the wall was classified to identify the different categories of opus (Roman building techniques), distinguishing original and restored parts.
• Part of a Renaissance portico located in the city center of Bologna: it spans ca. 8 m L × 13 m H × 5 m D (Figure 7b). The classification aimed to identify principal parts and architectural elements.
• The Sarcophagus of the Spouses (Figure 7c): it is a late 6th century BC Etruscan anthropoid sarcophagus, 1.14 m high by 1.9 m wide, made of terracotta, which was once brightly painted [97]. The classification aimed at identifying surface anomalies (such as fractures and decays) and quantifying the percentage of mimetic cement used to assemble the sarcophagus.
• The Bartoccini's Tomb in Tarquinia (Figure 7d): the tomb, excavated in the hard sand in the 4th century, has four rooms, a central one (ca. 5 m × 4 m) and three later rooms (ca. 3 m × 3 m), all connected through small corridors. The height of the tomb rooms does not exceed 3 m and it is all painted with a reddish color and various figures. The aim was to automatically identify the still painted areas on the wall and the deteriorated parts.

For the porticoes and the Bartoccini's tomb, a supervised segmentation approach directly on the 3D models was also applied.

The Pecile's Wall
To define the correctness of the developed approach, different analyses were conducted on this first case study. Only a portion of the Pecile's wall (4 m length × 9 m height) was considered at first (Figure 8). On this portion of the wall's orthoimage, different training processes were run using different image scales to identify the solution that best fit this case study, taking into account the classification aims. At a 1:10 scale, the results present an over-segmentation. Using a 1:50 scale, many details were lost and only some macro-areas were identified. The 1:20 scale (normally used for restoration purposes as it allows bricks to be distinguished) turned out to be the optimal choice. It allowed the capture of the details but could not account for the cracks of the mortar between the bricks (Figure 8d). Given the manually selected training classes (seven classes), different classifiers were trained and evaluated. Table 1 reports the overall accuracy results for all tested classifiers run on the orthoimage at scale 1:20. Moreover, we report the time elapsed for each algorithm, considering that creating the classes and the training data took around 10 min and the feature stack array required 14 min.
Out of all the tests performed with the different algorithms, the best overall accuracy obtained was 70% using a random forest classifier. To better identify the classification errors, a confusion matrix was used (Table 2). The table shows that most classification errors occurred in those classes where an overlap of plaster was present on the surface of the opus. However, it is believed that an expert should not consider the accuracy percentage absolute without previous verification. Comparing the segmentation handled by the operator and by the algorithm, it was found that the supervised method allowed the identification of more details and differences in the material's composition. In fact, it could not only distinguish the classes, but also identify the presence of plaster above the wall surface. This is an important advantage for the degradation analysis.
Starting from this result, the training dataset was applied to a larger part of the wall (Figure 9b). To classify 540 m² of surface, the process took about 1 h. Considering that the operator took 4 h just for classifying a smaller part (24 m²), the supervised technique could obtain a more accurate result in a shorter time. The classification results can also be used to create the most commonly requested map for restoration purposes, with dedicated symbols/legend (Figure 9d).

Bologna's Porticoes
The historical porticoes of Bologna were built during the 11th-20th centuries and can be regarded as unique from an architectural viewpoint in terms of their authenticity and integrity. Thanks to their great extension, permanence, use and history, the porticoes of Bologna are considered of outstanding universal value. They span approximately 40 km, mainly in the historic city center of Bologna, and they represent a high-quality architectural work.

Such structures combine variegated geometric shapes, different materials and many architectural details such as moldings and ornaments. According to the different classification requirements, the aim of the task could be the identification of:
• different architectural elements;
• diverse materials (bricks vs. stones vs. marble); and
• categories of decay (cracks vs. humidity vs. swelling).

Classification of 2D Data
Starting from the available 3D data of the porticoes [98], the texture of the 3D digital porticoes was unwrapped (Figure 10) and used to manually identify training patches and classes (10). The results (Figure 11), based on the fast random forest model/classifier, show many classification errors under the porticoes, where the plaster is not homogeneous and presents different decays. In this case, a solution might be to create many different classes according to the number of decay categories or to apply, as a post-processing phase, algorithms that make the areas with small spots more uniform.
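The text does not specify which post-processing algorithm would be used to make spotty areas more uniform. One simple candidate, shown here purely as an assumption, is a majority (mode) filter that replaces each label with the most frequent label in its neighborhood. The function name and the window radius are hypothetical.

```python
from collections import Counter

def majority_filter(labels, radius=1):
    """Replace each label by the most frequent label in its square window."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for y in range(h):
        for x in range(w):
            window = [
                labels[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = Counter(window).most_common(1)[0][0]
    return out

# Toy label image: a single isolated pixel of class 1 inside class 0.
spotty = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
smoothed = majority_filter(spotty)
```

A filter of this kind removes isolated spots but can also erase genuinely small decay areas, so the window size would need to be chosen with the target scale in mind.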

Classification of 3D Data
Considering the classification errors of the facades, due to the heterogeneous surfaces and the presence of different decays, a supervised classification approach was performed based on the Computational Geometry Algorithms Library (CGAL) [99] and the Random Forest Template Library (ETHZ Random Forest, 2018) [100]. The classification consists of three steps: feature computation, model training and prediction. To characterize the model, a combination of features considering both geometry and color factors was selected: distance to plane, eigenvalues of the neighborhood, elevation, verticality and HSV channels. The training dataset of correct labels was manually annotated on small portions of the entire 3D model. To conclude that phase, a classifier was defined and trained using random forest: from the set of values taken by the features at an input item, it measured the likelihood of this item belonging to one label or another. Finally, once the classifier was generated, the prediction process was performed on the entire point cloud (Figure 12). As shown in the figure, the classification problems on the walls were solved, but there were some false positives due to the classification of drain pipes as columns (vertical pipes) and vaults (horizontal pipes). However, this was not considered a classification error, but an error in the training phase, as a class was not assigned to these features.
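To make the feature computation step more concrete, the sketch below derives a few of the listed features for a single colored 3D point. It does not use CGAL and does not reproduce its definitions: the plane, the unit of elevation, and the formula used for verticality (one minus the absolute vertical component of the normalized normal) are assumptions, and the eigenvalue features are omitted because they require a neighborhood analysis.

```python
import colorsys
import math

def point_features(point, rgb, normal, plane, z_min):
    """Per-point features: distance to a given plane, elevation, verticality, HSV.

    plane is (a, b, c, d) for a*x + b*y + c*z + d = 0 (supplied by the caller).
    normal is a precomputed surface normal at the point (assumed available).
    """
    x, y, z = point
    a, b, c, d = plane
    distance = abs(a * x + b * y + c * z + d) / math.sqrt(a * a + b * b + c * c)
    nx, ny, nz = normal
    verticality = 1.0 - abs(nz) / math.sqrt(nx * nx + ny * ny + nz * nz)
    h, s, v = colorsys.rgb_to_hsv(*(ch / 255.0 for ch in rgb))
    return {
        "distance_to_plane": distance,
        "elevation": z - z_min,
        "verticality": verticality,
        "hue": h,
        "saturation": s,
        "value": v,
    }

# Made-up point on a vertical surface, measured against the plane z = 1.
features = point_features(point=(1.0, 2.0, 3.0), rgb=(0, 255, 0),
                          normal=(1.0, 0.0, 0.0), plane=(0.0, 0.0, 1.0, -1.0),
                          z_min=0.5)
```

Vectors of this kind, one per point, would then be passed to the random forest for training and prediction.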
Considering the results obtained working on the 3D models and the suitability of the case study (40 km of similar buildings), as future work the authors aim to extend the classification to a larger portion of the porticoes of Bologna.

The Etruscan Sarcophagus of the Spouses
The Etruscan masterpiece "Sarcofago degli Sposi" was found in 1881 in the Banditaccia necropolis in Tarquinia (Italy). The remains were found broken into more than 400 pieces. The sarcophagus was then reassembled and joined using a mimetic cement to fill the gaps among the different pieces. In 2013, digital acquisitions and 3D modeling of the sarcophagus, based on different technologies (photogrammetry, TOF and triangulation-based laser scanning), were conducted to deliver a highly detailed photo-realistic 3D representation of the Etruscan masterpiece for successive multimedia purposes [101]. Using the high-resolution photogrammetric 3D model (5 million triangles), the segmentation task aimed to detect the surface anomalies (fractures and decays) and to test the reliability of the method on a heritage object with a more complex topology and few chromatic differentiations on the texture. For the training set, three main categories, and two accessory ones, were identified (Figure 13a). The manual identification of the necessary patches took about 15 min and was accomplished with the support of restoration experts. The sustaining legs of the sarcophagus were excluded from the classification, as they are the only parts where pigment decorations are clearly visible; thus, their analysis was outside the segmentation scope. After the patch identification, the model took some 2 h of processing to classify the entire texture (Figure 13b), which was then mapped onto the available 3D geometry (Figures 14 and 15). The segmented 3D model highlighted every single detail of the assembly; fractures were distinguished from engraving; and the different grades of conservation were also identified. The classification output also allowed calculating the percentage that each label occupied. According to the results, 12% of the entire surface of the object (i.e., the 3D model) is composed of mimetic cement. As the overall surface of the 3D model is 6.8 m², this means that approximately 0.8 m² are reconstructed parts.
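As a quick check of the figures above (a worked calculation only, using the 12% share and the 6.8 m² total surface reported in the text):

$$A_{\text{cement}} = 0.12 \times 6.8\ \text{m}^2 = 0.816\ \text{m}^2 \approx 0.8\ \text{m}^2$$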

The Etruscan Bartoccini's Tomb in Tarquinia, Italy
Tarquinia was one of the most ancient cities of the Etruscan civilization. The necropolis, situated at Monterozzi and Calvario, is composed of some 6000 tombs, 60 of which are decorated with paintings. The Bartoccini tomb, dated to around the 4th century B.C., was discovered in 1959. Combined TOF scanning and panoramic photographic surveys were carried out to obtain the complete 3D model (3 million triangles) [102]. The TOF range data were used to derive the geometry of the tomb, while the panoramic image was used to derive the photo-realistic high-resolution texture. In-house developed algorithms were used to project the panoramic image onto the 3D geometry and then extract a high-resolution texture. Over the centuries, the tomb has suffered from erosion caused by factors such as infiltration, seasoning and aging. The aim of the segmentation was therefore the automated identification of the deteriorated surface areas on the painted walls. To reach this goal, different strategies were tested, on both 2D (Section 5.4.1) and 3D (Section 5.4.2) data. Quantification analyses are presented in Section 5.4.3.

Classification of 2D Data
Given the texture of the tomb, a clustering process was chosen instead of manually training various classes. K-means clustering (Section 4.3) was performed to generate a pixel-based segmentation of the panoramic images. To optimize the work time in a trial phase, only one wall was analyzed, and then the segmentation was extended to the whole model. To avoid segmentation errors, and thus achieve better results, the image was transformed from RGB to L*a*b* color space (Figure 16b). The obtained results (Figure 16d) were compared with the ground truth (manually segmented) data (Figure 16c), achieving an overall accuracy of 91.15%. The clustering method was then applied to the entire panoramic texture of the tomb's room (Figure 17) and finally mapped onto the 3D geometry (Figure 18).
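The color space transformation applied before clustering can be sketched as follows. This uses the standard sRGB to CIE L*a*b* conversion with a D65 reference white; whether the tool used by the authors relies on the same white point and constants is an assumption, and the function name is hypothetical.

```python
def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 reference white)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

white_lab = srgb_to_lab((255, 255, 255))
red_lab = srgb_to_lab((255, 0, 0))
```

In L*a*b*, lightness is separated from the two chromatic axes, so distances used by k-means are less dominated by illumination differences than in RGB.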

Classification of 3D Data
From the 3D geometry of a tomb's wall, a plane fitting procedure allowed extracting a depth map of the wall and identifying the eroded surfaces (Figure 19b), i.e., those below the ideal fitted surface. Although only the damaged areas below a certain depth variation threshold were identified, the volume of the eroded wall could still be calculated (Table 3). Consequently, as for the porticoes case study, a supervised classification approach was adopted. The correct labels (three classes) were annotated in small, well-distributed portions, and then the prediction process was performed on the entire 3D model (Figure 20).
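The plane fitting and depth thresholding described above can be sketched with an ordinary least-squares fit. The sketch assumes the wall points are expressed in a frame where z is roughly perpendicular to the wall, fits z = a·x + b·y + c, and flags points lying below the fitted plane by more than a threshold. The function names, the threshold value and the toy data are assumptions; the fitting method actually used by the authors is not stated.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c via the normal equations."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    lhs = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(lhs)
    coeffs = []
    for col in range(3):  # Cramer's rule, one unknown per column
        m = [row[:] for row in lhs]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

def eroded_indices(points, threshold):
    """Indices of points lying more than `threshold` below the fitted plane."""
    a, b, c = fit_plane(points)
    depths = [z - (a * x + b * y + c) for x, y, z in points]
    return [i for i, depth in enumerate(depths) if depth < -threshold]

# Toy flat wall (5 x 5 grid, z = 0) with one made-up cavity at its center.
wall = [(float(x), float(y), 0.0) for y in range(5) for x in range(5)]
wall[12] = (2.0, 2.0, -0.5)
cavity = eroded_indices(wall, threshold=0.1)
```

Because the eroded points also take part in the fit, they pull the plane slightly inward; a robust fit would reduce this bias.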

Quantification Analyses
For all the aforementioned experiments, the identified eroded surfaces were estimated as a percentage of the total area of interest. In particular, for the approaches based on 2D images, the percentage was calculated as the ratio of the number of pixels classified as eroded to the total number of pixels in the segmented image. In the case of 3D data, on the other hand, the percentage was computed as the number of points belonging to the deteriorated areas divided by the total number of 3D points. Table 3 reports the eroded areas computed for one wall of the tomb (4.9 m²), with both percentages and areas in square meters shown. This type of metric result could be beneficial for monitoring and restoration purposes.
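The ratio described above is simple to write down. The sketch below assumes a label grid in which `1` marks eroded samples (a hypothetical encoding) and converts the resulting share into square meters using the 4.9 m² wall area quoted in the text. That conversion implicitly assumes every pixel (or point) covers the same surface area.

```python
def eroded_share(labels, eroded_label):
    """Return the fraction of samples (pixels or 3D points) labelled as eroded."""
    labels = list(labels)
    if not labels:
        raise ValueError("no samples to evaluate")
    return sum(1 for v in labels if v == eroded_label) / len(labels)

# Hypothetical 4 x 5 segmented image: 1 = eroded, 0 = intact.
segmented = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
pixels = [v for row in segmented for v in row]
share = eroded_share(pixels, eroded_label=1)    # 5 of 20 pixels
wall_area_m2 = 4.9                               # wall area quoted in the text
eroded_area_m2 = share * wall_area_m2
print(f"{share:.1%} eroded, about {eroded_area_m2:.2f} m^2")
```

The same function applies to a list of per-point labels from a 3D point cloud, although an area derived from point counts is only meaningful if the point density is roughly uniform.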

Conclusions and Future Works
This paper presents a pipeline to classify 3D heritage data, working either on the texture or directly on the 3D geometry, depending on the needs and scope of the classification. With the proposed methods, archaeologists, restorers, conservators and scientists can automatically annotate 2D textures of heritage objects and visualize them on 3D geometries for a better understanding. The existence of a vast variety of building techniques and the use of diverse ornamental elements introduce an extra barrier to generalizing segmentation techniques to heritage case studies. Moreover, a monument can be subject to different types of degradation depending on its exposure to various conditions, which further increases the complexity of the classification tasks. A machine learning-based approach becomes beneficial for speeding up classification tasks on large and complex scenarios, provided that the training datasets are sufficiently large and diverse.
The advantages of the proposed method are:
• shorter time to classify objects with respect to manual methods (Table 1);
• over-segmentation results useful for restoration purposes to detect small cracks or deteriorated parts;
• replicability of the training set for buildings of the same historical period or with similar construction material (e.g., Roman walls);
• visualization of classification results on 3D models from different points of view, using unwrapped textures;
• possibility to compute absolute and relative areas of each class (Table 3), useful for analysis and restoration purposes; and
• applicability of the pipeline to different kinds of heritage buildings, monuments or any other kind of 3D models.
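As an illustration of the absolute and relative areas per class mentioned in the list above, the following sketch sums triangle areas of a labelled mesh. It assumes one class label per triangular face, which the text does not specify; the function names, the toy mesh and the class names are hypothetical.

```python
import math
from collections import defaultdict

def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def area_per_class(vertices, faces, face_labels):
    """Return {label: (absolute_area, share_of_total)} for a labelled mesh."""
    totals = defaultdict(float)
    for (i, j, k), label in zip(faces, face_labels):
        totals[label] += triangle_area(vertices[i], vertices[j], vertices[k])
    total = sum(totals.values())
    if total == 0:
        raise ValueError("mesh has zero total area")
    return {label: (area, area / total) for label, area in totals.items()}

# Toy mesh: a 2 x 1 rectangle (two triangles) plus one extra triangle.
vertices = [(0, 0, 0), (2, 0, 0), (2, 1, 0), (0, 1, 0), (4, 0, 0)]
faces = [(0, 1, 2), (0, 2, 3), (1, 4, 2)]
face_labels = ["brick", "brick", "mortar"]
areas = area_per_class(vertices, faces, face_labels)
```

Working on face areas rather than on pixel or point counts keeps the result independent of texture resolution and sampling density.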
On the other hand, the lessons learned and open critical issues can be summarized as:
• Difficult identification of the classes of analysis case by case (e.g., problems with the drainpipes classified as columns): the choice of the right classes during the training phase becomes fundamental.
• Misinterpretation of the shadows can introduce errors in the classification: the use of color spaces other than the classic RGB one, e.g., HSV and Lab, makes the lighting differences less problematic during the segmentation phase.
• Over-segmentation results in many classes, commonly useless in semantic analysis: this implies the need to make the regions more uniform in a post-processing phase.
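The remark on color spaces in the list above can be made concrete with a small experiment. The sketch below uses Python's standard `colorsys` module and an idealized shadow that only scales brightness, which is an assumption (real shadows usually shift color as well). Only HSV is shown, since a Lab conversion is not available in the standard library. The pixel values are hypothetical.

```python
import colorsys

def to_hsv(rgb):
    """Convert an (r, g, b) triple with components in [0, 1] to (h, s, v)."""
    return colorsys.rgb_to_hsv(*rgb)

lit = (0.8, 0.4, 0.2)                     # hypothetical sunlit brick pixel
shadowed = tuple(0.5 * c for c in lit)    # same surface at half the brightness

h_lit, s_lit, v_lit = to_hsv(lit)
h_sh, s_sh, v_sh = to_hsv(shadowed)

# The RGB values differ in every channel, whereas hue and saturation are
# numerically almost unchanged and only the value channel drops.
rgb_gap = max(abs(a - b) for a, b in zip(lit, shadowed))
hue_gap = abs(h_lit - h_sh)
sat_gap = abs(s_lit - s_sh)
```

A classifier or clustering step that weights hue and saturation more than value would therefore treat the two pixels as the same material, which is the effect the text attributes to HSV and Lab.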
In future work, the authors aim to work across different heritage buildings, improving the generalization of the classification of some basic classes (e.g., windows, doors, and columns). To do that, it will be necessary to increase the number of labeled images and exploit more complex machine learning algorithms, in particular deep neural networks.
We will also tackle the objective of increasing the homogeneity of the segmentation to minimize and ideally avoid any post-processing phase.
Finally, we will work on new case studies, applying both techniques presented (from 2D to 3D and directly on 3D), to better understand the advantages of each method with respect to the other.
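The post-processing step that makes over-segmented regions more uniform is not detailed in this section. One common option, shown here purely as an assumed example and not as the authors' procedure, is a 3 x 3 majority filter over the label image. The function name and the toy labels are hypothetical.

```python
from collections import Counter

def majority_filter(labels):
    """Replace each pixel label by the most common label in its 3 x 3 window.

    Windows are clipped at the image border. If the pixel's own label is
    among the most frequent ones it is kept; otherwise the first most
    frequent label encountered in the window is used. A new grid is
    returned and the input is left unchanged.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            counts = Counter(window)
            best = max(counts.values())
            winners = [lab for lab, n in counts.items() if n == best]
            if labels[r][c] not in winners:
                out[r][c] = winners[0]
    return out

# Toy label image with one isolated "C" pixel inside an "A" region.
noisy = [
    ["A", "A", "A", "B", "B"],
    ["A", "C", "A", "B", "B"],
    ["A", "A", "A", "B", "B"],
]
smoothed = majority_filter(noisy)
```

In this toy case the isolated pixel is absorbed into the surrounding region while the boundary between the two large regions is preserved. A filter of this kind would also erase small cracks, so it suits semantic analysis better than damage detection.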

Figure 1. Example of a segmented and classified point cloud.

Figure 3. Synthetic representation of the segmentation and classification methods. The 3D model represents the classification of the archaeological elements of the Neptune temple in Paestum [26].

Figure 4. Segmentation of 3D point cloud by geometric primitive fitting.

Figure 5. Framework for 3D scene analysis: a 3D point serves as input and the output consists of a semantically labeled 3D point cloud [18].

Figure 6. Schematic representation of the developed segmentation and classification methodology: 3D model of a portion of Circus Maximus Cavea in Rome, Italy (a); 3D model after re-meshing (b); UV map (c); manually identified training areas on the unwrapped texture (d); supervised classification results (e); and re-projection of the classification results onto the 3D model (f).
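The last step in this figure, re-projecting the classified texture onto the 3D model, relies on the UV coordinates stored with the mesh. A minimal sketch of that lookup is given below. The UV convention (origin at the bottom-left of the image), the nearest-texel sampling and all names and labels are assumptions made for the example, not details taken from the paper.

```python
def label_at_uv(label_image, u, v):
    """Look up the class label of the texel addressed by UV coordinates.

    Assumes u, v in [0, 1] with v = 0 at the bottom of the image (a common
    but not universal convention) and nearest-texel sampling.
    """
    height, width = len(label_image), len(label_image[0])
    col = min(int(u * width), width - 1)
    row = min(int((1.0 - v) * height), height - 1)
    return label_image[row][col]

def project_labels(label_image, vertex_uvs):
    """Assign each mesh vertex the label found at its UV position."""
    return [label_at_uv(label_image, u, v) for u, v in vertex_uvs]

# Hypothetical 2 x 2 classified texture and three vertex UVs.
classified_texture = [
    ["brick", "mortar"],     # top row of the image
    ["brick", "plaster"],    # bottom row of the image
]
vertex_uvs = [(0.1, 0.9), (0.9, 0.9), (0.9, 0.1)]
vertex_labels = project_labels(classified_texture, vertex_uvs)
```

In practice the classified image can simply replace the original texture, in which case the renderer performs this lookup itself; the explicit version is useful when per-vertex or per-face labels are needed for measurements.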

Figure 7. The case studies of the work to validate the semantic classification for analyses and restoration purposes: Pecile's wall of Villa Adriana in Tivoli, Italy (a); Renaissance building in Bologna, Italy (b); Sarcophagus of the Spouses, National Etruscan Museum of Villa Giulia in Rome, Italy (c); Etruscan tomb in Tarquinia, Italy (d).

Figure 9. The original (a) and classified (b) model of the Pecile's wall, ca. 60 m long. A closer view is also reported to better show the classification results with random colors (c) or dedicated symbols (d).

Figure 10. Manually identified training areas on the unwrapped texture of the porticoes portion (a); and classification results based on the selected 10 classes (b).

Figure 11. 3D model (a); and from 2D to 3D classification results of historical porticoes in Bologna (b).

Figure 12. 3D model and classification results of historical porticoes in Bologna (b).

Figure 13. Manually identified training areas on the unwrapped texture of the sarcophagus (a); and related classification result on the texture (b).

Figure 14. Texturized (a) and segmented (b) 3D model of the Sarcophagus of the Spouses.

Figure 15. Mimetic cement areas of the Sarcophagus of the Spouses highlighted in red (ca. 12% of the model).

5.4. The Etruscan Bartoccini's Tomb in Tarquinia, Italy
Tarquinia was one of the most ancient cities of the Etruscan civilization. The necropolis, situated in the areas of Monterozzi and Calvario, is composed of some 6000 tombs, 60 of which are decorated with paintings. The Bartoccini tomb, dated to around the 4th century B.C., was discovered in 1959.

Figure 16. Texture of a wall of the Bartoccini's tomb with RGB (a) and Lab* colors (b); ground truth of eroded areas (c); clustering results (d); and marking (pink color) of the eroded areas not identified by the automatic clustering method (e).

Figure 17. Panoramic image of one room of the tomb (a); Lab* color space (b); and automatically identified eroded areas by clustering segmentation (c).

Figure 18. Section of the tomb (a); and visualization of the 2D segmentation results mapped on the 3D model (b).

Table 1. Accuracy results and elapsed time for various classifiers applied to an orthoimage at 1:20 scale.

Table 2. Normalized confusion matrix to analyze the results of the supervised classification of a portion of Pecile's wall at scale 1:20. Precision, recall and F1 calculated for each category are reported on the left of the table.
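For readers unfamiliar with the per-category scores listed in Table 2, the sketch below shows how precision, recall and F1 follow from a confusion matrix of raw counts. The counts and class names are hypothetical and are not the values of Table 2. Note also that precision computed from a row-normalized matrix matches the count-based value only when the classes are equally represented.

```python
def per_class_metrics(confusion, class_names):
    """Compute precision, recall and F1 for each class.

    `confusion[i][j]` counts samples of true class i predicted as class j.
    """
    n = len(class_names)
    metrics = {}
    for k, name in enumerate(class_names):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics

# Hypothetical counts for three classes (not the values of Table 2).
names = ["brick", "mortar", "vegetation"]
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 1, 8],
]
metrics = per_class_metrics(confusion, names)
```

Here `metrics["mortar"]` combines a precision of 9/12 with a recall of 9/10, which illustrates how a class can be found reliably yet still absorb samples from its neighbours.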

Table 3. Computed eroded areas (and volumes) with the aforementioned approaches.

Classifier % Eroded Surfaces Area/Volume Time for Elaborations
Remote Sens. 2019, 11, x FOR PEER REVIEW 19 of 24
