A Hierarchical Machine Learning Approach for Multi-Level and Multi-Resolution 3D Point Cloud Classification

Abstract: Recent years have seen extensive use of 3D point cloud data for heritage documentation, valorisation and visualisation. Although rich in metric quality, these 3D data lack structured information such as semantics and hierarchy between parts. In this context, the introduction of point cloud classification methods can play an essential role for better data usage, model definition, analysis and conservation. The paper aims to extend a machine learning (ML) classification method with a multi-level and multi-resolution (MLMR) approach. The proposed MLMR approach improves the learning process and optimises 3D classification results through a hierarchical concept. The MLMR procedure is tested and evaluated on two large-scale and complex datasets: the Pomposa Abbey (Italy) and the Milan Cathedral (Italy). Classification results show the reliability and replicability of the developed method, allowing the identification of the necessary architectural classes at each geometric resolution.


Introduction
In the last 20 years, various research activities, based on active and passive sensors, have provided reliable methodologies for the acquisition and generation of dense point clouds and textured 3D models of heritage structures [1,2]. These 3D data are typically used for accurate documentation, digital preservation, and visualisation [3-6]. Nowadays, the number of available 3D datasets has increased exponentially but, at the same time, it is becoming very important to enrich point clouds with semantic information about the digitised objects [7,8]. The association of semantic meaning with geometric data, using machine or deep learning (ML/DL) methods, simplifies the reading of 3D data and also accelerates data management and interpretation. In general, 3D point clouds are coupled with RGB colours, intensity and other information depending on the acquisition instrument and technique used. While this kind of data benefits from accurate metric information, it lacks structured information such as semantics and hierarchy between parts. For this reason, the introduction of semantic meaning, employing automated classification procedures, can play an important role in data usage and analysis, useful, e.g., for heritage understanding, restoration and valorisation.
Apart from being time consuming, manual intervention for subdividing the datasets in most cases brings a certain degree of subjectivity. In recent years, relevant progress has been made in automatic classification processes using Artificial Intelligence methods (ML and DL) that, in contrast, are objective and replicable.
In supervised ML, the algorithms take some manually annotated parts of the datasets and hand-crafted features (i.e., geometric and/or radiometric attributes) as inputs, from which they learn patterns that are then used to predict labels for the whole dataset. DL, on the other hand, refers to those processes (neural networks, NN) that directly learn features and semantics from a large quantity of annotated data, which is generally not available in the heritage sector. To cope with this problem, the research proposed in [32] aims to facilitate the annotation process necessary to train DL algorithms. The authors, through a series of rule-based functions, isolate some specific architectural classes within the point cloud, such as columns and beams.
More specifically to the heritage field, a supervised learning approach which transfers the classification information from 2D textures to 3D models is proposed in [21]. Grilli et al. [33] presented a classification approach that works directly on point clouds, training a Random Forest (RF) classifier with geometric features. The method iteratively extracts the most relevant features considering a set of geometric characteristics strictly related to the dimensions of the architectural elements. The same authors then verified the possibility of generalising the classification model across different architectural scenarios in [34].
A popular DL approach relies on convolutional neural networks (CNN) that recombine sets of neurons in different layers, in order to process a set of input vectors to a known set of outputs [35,36]. PointNet [37], and its later improvement PointNet++ [38], is a unified architecture that learns both global and local point features and is suitable for classification, part segmentation, and semantic scene segmentation. Results are encouraging, featuring an overall accuracy of around 90%, but the selected classes and training sets consist of simple objects with replicated shapes (mug, plane, table, car, etc.). This latter point proved to be critical when applying CNN to CH point clouds.
Due to the inherent complexity and uniqueness of each CH object, it is very difficult to obtain a well-distributed and representative heritage training set. Nonetheless, some applications exist, built for specific case studies. Malinverni et al. [5] proposed a DL approach based on the PointNet++ neural network, which has been re-trained on data coming from a real survey. The dataset needs to be manually segmented by domain experts and must be broad enough to comprise enough classes for each case study, which in the CH context is quite tricky. Recently, a Dynamic Graph Convolutional Neural Network (DGCNN) supported by meaningful features (normals and HSV colours) has been employed in [39]. The DGCNN has been trained on the ArCH dataset [40], which includes 10 manually labelled point clouds subdivided into 11 classes. The resulting model has then been tested in two different ways, over a partially labelled dataset and on an unseen scene, providing promising results.
The first attempt at a comparison between machine and deep learning approaches for heritage classification has been presented in [41]. The authors demonstrated that, for some specific case studies, the machine learning approaches could achieve better results in shorter times.

Remote Sens. 2020, 12, x; doi: FOR PEER REVIEW

Milan Cathedral
Milan Cathedral (Figure 1) is one of the most important heritage monuments in Italy. It is a late Gothic Cathedral, whose construction began in 1386 and finished in 1805, with some final details completed in 1965. It is the largest church in Italy, the third largest in Europe and the fifth in the world. The external length of the Cathedral is 158 m, the transept is 93 m long, and the maximum height of the Cathedral (from the internal floor to the head of the Madonnina) is 108.50 m. Overall it covers an area of approximately 12,000 m² and has a gross volume of 440,000 m³. As with all gothic cathedrals, it is very rich in decorations, counting in total 3400 statues, 135 gargoyles and 700 figures that decorate the internal spaces as capitals, altars and windows. On the external facades, there are 135 spires and 30 decorative reverse arches that seem to support the flanks of the impressive building.
The Cathedral is built with bricks and covered with Candoglia marble. The highest parts, where the structures are thinner and lighter, are made directly of marble. Marble is the most critical aspect affecting the lifetime of the Cathedral and its state of preservation, as it degrades very quickly due to its mineralogical composition. Therefore, many parts of the Cathedral, especially the external ones, must be replaced periodically, which implies continuous maintenance work. The institution responsible for these activities is the Veneranda Fabbrica del Duomo di Milano, which for more than 630 years has been in charge of the ordinary maintenance of the structures, including cleaning, vegetation removal, and scheduled inspections in order to guarantee the safety of the hanging structures.

3D Survey
To support the exceptional maintenance activities, a number of focussed survey operations were conducted in the last 10 years, producing 3D data and classical 2D representations at a 1:50 scale for the Main Spire, the altars of the transept, and the Dome Cladding [42]. More recently, detailed point clouds of the entire Cathedral at an average and uniform resolution of 5 mm (Figure 2 and Table 1) were produced using Terrestrial Laser Scanning (TLS) for the interior spaces [43], photogrammetry for the exteriors [44] and a combination of both techniques in narrow service spaces [45,46].

Classification Needs
Given the continuous maintenance work required and the large amount of data to be handled, the semantic classification of the Milan Cathedral's point cloud can become a digital support for conservation activities. In particular, the following activities can be foreseen:
• Derivation of measurements and 2D representations from anywhere in the Cathedral.
• Identification, counting and visualisation of single architectural elements.
• A better interpretation of the architectural structures at point cloud level, avoiding long and tedious modelling processes.
• Keeping track of every restoration activity, treating the point cloud as a complete 3D navigable information system where it is possible to reference information, data, and a catalogue of archive documents.
• Generation of a BIM-like web-based information system platform, usable in the field within a mixed-reality system.

Pomposa Abbey
The IX-century Benedictine monastery of Santa Maria di Pomposa is situated in the province of Ferrara and represents one of the most important Abbeys of Northern Italy. The main building of the complex is the Basilica, consecrated in 1026 AD, whose layout recalls the typical Ravenna style, in which the inner space is subdivided into three naves through continuous colonnades, bounded with relative pulvini (Figure 3). The central nave ends with a hemispherical apse that corresponds to an outer pentagonal shape. The current framework is the result of several significant transformations that occurred through the centuries and that substantially changed the original Roman aspect [47,48]. Several survey campaigns have been conducted in the past [49,50] in order to analyse and monitor the structural behaviour, preserving the Abbey from possible damage. These inspections verified that the massive bell tower caused distortions of the church. In order to reduce this structural problem, a sequence of walls was added in the side aisles [51,52]. The current church frame represents a non-regular case study from the geometric point of view, including columns inserted in transverse masonries, irregular structural walls and roof elements, and roto-translation of single beams.

3D Survey
A 3D survey campaign of the Abbey was conducted to acquire the global structure of the church and its annexed buildings. The survey project was an opportunity to test different approaches and technologies, aiming at finding the best solution for the high-resolution documentation of a large architectural complex [53]. The survey campaign included both image- and range-based techniques.
The photogrammetric technique was devoted to roof data acquisition with a UAS platform, while laser scanning was used for the internal and external architectural elements. The exterior was digitised with a pulsed TOF Leica C10 from 21 stations, selecting a 0.5-cm sampling step for the main façade and a 2-cm sampling step for the other parts. The interior spaces, on the other hand, were digitised using a phase-shift TOF Faro Focus X 120 placed in 31 different positions (Figure 4 and Table 2). The final point cloud was obtained with an ICP alignment of internal and external clouds, subsampled at 2 and 5 cm in order to optimise data management.

Classification Needs
The point cloud classification has considered only the interior datasets of the church (Table 2), which present several interesting architectural features and a more complex geometric framework compared with the exterior part. The enrichment of the point cloud with semantic information can be useful in particular for the following:
• Derivation of measurements and 2D representations.
• Monitoring the building behaviour over time, by verifying masonries, columns and roof roto-translations: a comparison of 3D point clouds acquired over time can be considered a valuable solution to exactly and completely describe the whole fabric.
• Geometric quality check of the individual elements belonging to the same class, highlighting possible alterations in the structural composition of the building.
• Quantification of the building from a material and functional point of view, a crucial step for both a conservative intervention and damage evaluation (e.g., after a destructive event).

Developed Methodology
The complexity and uniqueness of the chosen datasets highlighted some limits in applying a DL approach to these specific studies. Indeed, the richness of the elements and their variety, even within the same class (e.g., all the capitals of the Milan Cathedral are unique and different from each other), pointed out the difficulty of collecting a representative annotated dataset large enough to train a neural network. Moreover, even considering the possibility of finding representative datasets [41], the time necessary to manually annotate them would exceed the time required to classify the case studies entirely.
Starting from these considerations, the developed classification method uses a Random Forest (RF) classifier [54], following the successful supervised approach based on geometric features introduced in [34] and the study on feature importance in Random Forest [20,33]. Compared to DL approaches, RF methods do not need a significant amount of manually annotated data to be effective. On the other hand, it becomes fundamental to select features able to highlight the discontinuities between the different architectural elements.
The high-density, high-resolution point clouds considered in this paper allow for a precise geometrical description of the CH object. However, while these 3D representations allow the richness of construction details (e.g., decorations, small statues, beams, etc.) to be appreciated, huge point clouds of large and complex structures pose two main problems:
• The size of the dataset and the number of geometric features to be extracted make the computational process challenging. While the latter can be addressed by choosing only essential features, subsampling the acquired point cloud would help to handle the high number of points but would reduce the detail visible in the dataset.
• The large number of semantic classes to be identified might induce misclassification issues: initial experiments have shown that a higher number of classes leads to a lower classification accuracy (Sections 7.1.1 and 7.2.1).
Considering these problems, it becomes difficult to classify very large datasets in one single step, as traditional classification approaches do. Therefore, a multi-level and multi-resolution (MLMR) approach is proposed (Figure 5), which follows these steps:
1. The full-resolution point cloud is subsampled at various geometric levels.
2. Given certain manually annotated areas corresponding to the classes of interest of the first geometric level, an RF model is trained and then all classes are predicted on the entire point cloud.
3. Classification results coming from this first step are back interpolated (BI) onto a higher-resolution version of the dataset, using a nearest-neighbour algorithm. To do so, the class value assigned to each point is transferred to a certain number of points evaluated inside a specific cluster; the number of nearest neighbours to be evaluated depends on the point cloud resolution. The higher the resolution, the higher the number of points in the neighbourhood.
4. The next geometric level is considered with its new classes (e.g., columns are divided into bases, shafts and capitals) and a new classification procedure is applied.
5. Classification results are again back interpolated onto a higher-resolution version of the point cloud until the full geometric resolution is reached.
It should be mentioned that each architectural element can be classified in more and more detail, according to the point cloud resolution employed at the corresponding level, and then back interpolated onto a higher-resolution version. The process is iterative, and the last classification level corresponds to the full-resolution point cloud. Moreover, the subsampling of the point cloud depends on both the complexity of the object to be classified and the elements to be recognised. Macro-categories of objects allow for a lower resolution, while the higher resolution is used for details.
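The subsampling and back interpolation steps can be illustrated with a minimal Python sketch, assuming NumPy and SciPy are available. The function names (voxel_subsample, back_interpolate), the voxel-grid subsampling and the majority vote are illustrative assumptions, not the implementation used in the paper. Note also that, for simplicity, the sketch queries the low-resolution neighbours of each high-resolution point, which is the reverse of the transfer direction described in step 3 but ensures that every dense point receives a label.

```python
import numpy as np
from scipy.spatial import cKDTree


def voxel_subsample(points, voxel_size):
    """Keep one point per occupied voxel (hypothetical helper, assumed scheme)."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first_idx)]


def back_interpolate(low_points, low_labels, high_points, k=1):
    """Transfer labels from a classified low-resolution cloud to a denser one.

    Each dense point takes the majority label among its k nearest
    low-resolution neighbours; k=1 is a plain nearest-neighbour copy.
    Labels are assumed to be non-negative integers.
    """
    tree = cKDTree(low_points)
    _, idx = tree.query(high_points, k=k)
    idx = idx.reshape(len(high_points), -1)  # shape (n, k), also when k == 1
    neighbour_labels = low_labels[idx]
    # Majority vote over the k neighbour labels of each dense point
    return np.array([np.bincount(row).argmax() for row in neighbour_labels])
```

In such a sketch, k would be chosen larger for denser target clouds, in line with the remark that the number of neighbours depends on the point cloud resolution.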
The proposed hierarchical methodology is graphically summarised in Figure 5. The diagram also provides a general indication of the geometric resolution of the point cloud and of the geometric features to be used at each classification step. These values, and the methodology itself, can be considered sufficiently generalisable to be used for the classification of architectural elements in the monumental field. The parameters (resolution and feature radii) can vary case by case according to the characteristics and dimensions of the objects. However, they correspond to the smallest detail that can be represented at a given representation scale and its metric tolerance, in a logic of representation at an increasing level of detail.
As a final and additional step, after the full classification of the point cloud, it is possible to apply the so-called instance segmentation to each architectural class. In this way, for example, each column of the dataset can have its own index. To do so, the label connected components function available in CloudCompare [55] was used. Instead of working on pixels arranged in a grid, the tool uses the octree structure of the point cloud, which defines the grid step, to perform the search in 3D space. It separates the cloud into smaller ones based on a minimum distance between elements and a minimum number of points that a component must have.
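The idea behind this instance segmentation step can be sketched in Python as follows. This is not CloudCompare's octree-based implementation: it is a stand-in that links points with a KD-tree radius search (SciPy) and labels the resulting graph components. The function name label_instances and its parameters are hypothetical and only mirror the two criteria mentioned above.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def label_instances(points, min_distance, min_points):
    """Split the points of one semantic class into instances.

    Points closer than `min_distance` are linked; every connected group with
    at least `min_points` points gets its own index, smaller groups get -1.
    """
    n = len(points)
    tree = cKDTree(points)
    pairs = tree.query_pairs(r=min_distance, output_type="ndarray")
    # Adjacency graph with one edge per pair of close points
    graph = coo_matrix(
        (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n)
    )
    _, component = connected_components(graph, directed=False)
    sizes = np.bincount(component)
    instance = np.full(n, -1, dtype=np.int64)
    for new_id, comp_id in enumerate(np.flatnonzero(sizes >= min_points)):
        instance[component == comp_id] = new_id
    return instance
```

Applied to all points labelled as columns, for example, each sufficiently large and spatially separated group would receive a distinct index.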

Milan Cathedral
Following the Veneranda Fabbrica conservation and restoration rules as well as the hierarchical subdivision of the monument into areas, zones, sectors, architectural elements and marble blocks reported in [56], we consider an automatic classification of the Cathedral into three different levels (Figure 6).
The first level of classification has the function of identifying macro-architectonic elements. The classification process is performed on the point cloud at a 5-cm resolution, extracting geometric features with radii between 20 cm and 2.5 m.
In the second level of classification, the previously identified architectural elements are divided into sub-components (e.g., columns are split into their bases, capitals and shafts). This step is performed using a 2-cm resolution point cloud, reducing the min/max feature search radii to 10 cm and 1 m.
Finally, the third and most challenging level of classification aims at the subdivision of each component into its ashlars (i.e., monolithic elements such as statues, gothic decorations, holes, etc.). For this, the full-resolution point cloud is used with a min and max search radius of 0.5 and 5 cm, respectively.
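The three Milan Cathedral levels can be captured in a small configuration structure, sketched below in Python. The resolutions and min/max radii come from the text; the 5-mm value for the full-resolution level is taken from the survey description and is an assumption here, as are the Level class, the MILAN_LEVELS name and the geometric spacing of intermediate radii (the paper only states the minimum and maximum).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One classification level (hypothetical container, not from the paper)."""
    name: str
    resolution_m: float  # point spacing of the cloud used at this level
    min_radius_m: float  # smallest feature search radius
    max_radius_m: float  # largest feature search radius

    def radii(self, count=4):
        """Return `count` (>= 2) radii spaced geometrically from min to max.

        The geometric spacing is an assumption made for illustration.
        """
        ratio = (self.max_radius_m / self.min_radius_m) ** (1 / (count - 1))
        return [self.min_radius_m * ratio ** i for i in range(count)]


MILAN_LEVELS = [
    Level("architectural elements", 0.05, 0.20, 2.50),
    Level("sub-components", 0.02, 0.10, 1.00),
    # Full resolution, assumed equal to the 5 mm average survey resolution
    Level("ashlars", 0.005, 0.005, 0.05),
]
```

Keeping these values in one place makes it easier to adapt resolution and radii to a different monument, as the parameters are expected to vary case by case.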

Pomposa Abbey
Considering monitoring needs, the classification has been subdivided into three different levels (Figure 7).
The first level refers to the main structural framework of the church, defining a global functional subdivision of the entire dataset: floor, façades, columns, arches, and roof. In order to recognise these macro-categories, the point cloud has been processed at a 5-cm resolution.
The second level is devoted to a more in-depth classification process, recognising multiple sub-classes within the categories coming from the first level of classification. Façades are divided into walls and windows, columns into bases, shafts and capitals, and the roof is broken down into dome, side, central and front cover. At this level, each architectural element was recognised on the 2-cm resolution point cloud.
The third level refers to the roof, which presents complex and variable structures. A separate classification process was run for each roof structure, highlighting its main structural elements (e.g., rafters, purlins, panels, tie beams, wall plates, crown posts, etc.). For this structural study, a point cloud resolution of 2 cm was suitable.

Geometric Features Extraction
The features needed for the classification are based on the covariance matrix [57] computed within a local neighbourhood of a 3D point. For both case studies, the considered local neighbourhoods are directly related to the dimensions of the architectural elements. Moreover, as various features affect the classification results in different ways, the geometric features with the smallest impact on the training process have been iteratively removed [33]. The experiments showed that, independently of the object to be classified and the point cloud resolution, the features with the most significant impact on the classification results are (Figure 8):
• Anisotropy: allows the recognition of 3D elements such as columns, buttresses and spires over 2.5D elements such as walls and vaults.
• Planarity: highlights linear and planar items such as chains and floors and their vertical counterparts such as column shafts and walls.
• Linearity: similarly to planarity, it helps in identifying linear structures.
• Surface variation: it emphasises changes in shape, allowing, for example, corners or edges to be detected.
The search radii used are reported in brackets.
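As an illustration of how such covariance features can be derived, the following Python sketch computes them from the eigenvalues λ1 ≥ λ2 ≥ λ3 of the covariance matrix of each spherical neighbourhood. The formulas are the commonly used eigenvalue-based definitions and are assumed here, since the text does not spell them out; the function name covariance_features and the NaN convention for sparse neighbourhoods are likewise assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree


def covariance_features(points, radius):
    """Eigenvalue-based features of each point's spherical neighbourhood.

    Returns an (n, 4) array with columns: anisotropy, planarity, linearity,
    surface variation. Points with fewer than 3 neighbours are left as NaN.
    """
    tree = cKDTree(points)
    neighbourhoods = tree.query_ball_point(points, r=radius)
    features = np.full((len(points), 4), np.nan)
    for i, idx in enumerate(neighbourhoods):
        if len(idx) < 3:
            continue
        cov = np.cov(points[idx].T)
        # eigvalsh returns ascending eigenvalues; clip tiny negative round-off
        l3, l2, l1 = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
        if l1 == 0.0:
            continue
        features[i] = [
            (l1 - l3) / l1,        # anisotropy
            (l2 - l3) / l1,        # planarity
            (l1 - l2) / l1,        # linearity
            l3 / (l1 + l2 + l3),   # surface variation
        ]
    return features
```

Calling the function once per search radius and stacking the results column-wise would give a multi-scale feature vector for each point.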

Training and Classification
Besides extracting geometric features, some portion of each dataset, at each classification level, has been manually annotated in order to train the RF algorithm. Among the covariance features, planarity and linearity, and surface variation and sphericity, behave pairwise similarly. However, all of them have been used at the same time to improve the performance of the RF classifier. In fact, as the algorithm splits all the available information among its decision trees, redundancy in geometric features allows the correct relations to be present in every tree, allowing for a better classification.
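A minimal sketch of this training and prediction step is given below, assuming scikit-learn's RandomForestClassifier as the RF implementation (the paper does not state which library was used). The function names, the number of trees and the boolean annotated_mask are hypothetical; the second helper shows one possible way to find the feature with the smallest impact, as a candidate for iterative removal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_and_predict(features, labels, annotated_mask, n_trees=100, seed=0):
    """Fit an RF on the annotated points and predict a class for every point.

    `annotated_mask` is a boolean array marking the manually labelled points;
    labels outside the mask are ignored during training.
    """
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    model.fit(features[annotated_mask], labels[annotated_mask])
    return model, model.predict(features)


def least_important_feature(model, feature_names):
    """Name of the feature with the lowest impurity-based importance."""
    return feature_names[int(np.argmin(model.feature_importances_))]
```

Repeating the fit after dropping the least important feature gives a simple approximation of the iterative feature selection described in the previous section.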

Results and Discussion
Considering the small size of the manually annotated portions used to train the classifier, the results achieved in both case studies were successful (Table 4 and Table 7). The metrics used to evaluate these results are Precision, Recall and F1-score. They were computed by comparing, for each dataset, a manually annotated portion with the same portion automatically predicted.
In the next sections, the achieved results are discussed, focusing on the advantages and critical issues related to the semantic segmentation of such complex architectures. Sections 7.1.1 and 7.2.1 show the classification results achieved with a traditional classification approach (non-hierarchical, i.e., in one step with all semantic classes) in comparison to the proposed MLMR method.
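For reference, the three metrics can be computed per class from the manually annotated labels and the predicted labels of the same points. The sketch below uses the standard definitions; the function name `per_class_scores` and the toy labels in the usage note are illustrative and are not taken from the paper's data.

```python
import numpy as np

def per_class_scores(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)} from two label arrays."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))  # correctly predicted as c
        fp = np.sum((y_pred == c) & (y_true != c))  # predicted c, annotated otherwise
        fn = np.sum((y_pred != c) & (y_true == c))  # annotated c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (float(precision), float(recall), float(f1))
    return scores

def mean_f1(scores):
    """Unweighted average of the per-class F1-scores."""
    return sum(s[2] for s in scores.values()) / len(scores)
```

For example, with annotated labels `["wall", "wall", "column", "column", "floor"]` and predictions `["wall", "column", "column", "column", "floor"]`, the class wall has precision 1 and recall 0.5, while column has precision 2/3 and recall 1. Whether the averages reported in the tables are unweighted or weighted by class size is not stated in the text; the unweighted mean is an assumption here.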

One-Step Classification
The Cathedral dataset was classified in one step using a Random Forest method and all 18 semantic classes (see Figures 13 and 14). To perform a one-step (non-hierarchical) classification, due to the huge dimension of the dataset (more than 3 billion points), the data were subsampled at a 5-cm resolution, unavoidably reducing the level of detail. The training dataset contained all 18 classes and the classification reached an average F1-score of 67%. The results (Figure 12) show that with this approach it is difficult to distinguish elements with similar geometry but belonging to different classes (e.g., ornaments on the capitals of the main nave and ornaments in the choir area). Furthermore, many detailed elements present discontinuities equal to or smaller than 5 cm, making their recognition complicated.

Therefore, dividing the classification into different geometric levels facilitates the distinction of similar elements and allows only specific parts of the dataset to be processed at the full 5-mm resolution.
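The text does not state which tool or algorithm was used to subsample the cloud to a 5-cm resolution. As one possible way to do it, the sketch below keeps a single point per cubic cell of a regular grid; the function name `voxel_subsample` and the choice of keeping the first point found in each cell are assumptions made for illustration.

```python
import numpy as np

def voxel_subsample(points, voxel_size):
    """Return sorted indices of one representative point per occupied voxel."""
    points = np.asarray(points, dtype=float)
    # Integer voxel coordinates, measured from the minimum corner of the cloud
    keys = np.floor((points - points.min(axis=0)) / voxel_size).astype(np.int64)
    # return_index gives the first occurrence of every distinct voxel key
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return np.sort(first_idx)

# Usage: reduce a cloud given in metres to roughly 5-cm spacing
cloud = np.array([[0.00, 0.00, 0.00],
                  [0.01, 0.01, 0.00],   # falls in the same 5-cm cell as the first point
                  [0.06, 0.00, 0.00],
                  [0.22, 0.22, 0.22]])
kept = voxel_subsample(cloud, voxel_size=0.05)
low_res = cloud[kept]
```

Returning indices rather than coordinates keeps the link between the low-resolution and the full-resolution cloud, which is convenient when other per-point attributes (colour, intensity, labels) have to be carried along.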

MLMR Classification
Figures 13 and 14 show some examples of the subsequent levels of classification, for both the Cathedral interiors and exteriors. Table 4 shows the average F1-score values computed at each classification level. The classification results, applied to the whole dataset, are shown in Figure 15.
At the first level of classification, it was possible to identify the main elements of the construction visible in a 5-cm resolution point cloud (i.e., for the interior: floors, columns, chains, vaults, walls, and choir; for the exterior: walls, buttresses, roofs, and street). The use of a low-resolution point cloud allowed us to process the whole dataset at once. This operation would have been critical from the hardware and computation point of view with the full-resolution point clouds. Tables 5 and 6 report the classification metrics achieved per class at the first level of classification (level 1). The main classification errors at the first level occurred in areas that were (i) too small with respect to the resolution level, (ii) geometrically ambiguous, or (iii) too similar to elements belonging to a different class. For example, it was difficult to classify chains, which are approximately 8 cm thick, in a 5-cm resolution point cloud.

A second type of problem was found in the area surrounding the choir. The class choir comprises different architectural elements, including pipe organs, benches and altars. The decision to classify them in one class is consistent with the Veneranda Fabbrica rules. However, the heterogeneity of its parts and the noise in the 3D point model caused classification problems in neighbouring regions. In fact, as the RF classifier relies on the geometric features of local points to classify them, it could not easily distinguish classes with similar characteristics. An analogous issue was encountered on the exterior of the Cathedral. Here, the RF classifier could classify only those buttresses that were entirely surveyed. Errors appeared where, due to building site requirements, it was impossible to survey one of the two faces of a buttress. In that case, the classifier confused the buttresses with walls, both being 2.5D. Figure 16 shows the different behaviour of the same geometric features on complete and incomplete elements.
Before transferring the classification results from level 1 to level 2, misclassification errors were manually corrected. In fact, errors at the top levels would severely affect the classification at the subsequent, more detailed steps.
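The transfer of labels from one level to the next (the back interpolation of the MLMR workflow) can be pictured as follows. This is only a sketch under the assumption that each point of the higher-resolution cloud inherits the label of its nearest neighbour in the classified lower-resolution cloud; the interpolation rule actually used is not detailed in this section, and the name `back_interpolate` and the brute-force search are illustrative.

```python
import numpy as np

def back_interpolate(coarse_xyz, coarse_labels, fine_xyz, chunk=10000):
    """Give every fine point the label of its nearest coarse point."""
    coarse_xyz = np.asarray(coarse_xyz, dtype=float)
    fine_xyz = np.asarray(fine_xyz, dtype=float)
    coarse_labels = np.asarray(coarse_labels)
    out = np.empty(len(fine_xyz), dtype=coarse_labels.dtype)
    # Process the fine cloud in chunks to bound the size of the distance matrix
    for start in range(0, len(fine_xyz), chunk):
        block = fine_xyz[start:start + chunk]
        d2 = ((block[:, None, :] - coarse_xyz[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk] = coarse_labels[d2.argmin(axis=1)]
    return out

# Usage: propagate level-1 labels, then isolate one class for the next level
coarse = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
labels = np.array(["wall", "column"])
fine = np.array([[0.1, 0.0, 0.0], [0.9, 0.1, 0.0], [0.4, 0.0, 0.0]])
fine_labels = back_interpolate(coarse, labels, fine)
columns_only = fine[fine_labels == "column"]  # input for the next, finer level
```

Because each class subset is cut out of the finer cloud using the labels of the previous level, a wrong label at level 1 removes the affected points from the correct branch of the hierarchy, which is why the manual correction described above is performed before the transfer.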
At the second level of classification, the precision of the results increased (see also Table 4) as the geometric resolution of the point cloud increased. This made it possible to identify more specific architectural elements. Finally, the third classification level (5-mm resolution) included the most complex shapes to be classified (e.g., statues incorporated inside the capitals): the accuracy metrics slightly decreased due to the complex shapes, the presence of occlusions in the point clouds, and the geometric similarities between elements (e.g., statues and gothic pyramids).
The final step of the classification process was the instance segmentation, applied to all the single and repeated elements belonging to the same classes (Figures 17 and 18). The instance segmentation assigns a different index to each architectural component of the same class (e.g., statues, capitals, bases, shafts, etc.). This indexing allows for (i) better management in an HBIM context and (ii) a quick and precise identification of the elements on which to intervene (in accordance with the Veneranda Fabbrica intervention rules).
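The algorithm used for the instance segmentation is not specified in this section. One generic way to obtain separate indices for spatially disjoint elements of the same class is a connected-component labelling on a regular grid, sketched below as a stand-in technique rather than the authors' method; `instance_ids` and the `cell` size are hypothetical names and parameters.

```python
import numpy as np
from collections import deque
from itertools import product

def instance_ids(points, cell):
    """Label points of ONE class: touching occupied grid cells share an index."""
    points = np.asarray(points, dtype=float)
    keys = [tuple(k) for k in np.floor(points / cell).astype(np.int64).tolist()]
    cells = {}
    for idx, key in enumerate(keys):
        cells.setdefault(key, []).append(idx)  # point indices per occupied cell
    # 26-connectivity: all neighbouring cells sharing a face, edge or corner
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    ids = np.full(len(points), -1, dtype=np.int64)
    visited = set()
    next_id = 0
    for seed in cells:
        if seed in visited:
            continue
        visited.add(seed)
        queue = deque([seed])
        while queue:  # breadth-first flood fill over occupied cells
            cur = queue.popleft()
            ids[cells[cur]] = next_id
            for o in offsets:
                nb = (cur[0] + o[0], cur[1] + o[1], cur[2] + o[2])
                if nb in cells and nb not in visited:
                    visited.add(nb)
                    queue.append(nb)
        next_id += 1
    return ids
```

The cell size acts as the minimum gap that separates two instances, so it would have to be chosen smaller than the spacing between repeated elements (e.g., neighbouring statues) and larger than the point spacing of the cloud. Contiguous elements of the same class cannot be separated this way.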

One-Step Classification
A traditional non-hierarchical classification was applied to the Abbey dataset, searching for 18 classes of architectural interest (see Figure 21). The results, shown in Figure 19, seem quite encouraging, with an average F1-score of 75%. However, in order to be used in daily practice, the classification results need some manual correction of misclassifications (Figure 20). The relatively acceptable results can be explained by the fact that the Pomposa Abbey, even if complex, presents less ambiguity among classes than the Milan Cathedral. The RF can distinguish between walls and structural parts in the trusses because they are geometrically different (contrary to the statues on the choir and those on the capitals in the Milan Cathedral). Moreover, the point cloud resolution is low and does not necessarily require an initial subsampling step.

MLMR Classification
The developed MLMR classification approach was applied with three levels (Figure 21). Table 7 reports the average F1-scores achieved at the three levels, which are always above 90%. Working on different levels facilitated the recognition of architectural content, avoiding errors that are common when several classes share the same properties (e.g., columns and crown posts). The slight decrease in the F1-score at level 3 is due to the fact that, despite the increase in the number of classes (from 9 to 18), the geometric resolution of the point cloud is the same as at level 2 (2 cm).
Table 7. Pomposa Abbey: average F1-scores for the three classification levels (geometric resolution of the point clouds in brackets). Note that each level has a different number of classes. Classification levels 2 and 3 are performed on the point cloud at a 2-cm resolution, i.e., the native resolution.

| | Level 1 (5 cm) | Level 2 (2 cm) | Level 3 (2 cm) |
|---|---|---|---|
| F1 (%) | 95.1 | 97.8 | 94.6 |

At the first classification level, the most complex parts to classify were the columns that are directly inserted in the transverse walls. As shown in Figure 22a, geometric features such as planarity or linearity, which generally help in highlighting columns, responded differently on columns and semi-columns. Still, the integration with other covariance features, such as Surface Variation (Figure 22b), allows a quite precise identification of semi-columns.
Once columns and semi-columns had been isolated (Figure 23), the subsequent level of classification, based on a higher resolution, allowed the identification of further elements and details, such as base, capital, shaft, etc. (Figure 24).
The second classification level split the roof into the dome, central, front and side covers, as each part presents a different structural behaviour. These roof parts and wooden structures were then considered in the last classification level. The parts could be further subdivided, extracting all their various components (Figure 25), but using specific features for each part.
In all cases, the Surface Variation, extracted at radii proportional to the size of the beams, was essential to highlight the various elements. In addition, after a strategic re-orientation of the parts, the X and Y coordinates were used as features for the central and front cover, respectively, to distinguish the principal rafters. This latter strategy allowed us to notice the presence of deformations in the central cover, starting from the observation of some anomalies in the classification results (Figure 26).
With regard to the side cover classification, problems were encountered in the early stages of experimentation, as complete and incomplete beams (resulting from occlusions in the dataset) responded differently to the feature extraction (Figure 27a). To cope with this issue, a temporary rotation was applied to the side covers to make them horizontal, so that the Z coordinates could be used to distinguish rafters, purlins, and panels (Figure 27b,c).
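The temporary rotation can be sketched as follows, assuming that the inclined cover is well approximated by a best-fit plane whose normal is brought onto the Z axis. How the rotation was actually determined is not described in the text; the function names and the use of an SVD plane fit with Rodrigues' rotation formula are illustrative assumptions.

```python
import numpy as np

def rotation_to_horizontal(points):
    """Rotation matrix and pivot that make the best-fit plane of `points` horizontal."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # The plane normal is the direction of least variance (last right singular vector)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    if normal[2] < 0:
        normal = -normal  # make the normal point upwards
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(normal, z)
    s = np.linalg.norm(axis)        # sine of the tilt angle
    c = float(np.dot(normal, z))    # cosine of the tilt angle
    if s < 1e-12:
        return np.eye(3), centroid  # already horizontal
    k = axis / s
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    R = np.eye(3) + s * K + (1.0 - c) * (K @ K)  # Rodrigues' formula
    return R, centroid

def level(points, R, centroid):
    """Apply the temporary rotation about the centroid."""
    return (np.asarray(points, dtype=float) - centroid) @ R.T + centroid

def restore(points, R, centroid):
    """Undo the temporary rotation after classification."""
    return (np.asarray(points, dtype=float) - centroid) @ R + centroid
```

After `level`, the Z coordinate measures the offset from the mean plane of the cover, so it can be appended to the feature vector of each point; `restore` brings the classified points back to their original position.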
Classification metrics per class at level 3 are reported in Tables 8-10. This level includes very fine details of the roof (purlins, tie beams, etc.), which were classified at the same geometric resolution as level 2; hence, the achieved accuracy metrics are slightly worse. Finally, the instance segmentation of the whole wooden roof (Figure 28) provided additional results useful for conservation and monitoring activities, making it possible to identify components that have to be replaced. Moreover, the abstraction of structural elements could serve as a preliminary step for simulations with finite element methods/analysis systems (FEM/FEA).

Conclusions
The paper presented a new hierarchical classification procedure (MLMR) to semantically enrich 3D point clouds of complex heritage structures. The achieved classification results show how enriched point clouds could support a better understanding of complex heritage architectures as well as operations like restoration, communication and on-site facility management. The cognitive contribution of an expert operator is fundamental at the beginning of the process. The following are still required from an expert operator: (i) the identification of the classification rules, (ii) the class definition, and (iii) the choice of training and validation sets (data annotation). These steps are crucial to adapt the process to different case studies.
The innovative aspects of the presented work are as follows:
• The use of machine learning techniques to quickly classify large and complex 3D architectures without the need for large training datasets.
• The definition of general rules (e.g., identification of geometric features), replicable in various heritage scenarios, in terms of relations among classification levels, point cloud resolution and minimum/maximum feature search radii.
• The hierarchical segmentation (down to single instances) of 3D surveying data, which could facilitate HBIM processes.
• The speed of the process: once training and validation sets are defined, the prediction on the entire dataset is achieved in a few minutes.
• The objectivity of the classification procedure: objective rules are applied uniformly throughout the entire process, making the process repeatable and independent of the subjective choices of an operator.
As possible lines of future research, some aspects may deserve further attention and development:
• Better investigation of the relationship between classification levels, point cloud resolution and feature search radii: it is necessary to understand whether the automatic classification with specific features can be generalised with respect to data density, or whether it is case dependent.
• Verification of the usefulness of the classification process for the scan-to-BIM process, checking whether the extracted semantic structures and instances facilitate the preparatory work for the construction of BIM models.
• Checking whether the semantically segmented point clouds could facilitate the generation of polygonal meshes.
• Extension of the instance segmentation, not only to repeated and separated elements but also to those classes that present differences in composition or material, even if contiguous and similar in shape (e.g., walls).
• Creation of a more user-friendly classification framework to be used by non-experts in the sector.
• Testing the possibility of automatically processing the data acquired on-site with mobile scanner instruments for real-time monitoring applications.
• Improvement of the classification details by integrating information coming from images, which generally feature higher resolution and hence allow for a better identification/distinction of small elements (e.g., classification of each single marble block composing the Milan Cathedral).

Figure 1. (a) Milan Cathedral main facade. (b) A detail of the complex roof with repetitive buttresses. (c) Internal view of the main nave and its capitals.

Figure 2. The entire 3D point cloud of the Milan Cathedral (more than 3 billion points). TLS data are shown with their intensity colours (green, yellow, orange) whereas photogrammetric point clouds have RGB colour information.

Figure 3. Two views of the Abbey complex: (a) as seen from a drone/UAS platform; (b) main nave of the church.

Figure 4. Survey schema of the indoor acquisition campaign and three visualisations of the interior coloured point cloud of the Abbey.

Figure 5. Multi-level and multi-resolution (MLMR) workflow. The diagram also provides general indications in terms of point cloud resolution and minimum/maximum search radius of the geometric features, which have to be chosen at each step of the classification process. BI stands for Back Interpolation, i.e., classification results at a certain level are back interpolated to a higher resolution level of the point cloud until the full geometric resolution is reached.

Figure 6. Classification levels and classes for the Milan Cathedral.

Figure 7. Classification levels and classes for the Pomposa Abbey.

•Figure 8 .
Figure 8. Visual comparison of geometric features. The colour of the plot represents the feature scale. The search radii used are reported in brackets.

Figures 9 and 10 show the training sets used at the first level of classification (5-cm resolution) of the Milan Cathedral and the Pomposa Abbey, respectively. For the first case study (30 mil. points), the training sample is composed of ca. 2.5 mil. annotated points for the exterior surfaces and ca. 2.6 mil. points for the interior spaces. For the Pomposa Abbey, 115,444 points are used for the training set out of the ca. 1.1 mil. points composing the whole cloud. Table 3 reports the number of points used and the training and classification times for all three classification levels. Figure 11 indicates the percentage of time spent in each working phase. The computation time for the extraction of the geometric features varies with the size of the point cloud, the number of features and their search radius. Even if the processing time increases with the size of a given training sample, it can still be considered negligible compared to DL approaches. Regarding the first level of classification, on an 18-core processor workstation, the training phase for the Milan Cathedral took about 5 minutes (2.5 mil. points), while 43 seconds were necessary to classify the remaining exterior point cloud (12 mil.

Figure 9. Annotated portions at classification level 1 for the Milan Cathedral.

Figure 10. Annotated portions at classification level 1 for the Pomposa Abbey.

Figure 11. Comparison of the normalized time required for the different phases of the classification process, from manual annotation to final classification of the dataset.

Table 4. Milan Cathedral: average F1-scores for all three classification levels (geometric resolutions of the point clouds in brackets). Note that each level has a different number of classes.
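
Because each level has a different number of classes, an average F1-score condenses the per-class results into one value per level. The sketch below assumes the average is an unweighted (macro) mean of the per-class F1-scores, with F1 = 2TP / (2TP + FP + FN); the averaging scheme is not stated here, so this is an assumption. The function names and the toy labels are hypothetical.

```python
def f1_per_class(y_true, y_pred):
    """Return {class: F1} from parallel lists of true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # The denominator is non-zero: c occurs in y_true or in y_pred.
        scores[c] = 2 * tp / (2 * tp + fp + fn)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    if not scores:
        raise ValueError("no labels given")
    return sum(scores.values()) / len(scores)


# Toy labels (hypothetical), six points and three classes.
y_true = ["wall", "wall", "floor", "floor", "roof", "roof"]
y_pred = ["wall", "floor", "floor", "floor", "roof", "wall"]
per_class = f1_per_class(y_true, y_pred)
average = macro_f1(y_true, y_pred)
print(per_class, round(average, 4))
```

A macro mean gives every class the same weight regardless of how many points it contains, which matters when small classes sit next to very large ones.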

Figure 13. MLMR classification levels (down to the capital details) for the Milan Cathedral.

Figure 14. MLMR classification levels for the external walls of the Milan Cathedral.

Remote Sens. 2020, 12, x; doi: FOR PEER REVIEW 17 of 30

Figure 15.Classification results (nine classes) at level 1 for the entire Milan Cathedral.

Figure 17. Example of instance segmentation on the pillars of the main nave.

Figure 18. Example of instance segmentation on a capital of the main nave in order to distinguish the various statues (S1, S2, etc.).

parts in the trusses because they are geometrically different (contrary to the statues on the choir and those on the capitals in Milan's Cathedral). Moreover, the point cloud resolution is low and does not necessarily require an initial subsampling step.

Figure 21. MLMR classification (18 classes overall) for the Pomposa Abbey. Note that tie beam, panel and purlin are repeated among the sub-classes.

Figure 23. Classification results (five classes) at level 1 for the entire Pomposa Abbey.

Figure 24. Classification results (11 classes) at level 2: (a) the entire Abbey; (b) a closer view of a column with its sub-elements.

Figure 25. Results at classification level 3 for the Pomposa Abbey's roof: (a) central; (b) front; (c) side cover. Colour legend as in Figure 18.

Figure 26. Top view with the x-axis deviation highlighted and the resulting anomalies in the classification results of the wooden roof. Some closer views: (A) correct; (B) error in properly distinguishing the elements.

Figure 27. Bottom-up views with relative sections of the side cover: (a) surface variation behaviour with critical areas highlighted; (b) scalar field of the original and (c) rotated Z coordinates.

Figure 28. Instance segmentation of the roof structures.

Table 1. Summary of 3D survey data acquisition and processing for the Milan Cathedral.

Table 3. Examples of training and classification times in relation to the number of points used in some areas of the two case studies.

Table 5. Classification metrics at level 1 for the interiors of the Milan Cathedral.

Table 6. Classification metrics at level 1 for the exteriors of the Milan Cathedral.

Table 7. Pomposa Abbey: average F1-scores for all three classification levels (geometric resolutions of the point clouds in brackets). Note that each level has a different number of classes. Classification levels 2 and 3 are performed on the point cloud at a 2-cm resolution, i.e., the native resolution.

Table 8. Classification metrics at level 3 for the central cover of the Pomposa Abbey's roof.

Table 9. Classification metrics at level 3 for the front cover of the Pomposa Abbey's roof.

Table 10. Classification metrics at level 3 for the side cover of the Pomposa Abbey's roof.