1. Introduction
Landscape morphologies are now usually considered a concept linked to the physical constitution of a portion of land and the processes that create it [
1]. In other words, they correspond to the landscape’s material formation, including its shaping evolution, connected to the social and cultural structures with which it is associated. Although this research is oriented towards looking for anthropogenic forms in the landscape, it is possible to consider that the research tools, both concerning primary data, airborne LiDAR (Light Detection and Ranging), and, above all, those of semi-automatic semantic detection of the signs themselves (unsupervised and supervised macro-class classification), operate in the scenario of the definition and mapping of the microtopography of the land.
The concept of microtopography is also complex, as it considers terrain parameters such as slope, aspect, flow path, ruggedness index, wetness index, and curvature that characterize the surface. Still, in general, the microtopography studies in the literature converge on the preeminent consideration of the variation in elevation of the ground surface between 1 cm and 1 m [
2]. Starting from this consideration, the studies focus on the use of both active (LiDAR) and passive (digital photogrammetry) remote sensing (RS) technologies for the basic definition of microtopographic characteristics starting from highly accurate elevation data over relatively large areas.
Therefore, DEMs are considered in the literature as the characteristic 3D reconstruction, derived from aerial acquisition by sensors equipped on vehicles (planes, helicopters, drones, etc.), and can be derived from LiDAR and photogrammetric methods. Their use is traditionally related to the most varied application sectors, reducing time acquisition and ensuring high and accurate productivity, especially for urban 3D modeling or terrain morphology analysis.
Various studies can be cited among the environmental sectors most influenced by adopting automatic remote sensing techniques. These research works relate to hydrographic networks and morphological variations of watercourses concerning floods [
3]; forest management applications for forestry censuses, canopy height, and biomass estimation [
4,
5]; coastal protection [
6]; glacier monitoring [
7]; and recognition and monitoring of avalanches and landslides [
8,
9]. Moreover, even Cultural Heritage (CH) domain disciplines related to archaeological studies have benefited from RS technology applications, specifically for airborne DEM data classification [
10].
Concerning digital photogrammetric techniques, Unmanned Aerial Systems (UAS) have been recently recognized as a consolidated approach [
11]. The theoretical and methodological developments in the direction of UAS photogrammetry have been determined by the need to continuously improve the density and accuracy of the results. Moreover, balancing the full exploitation of the radiometric data with the necessity to cover large surface extensions is equally essential. Integrating airborne far-range and close-range photogrammetry approaches can fully satisfy the needs of 3D documentation at the scale of archaeological landscapes highly characterized by human impact [
12,
13]. UAS photogrammetry also proved very efficient and effective in situations of different vegetation complexity and soil type, again in the environmental field, such as examples of peatland demonstrated [
14]. Furthermore, the evolution of technologies has led to the consideration of integrating LiDAR sensors using UAS vectors. These compact solutions now optimize the on-flight stability issues and improve the estimation of the navigation path with real-time kinetic (RTK) GNSS. Moreover, the potential to estimate the ground surface is comparable, if not superior, to that obtainable using SfM-based UAS photogrammetry [
15]. The recent diffusion of commercial UAS platforms equipped with compact LiDAR sensors provides cost-effective solutions for experimentation on landscape and heritage high-scale 3D mapping [
16,
17]. Despite the success of image/range-based UAS technology due to its low cost, the superiority of Airborne Laser Scanner (ALS) technology for detecting earthworks in forested areas has been well-established for a long time. This superiority is particularly evident when using a full waveform recording system. It is crucial to distinguish between filtering issues and the effective potential of the sensor for archaeological analysis applications, as highlighted in Ref. [
18].
Today, automation in classification tasks acquires crucial importance both in the field of geospatial information aimed at urban contexts and 3D city models [
19] and also in the vast application fields that search for artificial landforms using primary data consisting of both remotely sensed images and high-resolution DEMs [
20]. Consistent and constant updating is provided to process 3D and 2.5D unstructured data. The semantic classification of these increasingly common data leads to a higher accuracy possibility in the digital twinning of real-world objects, whether they are anthropogenic artifacts or natural terrain features. Further detailed examination of classification approaches is addressed in
Section 1.1.
In this framework, the presented research aims to support the documentation of artificial terrain shapes by automatically classifying high-scale aerial LiDAR survey data. Yet, this current study deals with the second step of previous research and, presently, focuses precisely on the classification phase. The first phase, according to the best practices of geomatic documentation and archaeological research, validated the LiDAR dataset acquired from a helicopter by comparing the results with those obtained on the ground using various and integrated methods such as terrestrial laser scanning and SfM-based UAV photogrammetry [
21]. In addition, the previous work also anticipated the study of geomorphological analyses and the use of semi-automatic analysis of the DSM from ALS data, both of which are now extensively developed.
In this research, the proposed workflow focuses on studying the Spina Verde Park area in Como, Lombardia, Italy. The case study pertains to a landscape Cultural Heritage context: the area is in a hilly, densely forested region (
Figure 1) in the southern part of Como Municipality, surrounded by the city’s urban edges.
Numerous geoarchaeological sites, linked to a network of proto-historic settlements already active during the first millennium BCE, can be found in this area. Moreover, the hiking paths of the park, vital for accessing archaeological sites and enhancing the park’s accessibility, are tangible traces of the ancient communication routes between the various settlements. The manuscript is organized as follows:
Section 1.1 deepens the related works on LiDAR applications and DEM processing based on 3D data or imagery classification approaches, particularly comparing supervised and unsupervised methods.
Section 2, Materials and Methods, describes the airborne LiDAR dataset acquisition and preprocessing and presents the classification approaches for Digital Terrain Model (DTM) generation: filtering, deep learning (DL) strategies, and Machine Learning (ML) classification exploiting multiband raster composition.
Section 3, Results and
Section 4 Discussion, methodically describes the classification accuracy of the MLCs and the DLMs, evaluating and discussing the comparison of predictions with ground truth labels. Conclusions and future perspectives follow in
Section 5.
1.1. Related Works
The literature has highlighted that employing ALS data to structure semantic information is a powerful technology, especially within densely forested areas. In such a context, the efficient enhancement of landscape morphology investigation is strictly determined by tuning the methodological approach. Specifically, good domain-related knowledge and proper classification of the considered features are crucial aspects of this task [
22]. The discriminating factors related to the classification approach for deriving DTMs are mainly related to the nature of preliminary unstructured points and the complexity of the context and their microtopographic features [
23]. The definition of the semantic content of unstructured data is fundamental for interpreting and implementing surface object detection from digital elevation models (DEMs). The American Society of Photogrammetry and Remote Sensing (ASPRS) defines LAS specifications, describing the characteristics of the LiDAR sensor, whose recording returns data from the pulse embedded in the point cloud *.las format, as scalar fields, which can be useful for classification purposes [
24].
In the framework of DSM classification, filtering terrain, and off-ground objects, the literature has largely identified and tested a wide range of LiDAR-based methods and applications [
25], providing reference methods and integrated strategies for DTM generation based on filtering approaches [
26]. More recently, specifically for LiDAR-based DSMs, using data richness derived from the full waveform LiDAR has been demonstrated as an improved and promising approach [
27]. Defining and extracting land-surface parameters are the primary tasks for any classification workflow implementation [
28]. The richness of the semantic content of unstructured data plays an important role in point classification for reliable labeling of 3D points at different scales [
29]. Integrating semantic information from image-based data, as in Ref. [
30], contributes to workflow efficiency.
As introduced, the heritage and landscape domain has greatly benefited over the last decades from using remote sensing approaches to investigate and interpret anthropogenic evidence on terrain morphology. In this context, LiDAR data potential has been largely explored from both terrestrial and aerial points of view [
31,
32,
33]. In archaeological mapping, airborne DSM data have been used also for surface filtering and terrain extraction for 3D mapping and anthropic settlement analysis [
34]. Further, surface reconstruction from DSM and DTM classification is fundamental for extracting anthropogenic traces [
35]. In Ref. [
36], multiple visualizations of the DSM and multispectral data have been used to analyze and cluster the distribution of ancient Maya urban settlements in Mexico and Guatemala. Ref. [
17] proposes a pipeline for 3D semantic segmentation of UAS-based LiDAR data for microrelief classification and extraction of ancient structure traces.
In recent epochs, much attention has been devoted to the study of Artificial Intelligence’s (AI) contribution to optimizing classification pipelines and efficient learning tasks. Numerous Machine Learning (ML) and deep learning (DL) methods have been developed and tested to increase automation and ensure classification workflow accuracy. However, the countless variety of LiDAR or image-based methodological approaches developed in the literature in the past decades demonstrates that integrated approaches based on multi-source data prove to be a successful approach [
27].
The following paragraphs specify the two main categories of ALS data classification that the authors considered: image-based classification approaches using raster data and a 3D unstructured point cloud.
1.1.1. ML Classification Applications Based on Imagery Data
Although the automatic learning approaches have been explored in research fields involving raster-based data and 3D data processing, imagery classification on optical data and grid DSM framework have been more comprehensively explored, primarily based on remote sensing applications. The necessity to enhance the automation of image classification tasks led to the development of various methodologies, each based on the specific criteria assumed by different algorithm applications.
Considering the different categories of image classification methods deeply examined by Ref. [
37], the high efficiency of non-parametric, per-pixel, and supervised strategies for remotely sensed imagery and DSM has been established. Research of the literature demonstrates that AI techniques based on Machine Learning classifiers (MLCs) are particularly suitable for geospatial imagery [
38]. However, some recent studies have investigated ML classification techniques adopting an integration strategy on DSM using optical sensor data and ALS 3D data [
17,
39,
40].
Among the variety of available ML algorithms, such as random forest (RF) [
41], Artificial Neural Networks (ANNs) [
42,
43,
44], and support vector machine (SVM) [
45], it is possible to underline two extensively used approaches that will be implemented in the workflow of this present research (
Section 2.3).
ML classification approaches have been efficiently investigated for treating SAR data using SVM methods [
46]. To classify satellite data, Ref. [
46] presented the Sentinel-2 Time-Series, which exploited the Google Earth Engine computational platform. Moreover, [
47] effectively used SVM for hills micro-landform classification based on grid DTM.
A specific category of raster processing and interpretation, related to the present research, is the use of geomorphological layer (GML) methods, composite raster data, and visualization techniques (VTs) [
48]. It is identified for DSM analysis and classification tasks and employed as a training dataset for Machine Learning classification algorithms. Ref. [
49] presented an application of ML and multiple layer combination for a multicollinearity analysis for the classification of potential areas for groundwater, with an accuracy of almost 80%. Furthermore, as preliminarily investigated in Ref. [
21], ML and VTs have already been experimented with for supervised automated classification of micro-topography in Como Park.
Several specific disciplinary sectors have widely explored the potential connected to DL applications related to image classification. Recent studies have also investigated DL research for multi-class semantic classification, as in Ref. [
50], where starting from 1 billion masks in 11 million images, a classification model characterized by a significant generalization capability was developed, enabling the possibility to segment a massive variety of images with heterogeneous features. Concerning environmental studies, Ref. [
51] developed a solution to detect crop pest species, implementing semantic enrichment and comparing the results obtained using RGB and NIR UAS imagery. Also, heritage and archaeological domains have widely benefited from the advancement of technology in the automatic classification framework. For example, in Ref. [
52], starting from a manually annotated dataset, a Convolutional Neural Network (CNN) was trained to enhance the photogrammetric pipeline for the 3D reconstruction of heritage objects. Moreover, Ref. [
53] exploited an integration of CNN UNet-3 and Watershed algorithms while defining a methodological pipeline to generate a digital vectorized representation of decorative apparatus.
1.1.2. Integrated Classification Approaches of 3D Unstructured Data
The features and performance of 3D point-based classification methods differ from those of raster-based approaches for imagery classification, mainly due to the unstructured nature of the primary data [
54,
55]. In this regard, point cloud semantic segmentation and classification is a crucial task today, characterized by continuous technological and methodologic advancements [
56]. Several experiments have been carried out to enhance the heritage documentation field, where the semantic contribution is crucial for evaluating automation approach performances both in ML and DL directions [
57,
58].
In this framework, DL networks and unsupervised algorithms can implement learning tasks directly in the 3D point cloud data. Among the available solutions, the geometric voxel analysis defines one of the most diffuse approaches [
19]. A broad spectrum of unsupervised filtering algorithms related to geometric relationship features can be mentioned. The competition among them is mainly based on the accuracy of the results related to the context-related capabilities, the nature of the starting data, and the objectives of the application. In this context, a Simple Morphology Filter (SMRF) aims to segment the ground class points, operating a division of the point cloud into grids along an XY plane [
59]. Moreover, in Ref. [
60], a method based on the cloth simulation technique is used to extract the ground points in this framework. This algorithm has been implemented in Open-Source software platforms, such as CloudCompare v2.12.0.
A combination of geometric features can be used to recognize, extract, and label contour detector candidates from 3D point clouds based on geometric features (for example, for 3D modeling purposes [
60]). Today, this issue is crucial in overcoming the need for Digital Twin Cities (DTCs) [
61]. In this regard, several experiments on the ISPRS H3D benchmark dataset [
62] have been conducted to enhance the segmentation of urban 3D unstructured data and ensure the semantic content’s correctness in imbalanced class frequency datasets. For example, in Ref. [
63], the integration of generalized-class point-transformer segmentation and a refined object-based approach fully address the imbalanced class distribution problem. In Ref. [
64], a RF model was trained using geometric, radiometric, and echo support features derived from point clouds and mesh data.
Also, concerning supervised methods, several recent approaches explore DL techniques applied to 3D point cloud data, showing effective results [
65,
66,
67,
68,
69]. Following the necessity of DTCs to constantly update 3D point cloud urban data, in Ref. [
70], a semantic segmentation urban model was trained using the RandLA-Net architecture.
However, in this framework, a significant bottleneck is represented by the generation of reference data for the training and validation issues of a DL model. This task is demanding and time-consuming, requiring significant effort. Furthermore, the training process requires Python libraries that need powerful and updated graphic hardware (CUDA enabled). Memory consumption is thus a relevant issue while training a DL model [
71]. In Ref. [
72], a methodological example for training and validation data generation is addressed. Also, a reference dataset preparation strategy has been proposed in the literature for DL 3D point classification model evaluation, balancing the training and validation dataset in 80–20% [
73].
This research also addresses the experimentation of some of the most innovative semantic segmentation solutions of point clouds in the heritage domain context based on supervised deep learning approaches.
2. Materials and Methods
The integrated methodology proposed in this present contribution (
Figure 2) involves applying semi-automatic segmentation and classification approaches on aerial LiDAR datasets based on Machine Learning technologies through a combination of methods.
The aim is to adequately document the landscape morphology and to detect and classify ground-related features connected with anthropogenic landforms. The proposed integrated methodology thus aims to leverage heterogeneous 3D data to map these non-documented landscape morphological features, bridging the gap in the existing geographic datasets and updating and enriching the domain-related spatial database. In fact, within the complex framework of Cultural Heritage documentation, this work aimed at integrating comprehensive, current methodologies for the semantic classification of 3D point clouds [
58] and raster data [
38]. Therefore, the whole research is evaluated and validated, considering each approach from a holistic perspective.
After the first phase related to the data acquisition, the second part of the methodology consists of integrating unsupervised geometric filters with supervised Neural Network (NN) models for a generalized macro-class automatic segmentation of 3D point cloud data (
Section 2.2). The macro-classes are intended to be compatible with the ASPRS specifications, even if, in this case, an exception has been addressed. The vegetation class encompasses various entities other than buildings and terrain. The third step provides an overview of the application of ML classification techniques using unconventional non-optic imagery data, such as DTM and geomorphological analysis (
Section 2.3). The results of both approaches will be discussed in
Section 4.
2.1. Primary Dataset (Airborne LiDAR)
The research area, located on the southern side of the Como Lake, features a vast dimension (almost 36 km
2), with a dense and irregular forest coverage of the Spina Verde Park, and evergreen plants in certain areas resulting in varying densities of point clouds. The area elevation is between 164 m and 597 m, and it is featured by steep gradient slopes with 0° min, 66° max, and 13° mean (
Figure 3). Given these characteristics, the application of a low-altitude integrated LiDAR/photogrammetric flight survey was meticulously tailored to facilitate the efficient acquisition of comprehensive and uniform data and to achieve an optimal point density for the micro topography investigation. To mitigate the effects of dense vegetation on hilly terrain, the flight was performed before the spring season to minimize the vegetation growth. Moreover, as a terrestrial complement, an extensive survey campaign made by ground-based surveys has been planned to monitor individual archaeological sites. These ground-based data ensured a multi-scale approach and enriched the comparative analysis between aerial and terrestrial datasets, as already extensively discussed in Ref. [
21]. An integrated UAS photogrammetric and terrestrial LiDAR campaign was carried out to document the park landscape and its archaeological sites. In the specific framework of this research, the campaign primarily delivered GNSS measurements that were used to validate the ALS dataset.
The flight was performed using a Eurocopter/Airbus AS350 equipped with multi-sensor hardware composed of the Litemapper 6800 LiDAR System with a RiEGL LMS-Q680i full waveform laser head operating with a maximum LPRR (Laser Pulse Repetition rate) of 400 kHz at a 60° scan angle, with nominal laser beam divergence < 0.5 mrad. The aerial flight positioning and attitude computation were performed using an IGI AEROControl GNSS/IMU composed of a 256 kHz IMU and a full GNSS Septentrion. Finally, a 150 MP PhaseOne iXM-RS150F medium format aerial camera (50 mm focal length) for stereo pair images was equipped to the platform.
Since the external aerial LiDAR POD is engineered to be removed each time, a dedicated calibration flight is mandatory to obtain a reliable dataset. For this reason, after takeoff, a special pattern composed of eight cross-flight strips at different AGLs over the helipad base was executed to calculate a boresight calibration. The helipad was fully resolved from a topographic/geodetic point of view to furnish Ground Control Points, Check Points, and Tie Points useful for estimating boresight misalignments (different for each flight) and solving camera parameters.
The flight was planned to provide a 70–50% overlap in front and lateral directions, considering an average AGL (Above Ground Level) of 1600 ft (approx. 490 m). The flight pattern comprised eleven flight strips, nine with NNE-SSO and two orthogonally to them, used for on-site boresight calibration purposes. Two cross-strips were added over the survey area to check and improve boresight (roll, pitch, heading, and mirror scale) parameters. The area was overflown to obtain high spatial resolution images with a GSD between 3 and 4.8 cm/pixel. Finally, a dense point cloud was generated with an approximate density of 40–50 pts/m2 over border areas and more than 100 pts/m2 over the top hilly area (with smaller AGL). The average density of data is thus estimated at 75 pts/m2.
The trajectory on aerial LiDAR application in this case is solved in post-processing using Precise Point Positioning (PPP), i.e., when no GNSS ground station of a geodetic reference frame is present. The presence of a GNSS CORS (Continuously Operating Reference Station) over the survey area permitted to solve aerial N, E, and H separation (256 Khz IMU with two epochs per second) with an RMSE (Root Mean Square Error) of +/− 0.030 (ETRF2000—ellipsoidal). After that, to reduce the ellipsoid height to a local orthometric datum, an official geoid model provided by the IGM (Italian Military Geographic Institute) based on the ITALGEO2005 geoid model was applied (declared accuracy is +/− 0.035 m at 1σ or +/− 0.100 m at 3σ).
Validation was finally verified using 27 Ground Control Points (GCPs) taken with GNSS real-time positioning techniques all around the survey areas, used to check the plano-altimetric consistency of the point cloud. This was first presented in Ref. [
21]. In particular, a rubbersheet algorithm based on GCPs
Z-axis residuals was used to smooth the altimetric fluctuation of the point cloud (
Table 1). The GCPs were used as east–north constraints for the absolute orientation of the photogrammetric block. Consequently, a model keypoint ground surface was generated and used for image projection and the final orthomosaic generation.
2.2. Supervised and Unsupervised Approaches Macro-Class Classification Approaches for DTM Generation
Two different approaches for the classification of the ALS point cloud have been followed to generate effective and accurate DTM for the analysis of the landscape topology. The intention was to establish a methodological low-time consumption pipeline to semantically classify an ALS LiDAR point cloud into a highly generalized three-class model: ground, building, and vegetation [
24]. However, it is crucial to specify that the vegetation class was not intended to contain only the points referring to the vegetation but also all the other points remaining from the ground and building classes (e.g., powerlines). The first attempt used a Simple Morphology Filter (SMRF) filter for ground extraction and a tailored voxel-based analysis to filter the points pertaining to building and vegetation classes (see
Section 3.1.1). The second involved DL approaches to train a transferable classification model for airborne LiDAR point clouds (
Section 3.1.2). For the development of both methods, it was crucial to have access to adequately structured LiDAR data and process the dataset in such a way as to save computational power. Especially concerning DL approaches, as widely demonstrated by the state of the art [
67], using original-resolution LiDAR point clouds with global coordinates and non-normalized scalar fields can impact processing times and consume excessive memory.
2.2.1. Unsupervised Approach: SMRF and Geometric Features Filter Integration
Regarding the geometrical filter application, the objective was to evaluate the effectiveness of combining different unsupervised filters: the consolidated SMRF filter [
59] for the ground extraction step (
Figure 4) and a tailored geometric filter to distinguish between buildings and vegetation.
The SMRF filter divides the point cloud into grids along an XY plane. The lowest elevation value is computed within each grid element to create a minimum elevation surface map. Through an iterative segmentation of the surface map and the elevation and slopes’ calculation from an estimated DTM, it is possible to distinguish between ground and non-ground pixels referring to the optimal threshold for elevation difference and slope tolerance. The segmentation mask obtained from this process is then applied to the original point cloud, removing non-ground points. The integrated unsupervised filter uses the eigen-based geometric features of normal (λ
3) and curvatures (1 m radius neighborhood) to distinguish between points pertaining to vegetation or building classes. In this function, the variance in normal and curvature values was used to differentiate between buildings and vegetation. Buildings typically exhibit greater planarity conditions than vegetation, resulting in less curvature variation and relative normal differences among neighboring points in a certain cluster. In contrast, vegetation points are scattered, leading to higher curvature variations than buildings. The result of the filtering process returns a classified point cloud. The filtering process could then be optimized for every case study using eigen features to disambiguate with more accurate results. The final structure of the point cloud is
x,
y,
z,
I, and
C:
In this case, a reduced field structure has been conceived to evaluate a transferable model for ALS point clouds. One of the aims was to evaluate a strategy using the least number of parameters possible, avoiding storing radiometric and return pulse values.
The purpose of the steps previously described was to validate an approach to the use of DL methods evaluating an alternative to the use of unsupervised filters, as well as to assess the use of the filters themselves as a replacement or aid for manual segmentation in data preparation to train ANN classification models (DLMs) of ALS point clouds. In this specific case study, the authors evaluated the method’s reliability using the graphical user interface (GUI) solution in ArcGIS Pro 2.9 instead of a common code-based solution. Today, software houses like ESRI are deepening the research of integrating Machine Learning tools inside software to allow spatial data scientists to easily include GeoAI processing tasks in the same environment. Regarding the processing, two different workstations were used to train the models (
Table 2), both with CUDA-enabled NVIDIA RTX graphic cards.
2.2.2. Supervised Approach: Training and Validation Datasets
The first step in the supervised classification workflow is to train deep learning models (DLMs) aiming at converting unstructured 3D datasets into a semantically organized dataset. This dataset contains significant semantic information about the class of each point, which is used as primary data input for the ANN.
Two separate neural architectures were used in the training process: RandLA-Net [
67] and PointCNN [
66]. A CE (Cross-Entropy) loss function was used for both architectures that pertain to point-based approaches. As previously mentioned, RandLA-Net and Point CNN significantly differ for two reasons. Firstly, PointCNN is a point convolution network using only coordinates as the input, while RandLA-Net is a point-wise multilayer perceptron architecture. Moreover, PointCNN is particularly suitable for analyzing small-scale object clouds. At the same time, RandLA-Net is designed to efficiently process urban or territorial-scale data, incorporating a random sampling phase for each network node. For this reason, the dataset was employed, adopting different downsampling strategies and normalization techniques for the point cloud values.
The PointCNN models were trained using sub-sampled data (2 pt/m2) with a minimum distance sub-sample strategy, while the RandLA-NET models were trained using the original resolution data (75 pt/m2). The main reason for adopting this sub-sample strategy was to reduce the inferential training time and memory consumption. Additionally, the intention was to evaluate the methodology’s scalability using a strong sub-sample strategy. Moreover, the classification of “older” datasets with lower density >5 pts/m2 represents a crucial issue in the environmental and heritage landcover mapping framework.
One of the aims was also to evaluate the impact of geographical coordinates in the training processing. Despite previous research experiences, this aspect has been considered extremely important to avoid memory exceeding failure [
57]. The training data were thus generated using shifted classified point clouds resulting from the integrated unsupervised filter, where some checks and few corrections related to the building class were applied. In this case, the aim was to limit the manual operator intervention, working on a reduced dataset that could be used as a training dataset (
Table 3).
Since the park area is characterized by dense vegetation, the T-V-a dataset was chosen to adequately represent the class distribution of the area. In fact, it is possible to observe from
Figure 5 that the most populated class is the one pertaining to high vegetation. The class imbalance issue observed in this dataset and its generalized class scheme is a well-known problem. This issue also arises when working with denser datasets [
62], where there is a need to provide a less generalized class schema [
74].
Two final predictive DLMs were thus trained for T-V-a and will be discussed in
Section 4.1:
Model 1, trained original resolution data (architecture: RandLA-Net);
Model 2, trained using sub-sampled data (architecture: PointCNN).
2.3. MLCs for Grid-Based DTM Classification
The following part of this present research develops a methodological approach based on 2.5D raster data from a previous work. It is worth understanding that this previous research [
21] was based on assessing the feasibility of using dense terrain models for feature extraction and, consequently, semantic segmentation aimed at automatically recognizing anthropogenic shapes (namely, trails). In this regard, as mentioned in
Section 1.1, employing ALS data represents a powerful technology to enhance the study and the analyses of the sites’ landscape morphology within densely forested areas [
22]. Moreover, even if in recent years it was stated that an image-based approach is particularly powerful in the framework of classification and semantic segmentation tasks, working in a densely forested area does not allow—in this present research—for the use of a photogrammetric approach for the generation of a dense surface model, in favor of ALS data [
22]. Thus, starting from the semantic segmentation of the ALS point clouds presented in
Section 2.2, an accurate DTM with a 20 cm Ground Sampling Distance (GSD) was generated (
Section 2.3.1), and geomorphological layers (GLMs) were exploited as primary data for ML classification approaches to improve the level of automation and quality in detecting the human-made terrain features (
Section 2.3.2).
2.3.1. DTM and Geomorphologic Analyses Preparation
Generally, various features related to the ground surface can be detected, such as trails, human earthworks, and similar elements present, even if not well represented, in the raw DTM. Considering the research aims, the interpretation of visual features related to the ground surface, such as anthropogenic elements, represents a relevant challenge. Therefore, the issues connected to difficulties in identifying and interpreting these visual features became evident when studying the DTM. In fact, it was not easy to unambiguously detect these features in ALS point cloud data and the subsequent raster interpolation of DTM. It is indeed clear from
Figure 6 that both the orthoimage and raw DTM were not helpful in reaching the research aims. Concerning the RGB orthoimage, the vegetation covers almost the entire underlying terrain, even though aerial photogrammetric acquisitions have been performed during the leafless period of the year.
To address the issues related to ground feature identification, an investigation was carried out to generate geomorphological raster datasets that could enhance the interpretation of traces in the terrain morphology.
Different geomorphologic analyses have been conducted to determine the most significant geometrical and numerical signatures, with the aim of detecting the significant ground shapes (
Figure 6). The shaded visualization, as well as the slope, aspect, and flow direction analyses, were generated from the DTM surface.
Moreover, from a Focal Statistic (FS) analysis on a five-pixel neighborhood cluster of the DTM, a Relative Roughness index r
i [
75] raster was estimated using the raster calculator.
From a spatial analysis of the 3D point cloud, the direction value of the magnetic north was calculated for each point. This function calculates the orientation angle for each point using the supplementary attributes of magnetic north vectors acquired during the flight [
76]. The resulting scalar field is then interpolated to create a shaded false-color visualization. From this visualization, human artifacts such as trails become more visible. This method is considered an effective alternative to simple DTM Hillshade, which often fails to highlight man-made paths, especially under vegetation.
From the visual inspection of the geomorphological analyses (
Figure 6), it is evident how specific ground features such as trails, depressions, and rock formations are detectable. On the contrary, the elevation model and the orthoimage have not proven suitable for this type of task since the searched features are not unambiguously evident. Therefore, geomorphological layers (GMLs) were used as training datasets for applying Machine Learning (ML) classification algorithms.
After evaluating the classification accuracy (
Section 3.2) using the individual geomorphological analyses, several composite raster datasets were generated by combining the GMLs (
Figure 7).
In this case, the aim was to assess whether generating these composite rasters could enhance the classification effectiveness. For this reason, five composite raster data sets were generated. The combination of GMLs in each composite raster is reported in
Table 4.
2.3.2. Geomorphological Layer (GLM) Data Preparation Strategies for ML Classifiers
Since it has been established that Machine Learning classifiers (MLCs) are particularly suitable for geospatial imagery [
38], the integration of Machine Learning techniques within the methodological workflow was tested. However, in contrast to what has been developed so far, some recent studies [
17,
39,
40] have investigated ML classification techniques adopting an optical sensor data integration strategy with digital elevation model data. In this step of the pipeline, the goal was to achieve a high level of automation for the macro-scale land-cover mapping task, which was the ultimate objective of the proposed methodology, thus adopting the use of MLCs. These classification algorithms belong to the supervised learning category, where a predictive model is generated from an input set of training samples. Additionally, it is widely acknowledged that MLCs can handle large datasets effectively, producing more accurate results for complex data than parametric classifiers [
38]. Parametric classifiers, such as maximum likelihood, require the assumption of normal data distribution and are slightly influenced by training data. Moreover, the landscape’s morphology complexity led to noisy classification results [
37]. Thus, this study tested two different nonparametric, per-pixel, and supervised Machine Learning classifiers to reach the desired aim.
SVM is a Machine Learning method based on supervised statistical learning theory exploiting the theory of small samples [
77]. For classification learning tasks, SVM evolved as a nonlinear probabilistic algorithm focusing on the distribution and distinction of training samples (support vectors) among two classes based on optimal hyperplane [
45]. SVM thus efficiently suits remote sensing applications for high-dimension pattern recognition and binary classification tasks [
77].
RF combines multiple Decision Tree classifiers (DTs) [
41]. Each DT uses a random sample of the training data to reduce the variance, finally assigning a unique class. The final prediction is then assigned by considering all the DT results through the majority voting method.
As demonstrated, the training samples have a crucial role in the effectiveness of the MLCs, even if there is not a generalized model for the training data preparation. However, it can be assumed that the numerosity and size of training samples must be planned accurately, depending on the classification algorithm, the number of class variables, the quality of the primary data, and the variety within the whole spatial extension [
38]. The area and the support vectors that have been used during the training data preparation are displayed in
Figure 8. For this task, a predefined dual-class classification scheme was employed to categorize anthropogenic landforms (namely trails) and the ground. The first objective was to minimize human involvement in the labeling process while maximizing label coverage across all regions of the training area.
The main aim was to optimize the labeling process by using the minimum number of samples up to the recommended minimum [
38]. In this sense, the number of labels is >10 per class, and the training samples’ class frequency is coherent with the distribution of the features within the raster extension (
Table 5). The results of this approach will be discussed in
Section 3.2.
3. Results
Following the classification processes outlined in
Section 2, the performance of the predictive models and algorithms was evaluated to assess their effectiveness. The classification accuracy of the DLMs, unsupervised filter (
Section 2.2), and the MLCs (
Section 2.3) was evaluated by comparing the predictions with the ground truth labels. Starting from the data reported in the confusion matrices (true positives, true negatives, false positives, false negatives), the following metrics [
78] have been estimated to evaluate the achieved results: Accuracy (2), Precision (3), Recall (4), F1 score (5). The metrics have been calculated as follows:
3.1. Point Cloud Macro-Class Classification Evaluation: Unsupervised and Supervised Approaches
This section reports the results relative to the methods developed in
Section 2.2. The performances of the DLMs (trained with the T-V-a dataset) and unsupervised filters were evaluated using the confusion matrix derived from the validation dataset. Finally, the DL models were tested on four different datasets, and the results are available in
Appendix A.
3.1.1. Unsupervised Geometric Filter Results
Concerning the application of the integrated filtering approach based on the SMRF filter (ground) and geometric feature-based filter (off-ground) for deriving the three classes (
Section 2.2.1), the following results can be analyzed.
Considering the unsupervised approach (
Figure 9), positive outcomes have been experienced for terrain and vegetation classes. In contrast, the building class is the one characterized by the worst results in terms of metrics (
Table 6). The high Accuracy metric achieved (≈99%) is due to the heavy numerical imbalance of points belonging to this class. For this reason, the very high number of true negatives (10,776,575 out of 10,990,089, ca. 98% of total points) significantly affects the Accuracy of this class. However, considering also the Precision, Recall, and F1 score, it can be observed that the building class is the one characterized by the lowest performance. This highlights the difficulties of the employed unsupervised algorithm in unambiguously recognizing the considered class. Overall, the results can be considered adequate for the accurate generation of a digital model of the terrain. Specifically, a sensitive evaluation of the results for individual classes demonstrates how the SMRF filter [
59] manages to apply good generalization in ground points. However, cases of under-prediction are observed in some areas (Precision ≈ 76%).
Integrating a geometric feature filter algorithm allows for disambiguation between building classes using only the eigenvalue of normal (λ
3) and geometric values of sphericity, which was then evaluated. In this case, the significant imbalance between the two classes does not lead to easily interpretable results, especially regarding points related to buildings. However, it is evident from
Table 6 that the vegetation class does not demonstrate an optimal sensitivity (Recall ≈ 86%) if compared to the one achieved for the ground class (Recall ≈ 99%). As shown in
Figure 10, a high percentage of over-prediction can be observed, particularly localized on building roof areas with higher noise and characterized by architectural objects deviating from the average surface (e.g., chimneys). Similar behavior occurs in the case of the vertical elements related to facades, which are generally less dense and noisier than the horizontal ones. Due to this topological difference, the points pertaining to these surfaces are less assimilable as belonging to a plane and are thus associated with the vegetation class.
However, considering that the main objective was to segment ALS data in order to evaluate the effectiveness of the unsupervised integrated filter, this objective is not precluded or influenced by the results obtained. As observable by the evaluation metrics related to the ground class and reported in
Table 6, this step of the proposed methodology represents a valuable solution for generating accurate DTMs.
The classification results of the unsupervised filter have been thus evaluated with a visual inspection of a human operator and, where necessary, refined in order to be used as the primary dataset to train a DL classification model.
3.1.2. Deep Learning Models Results
As stated in
Section 2.2.2, two DLMs were trained from the finally selected T-V-a training dataset (
Figure 5): Model 1 and Model 2. While Model 1 was trained using the original resolution dataset, Model 2 was derived instead from a sub-sampled dataset.
The DLM 1, trained with the RandLA-Net architecture, stopped on the 25th epoch after approximately 7 h, resulting in an overall accuracy of 92% (
Table 7) observed in the classified validation dataset (
Figure 11).
Considering the comparisons between Ground Truth, the predictions of each area, and the results of individual classes [
Table 7], the sensitivity (Recall) of the ground class appears lower compared to the vegetation class. At the same time, the specificity (Precision) is higher, contrary to what happened with the use of unsupervised filters (
Table 6). As for class 6 (buildings), the results did not meet the desired expectations. This is probably due to the limited number of points belonging to this class, and although part of the training dataset is related to an urban area, this led to an imbalance in the training data.
The DLM 2 (derived from the PointCNN architecture) finished the training in approximately 27 h at the end of the 25th epoch. Like DLM 1, DLM 2 was then evaluated on the validation dataset (
Figure 12) and tested on four different areas (
Figure A2). The performances were statistically analyzed regarding the overall and individual class evaluation metrics (
Table 8).
The evaluation of Model 2 was carried out following the same methodology used for Model 1 (RandLA-Net classification model). From the evaluation metrics reported in
Table 8, the ground class is more sensitive to over-predictions, unlike class 5 (vegetation), which is generally under-predicted. The metrics observed for the “building” class are consistent with the results obtained with Model 1, highlighting that this class is generally not predicted adequately in both cases. This is evident from the evaluation metrics—the lowest among the considered classes in the case of both models’ predictions.
Considering the achieved outcomes of unsupervised and supervised approaches for the DEM generation, a further discussion related to the comparison of the filtering algorithm and the validation of the trained DL semantic segmentation models is proposed in
Section 4.1.
3.2. Evaluation of MCLs for Grid-Based DTM Classification
This section reports the results relative to the methods developed in
Section 2.3.
In order to assess the most effective classification technique for detecting the anthropogenic forms of the terrain, the MLC outcomes were analyzed for each prepared geomorphological layer. According to the error-test pipeline, this step was crucial to assess the suitability of the GMLs and to determine which one was the most effective as primary data for classification. Moreover, another significant aim was to evaluate the effectiveness of combining various geomorphological analyses as an alternative to the use of individual bands. An initial assessment of the evaluation metrics was carried out for the RF classifier, and the results are presented in
Table 9, where it is possible to observe that Hillshade GML is the best-performing one. Yet, it is the only one to reach sufficient metrics (all the observed values are approximately ≥50%). However, as observable from the comparison between the evaluation metrics reported in
Table 9 and
Table 10, the SVM classifier outperformed the RF approach. However, the RF classifier demonstrates a slight training time superiority over the SVM. The elapsed time for RF is usually below 10 min, while SVM models are trained in 13 to 15 min.
Since the superior effectiveness of the SVM over the RF classifier has been confirmed, further analyses were addressed. In particular, it can be noted from the SVM results that several GMLs reached sufficient metrics and, therefore, can be considered suitable for this kind of classification (Aspect, Flow Direction, Hillshade).
Similar to the analyses carried out in the previous paragraphs, the evaluation metrics were also compared to evaluate the employed classifiers and, specifically, which composite raster performs better for the aimed classification task. Considering the evaluation metrics reported in
Table 11 and
Table 12, it is evident that the SVM classification results outperformed again the RF results. While the Recall results are comparable, a significant improvement in terms of Accuracy (+16%), Precision (+17%), and F1 score (+19%) was observed.
From the evaluation metrics reported in
Table 11—and from a visual inspection, as observable in
Figure 13—it is evident that the RF classifier applied on the composite raster datasets has not been effective for detecting the pixels belonging to the Trail class.
In fact, the classification of composite images achieved results characterized by approximately the same order of magnitude. None of these results outperformed the others, but overall, the metrics are insufficient. In particular, the Trails class is consistently over-predicted, resulting in a high level of noise after the classification task.
Regarding the SVM approach, a slight under-prediction can be observed, as indicated by the relatively low Precision value, which is always the lowest metric across all the other values. Generally, the data that led to the best results adopting the SVM approach are composite e (Hillshade, Hillshade, Aspect). This result is consistent with what was observed while analyzing the classification task on the individual GMLs, where the bands that enabled the achievement of better generalization were Hillshade and Aspect.
Contrarily, this does not apply regarding the RF approach, as the classification carried out on the Hillshade GML produced only adequate results, effectively outperforming the strategy based on the use of composite images.
Considering the achieved outcomes of the MLCs on the training area, a further discussion related to the validation of the trained ML predictive models is proposed in
Section 4.2.
5. Conclusions
Finally, consideration can be given to the lessons learned and the opportunities offered by integrating the various proposed methodologies and also some identifiable bottlenecks.
This study underscores the significance of integrating diverse approaches while critically assessing their performance and applicability. In this context, valuable insights can be addressed, contributing to advancements in remote sensing data analysis, environmental monitoring, land use planning, and archaeological research. Moreover, the flexibility of the proposed integrated methods can also be underlined. While the proposed research focuses on detecting historical trails through the generated predictive models, it should be noted that this strategy could yield comparable results for identifying different types of artifacts (e.g., terraces or other archaeological evidence). The versatility of the explored technologies could provide a powerful investigation tool for researchers working in the archaeological and landscape heritage framework.
In the proposed pipeline, the semantic enrichment of 3D unstructured data represents a valuable opportunity to enhance automation in the processes related to landscape and terrain morphology knowledge. Here, the results obtained from the DL approach highlight how ANN represents a valid and powerful alternative to the more consolidated geometric unsupervised filter. While the geometric-based filters are effective compared to the DL approach, they are strictly dependent on the topography and the vegetation of each considered site, as well as the urban and architectural morphology. Supervised methods, such as ANN-based methods, can achieve higher levels of abstraction and, consequently, higher generalization capability, highlighting the significant perspectives associated with using this technology.
Moreover, the supervised methods allow for less generalized semantic classes to be assigned, leading to more effective segmentation. However, it should be underlined that ANNs require significant structured data for the training. Thus, generating an adequately structured training dataset can be a time-consuming and demanding bottleneck. Recently, this issue has been addressed, and several solutions have been proposed to overcome the criticality represented by the generation of the training dataset; for example, generating an artificial training dataset [
79,
80], using open-source benchmark datasets [
62,
72], or exploiting pre-trained ANN for fine-tuning purposes [
81,
82]. Generally, this research direction represents one of the main perspectives for future application development.
Instead of considering the raster-based classification approach and evaluating the performances of the two MLC algorithms, as expected [
38], the proposed methodology addressed the overall superiority of the SVM classifier compared to the RF classifier. However, it is worth mentioning that both the employed MLCs represent an adequate approach for developing automation solutions for the classification aims.
Moreover, it is worth emphasizing that the recognition of anthropogenic forms characterized by a high level of granularity represents a challenging task not only for supervised or unsupervised methods but also for traditional manual classification operations performed by a domain expert image analyst. In the framework of this type of landscape and urban-scale analysis, when a classification and detection task is required, a significant imbalance between classes is not uncommon due to the granularity of the features, as in the case of this present research. In this regard, it should be underlined that the effectiveness of unsupervised geometric filtering methods is not affected by class imbalance, contrary to what happens in supervised methods such as ML and DL. Balanced classes are preferable for supervised approaches to obtain optimal results [
44]. However, an imbalanced distribution of class population can lead to difficulties in the interpretation of metrics, as for classes composed of a limited number of samples, a few incorrect predictions can heavily influence the values of evaluation metrics. In fact, despite relatively low metric values (≤ 60%), upon visually inspecting the classified images, it is worth underlining that the classification models adequately detected and classified the anthropogenic forms in the raster datasets.
Another aspect that deserves attention is the high costs of this type of sensor, i.e., the use of a full waveform ALS solution applied to the landscape heritage documentation framework. It enables the penetration of the sparse vegetation and thus its suitability in areas similar to the Spina Verde. Nevertheless, this limitation can be overcome by the recent development of low-cost UAS-based LiDAR commercial solutions (e.g., DJI Zenmuse L1). These systems have been increasingly used in recent years and can guarantee greater accessibility, ensuring sustainable data acquisition for geomorphological analysis purposes. However, the penetration capabilities of these more compact solutions are still being evaluated and validated in the literature.
Moreover, a reflection is crucial in training data preparation, to generate the minimum number of support vectors [
38]. The main aim is to minimize operator manual activities while enhancing automation in classification processes. In this framework, the proposed methodology could represent a significant contribution.
Furthermore, regarding the future perspectives of this research (
Figure 18), one of the most interesting directions is the possibility of applying the tested methodology to different case studies. These differ in terms of extension, density, and morphology. As evidenced in
Section 4.1, the pipeline is suitable also for sparse low-scale LiDAR datasets.
Finally, considering the effectiveness of the trained classification models in detecting point classes related to anthropogenic forms and the necessity in the framework of heritage studies of enhancing automation processes, an interesting perspective is represented by the possibility of exploiting specific databases related to heritage and archaeological domains to train and validate predictive models for the automatic object detection—and semantic enrichment—of cultural and anthropogenic assets.