1. Introduction
This work focuses on mapping a particular type of archaeological site, burial mounds, using a deep learning approach and remote sensing data. Burial mounds are typically characterized by a rounded geometric shape that contrasts with their surroundings, often with a central depression, a distinctive feature that helps in their identification. This work takes an important step forward by incorporating field validation of AI inferences, building on existing research that addresses the potential and limitations of AI-based burial mound detection [1,2,3,4,5,6,7]. Deep learning uses complex algorithms to automatically identify patterns in large amounts of data, especially images. This is valuable in archaeology, where manually identifying archaeological features in remote sensing data is time-consuming. Archaeologists are employing these algorithms to detect archaeological sites and features in various data types [8,9,10,11,12,13], such as LiDAR data and multispectral, hyperspectral, satellite, and aerial imagery. By identifying and mapping potential archaeological sites or features, such algorithms allow archaeologists to plan targeted surveys, saving both time and resources. Despite these advantages, their limitations must be acknowledged. Due to the inherent variability of archaeological data, these algorithms can generate a significant number of False Positives, where natural formations or modern structures are misinterpreted as archaeological sites. Therefore, integrating deep learning with the expertise of archaeologists remains crucial for accurate interpretation and successful application [14].
Distinguishing archaeological features from other natural or artificial shapes is a challenge not just for AI, but also for humans, and this difficulty contributes to the high number of False Positives produced by AI-based methods for uncovering archaeological sites and artifacts. Researchers are actively addressing this issue by refining the models and increasing the quality and quantity of available data. We hypothesize that fieldwork validation of AI-generated inferences can provide crucial insights into the topographical context of both True Positives and False Positives. For example, field validation can reveal vegetation patterns or subtle changes in soil composition associated with True Positives, differentiating them from natural features misinterpreted as burial mounds. This knowledge can then be used to refine the algorithms for future archaeological surveys.
The primary objective of this work is to iteratively refine the algorithm designed to detect burial mounds, based on insights gained from experienced archaeologists during field surveys and digital validation. The main goal is to enhance the algorithm's performance and address the recurring issue of False Positives often encountered when deploying machine learning models on challenging aerial imagery. This combined field survey and data analysis approach has the potential to substantially improve AI-based archaeological site detection, allowing faster and more accurate identification of burial mounds with considerably fewer False Positives.
Contextual information about this work is presented in Section 2. The statistics and knowledge originating from the validation of the fieldwork are outlined in Section 3. The algorithm refinement based on the knowledge obtained from fieldwork validation is discussed in Section 4. The results obtained from the refined algorithms applied to the Alto Minho and Barbanza regions are presented in Section 5. The discussion is presented in Section 6. Finally, the concluding remarks are formulated in Section 7.
2. Method
Building on our previous work, in which we proposed a machine learning pipeline to uncover burial mounds in the Alto Minho region of northern Portugal [7], this work investigates the problem in greater depth. The LiDAR data used for Alto Minho (2220 km²) have a point cloud density of 2 points per square meter. The pre-existing classification of the point clouds was used without any reclassification or manual correction, followed by TIN interpolation. In summary, 1-meter Digital Terrain Models (DTMs) were extracted from the classified LiDAR point clouds and divided into four tiles to facilitate analysis. From these, Local Relief Models (LRMs) [15,16] were generated to enhance the visualization of archaeological microtopographies; the LRM is considered a robust and consistent visualization technique [17,18] for detecting burial mounds. We annotated 276 known burial mounds [19,20] and automatically built an image dataset. The dataset was then augmented using a copy-paste data augmentation technique [21] and used to train an object detection algorithm, You Only Look Once (YOLO) [22]. This algorithm was then deployed over the Alto Minho region, and all inferences went through a post-processing validation step responsible for removing potential False Positives. This step includes an algorithm inspired by the Location-Based Ranking (LBR) algorithm proposed in [23]. The LBR assumes that the location of archaeological sites in the landscape is not random but results from certain characteristics of the past and present environment; therefore, inferences located in improbable locations are discarded. Post-processing also includes a Local Outlier Factor (LOF) [24] algorithm, trained on the raw LiDAR point clouds, which removes inferences whose 3D morphology is not similar to that of burial mounds. This pipeline produced 648 burial mound inferences, an 81% reduction from the original 3417 inferences. This reduction resulted from a deliberate effort to mitigate False Positives so as to provide archaeologists with reliable inferences, giving archaeological missions a higher chance of discovering sites and features of archaeological relevance. Subsequently, four archaeologists with experience in remote sensing digitally confirmed 470 of the 648 features identified as potential burial mounds. More details can be found in our previous work [7].
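To make the LOF step concrete, the following is a minimal from-scratch sketch of LOF novelty scoring in Python with NumPy. It assumes, hypothetically, that each inference has already been summarized as a fixed-length 3D morphology descriptor; the descriptors actually derived from the point clouds are described in [7], and the names `fit_lof`, `lof_score`, and `k` are illustrative. Scores near 1 indicate a sample as dense as its neighbors among the known burial mounds; scores well above 1 flag morphological outliers that would be discarded.

```python
import numpy as np

def _knn(train, queries, k, exclude_self=False):
    """Indices and distances of the k nearest training points for each query."""
    d = np.linalg.norm(queries[:, None, :] - train[None, :, :], axis=-1)
    if exclude_self:
        np.fill_diagonal(d, np.inf)  # a training point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)

def fit_lof(train, k=5):
    """Precompute k-distances and local reachability densities (hypothetical helper)."""
    idx, dist = _knn(train, train, k, exclude_self=True)
    k_dist = dist[:, -1]                       # distance to the k-th neighbor
    reach = np.maximum(dist, k_dist[idx])      # reachability distances
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)   # local reachability density
    return {"train": train, "k": k, "k_dist": k_dist, "lrd": lrd}

def lof_score(model, queries):
    """LOF of new samples: neighbors' mean density divided by the query's density."""
    idx, dist = _knn(model["train"], queries, model["k"])
    reach = np.maximum(dist, model["k_dist"][idx])
    lrd_q = 1.0 / (reach.mean(axis=1) + 1e-12)
    return model["lrd"][idx].mean(axis=1) / lrd_q

# Usage with toy 2D descriptors: a dense grid of "mound-like" samples.
train = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
model = fit_lof(train, k=5)
scores = lof_score(model, np.array([[2.0, 2.0], [20.0, 20.0]]))
```

scikit-learn's `LocalOutlierFactor` with `novelty=True` provides a complete, optimized implementation of the same idea.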
Most similar works in the literature, including our previous work, conclude at this point. However, we believe that fieldwork validation can yield additional empirical knowledge about how the landscape and topographical context relate to AI behavior. This knowledge can be used to refine models and algorithms, and through this iterative process it is conceivable to achieve increasingly reliable inferences, mitigating the False Positive problem. Therefore, four tumulus clusters, covering all four tiles, were randomly selected for fieldwork validation in the Alto Minho region; together, they contain 237 of the 470 digitally validated burial mounds. Two archaeologists with field survey experience conducted the ground truth validation.
4. Algorithm Refinement
Based on the knowledge obtained during the fieldwork validation and described in Section 3, the algorithms were refined. Figure 2 presents a diagram of the proposed algorithm refinements.
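Each refinement in Figure 2 can be read as one more filter appended to the post-processing chain. The sketch below assumes, hypothetically, that every inference carries precomputed scores; the field names, thresholds, and ordering of the intermediate stages are illustrative, and only the placement of the ViT classifier as the final stage follows the text.

```python
from typing import Callable, Dict, List

Inference = Dict[str, float]          # hypothetical record of precomputed scores
Stage = Callable[[Inference], bool]   # returns True to keep the inference

def run_filters(inferences: List[Inference], stages: List[Stage]) -> List[Inference]:
    """Apply the stages in order; an inference survives only if every stage keeps it."""
    kept = list(inferences)
    for stage in stages:
        kept = [inf for inf in kept if stage(inf)]
    return kept

# Hypothetical stages and thresholds, for illustration only.
STAGES: List[Stage] = [
    lambda inf: inf["lbr_ok"] == 1.0,        # location-based ranking (LBR)
    lambda inf: inf["lof"] < 1.5,            # Local Outlier Factor score
    lambda inf: inf["mean_slope"] <= 15.0,   # slope filter (degrees)
    lambda inf: inf["vit_p_mound"] >= 0.5,   # ViT classifier, final stage
]
```

Keeping each stage independent means a new insight from the field becomes one more entry in the list rather than a change to the existing stages.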
To test the proposed algorithms, the Alto Minho and Barbanza regions were selected. The former was chosen because we have data on True and False Positives validated through fieldwork.
Figure 3 illustrates the LRM of Alto Minho, including the original 648 inferences and the 237 inferences covered in the fieldwork validation.
Barbanza is a well-studied region: most of its burial mounds are known, and other works in the literature have also detected this type of archaeological site there with AI [6], making it suitable for comparative analysis.
Figure 4 illustrates the LRM of Barbanza, including the 164 inferences resulting from the methodology proposed in our previous work, using the high-resolution land use and land cover information system of Spain (SIOSE) [29] for the LBR block. SIOSE 2011 was chosen because the first LiDAR survey by IGN-PNOA covering the Barbanza region (450 km²) was conducted in 2010. These LiDAR data have a point cloud density of 1 point per square meter. As for Alto Minho, the pre-existing classification of the point clouds was used without any reclassification or manual correction, followed by TIN interpolation.
As discussed in Section 3, some False Positives were identified in areas with steep slopes. Taking this into account, slope maps [30] were derived from the LiDAR DTMs. The slope is the magnitude of the first derivative (gradient) of the DTM: it represents the rate of elevation change at each pixel, expressed as an angle from 0° to 90°.
Figure 5 illustrates the slope maps of Alto Minho and Barbanza.
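The slope computation can be sketched with NumPy as follows, assuming a north-up DTM array with square 1 m cells, as in the DTMs used here. `np.gradient` (central differences inside the raster, one-sided differences at the edges) is a simple stand-in; the GIS tool behind [30] may use a different kernel, such as Horn's 3 × 3 method, so values can differ slightly. The function name is hypothetical.

```python
import numpy as np

def slope_degrees(dtm: np.ndarray, cell_size: float = 1.0) -> np.ndarray:
    """Per-pixel slope angle (0-90 degrees) of a DTM raster."""
    # Derivatives along rows (y) and columns (x), in metres per metre.
    dz_dy, dz_dx = np.gradient(dtm.astype(float), cell_size)
    # The slope angle is the arctangent of the gradient magnitude.
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Usage: a plane rising 1 m per metre eastward has a 45 degree slope everywhere.
plane = np.tile(np.arange(10.0), (10, 1))
slopes = slope_degrees(plane)
```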
Slope maps were used to filter potential False Positives in areas characterized by significant topographic gradients. To achieve this filtering, the mean slope within a predefined neighborhood surrounding the inference was calculated. This neighborhood is defined by expanding the original bounding box of the inference by a specified distance (25 pixels) in all directions.
Figure 6 illustrates this concept.
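The neighborhood statistic can be sketched as below. The bounding-box convention `(row_min, col_min, row_max, col_max)` with exclusive upper bounds and the function name are assumptions; the 25-pixel margin comes from the text and, on the 1 m slope rasters, corresponds to 25 m on the ground. The window is clipped at the raster edges, so inferences near a tile border use a smaller neighborhood.

```python
import numpy as np

def neighborhood_mean_slope(slope: np.ndarray, bbox, margin: int = 25) -> float:
    """Mean slope inside the inference bounding box expanded by `margin` pixels."""
    r0, c0, r1, c1 = bbox  # hypothetical convention: upper bounds exclusive
    h, w = slope.shape
    # Expand the box in all directions, then clip it to the raster bounds.
    r0, c0 = max(r0 - margin, 0), max(c0 - margin, 0)
    r1, c1 = min(r1 + margin, h), min(c1 + margin, w)
    return float(slope[r0:r1, c0:c1].mean())

# Usage: flat terrain on the left half, 10 degree slope on the right half.
s = np.zeros((100, 100))
s[:, 50:] = 10.0
mean_slope = neighborhood_mean_slope(s, (40, 40, 60, 60))
```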
To characterize the typical slope of the terrain surrounding burial mounds, we leveraged the 276 known and annotated instances used to train the models in our prior research.
Figure 7 illustrates the histogram obtained.
By establishing the characteristic slope of the terrain surrounding known burial mounds, we can eliminate inferences that exceed a predefined inclination threshold.
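One way to turn the histogram of Figure 7 into a filter is sketched below. The text does not state how the inclination threshold was chosen, so taking a high percentile of the slopes around the 276 known burial mounds is purely an illustrative assumption, as are the function names.

```python
import numpy as np

def slope_threshold(known_mound_slopes, percentile: float = 95.0) -> float:
    """Upper slope bound derived from the slopes around known burial mounds.

    The percentile criterion is an illustrative choice, not the one reported.
    """
    return float(np.percentile(known_mound_slopes, percentile))

def keep_mask(inference_slopes, threshold: float) -> np.ndarray:
    """True for inferences whose neighborhood slope does not exceed the threshold."""
    return np.asarray(inference_slopes) <= threshold

# Usage with toy values standing in for the mean neighborhood slopes.
thr = slope_threshold(np.arange(101.0))
mask = keep_mask([10.0, 94.0, 96.0], thr)
```

A single global threshold is the simplest option; a region-specific value is an obvious alternative when the terrain around burial mounds differs between study areas.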
Fieldwork validation revealed a subcategory of False Positives that is harder to remove because of its inconspicuous characteristics. To address this, we propose using a Vision Transformer (ViT) model [31] for image classification as the final filtering stage. ViT models replace the convolutional architectures of Convolutional Neural Networks (CNNs), long dominant in image processing, with transformers applied to sequences of image patches. A transformer is a deep learning architecture characterized by a self-attention mechanism that weighs the importance of each element in the input sequence relative to the others [32]. This technology has been highly successful in natural language processing [33], and ViT models match or exceed the state of the art on many image classification benchmarks [34]. Given this success, a ViT model may find discernible characteristics in the False Positives that could have been challenging to identify during fieldwork validation. To test this hypothesis, an image dataset containing two classes was built: burial mounds and False Positives.
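The core of the transformer mechanism described above can be sketched in NumPy. The sketch assumes 16 × 16 patches (a common ViT configuration; the patch size is not stated here), so a 224 × 224 RGB image yields 14 × 14 = 196 tokens of 16 × 16 × 3 = 768 values each. It shows only single-head scaled dot-product self-attention, softmax(QKᵀ/√d)V, with random (untrained) projection matrices; an actual ViT adds a learned patch embedding, a class token, position embeddings, multiple heads, normalization, and MLP blocks.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an H x W x C image into a sequence of flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)           # (rows, cols, patch, patch, c)
    return x.reshape(-1, patch * patch * c)  # one token per patch, row-major

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # similarity of every token pair
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights

# Usage with a random image and random projections (d = 64).
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
tokens = patchify(image)  # shape (196, 768)
w_q, w_k, w_v = (rng.normal(scale=0.02, size=(768, 64)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
```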
To obtain the necessary samples, the 276 known and annotated burial mounds of Alto Minho were used for the burial mound class, and the 178 inferences that were digitally invalidated in our previous work were used for the False Positive class. In that digital validation, four expert archaeologists visually classified the inferences as potential True Positives or False Positives. They used QGIS, assisted by LiDAR-derived LRMs, Google Satellite images, Bing Aerial images, and aerial images from Direção-Geral do Território (DGT), the Portuguese territorial institution. The images were dated 2021, 2018, 2004–2006, and 1995. The RGB (red, green, and blue) bands of the 2021 DGT 25 cm orthophotos [35] were used to crop 224 × 224 images of burial mounds and False Positives. These orthophotos were selected, despite being more recent than the LiDAR data, because they are of significantly better quality.
Figure 8 illustrates samples of the dataset.
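Cropping a sample around an inference can be sketched as follows. At 25 cm per pixel, a 224 × 224 crop covers 56 m × 56 m on the ground. The sketch assumes a north-up orthophoto array whose top-left corner has known map coordinates `origin = (x0, y0)`; the function name is hypothetical, and windows near the image border are shifted inward rather than padded, which is one of several reasonable choices.

```python
import numpy as np

def crop_centered(ortho: np.ndarray, origin, center_xy,
                  pixel_size: float = 0.25, size: int = 224) -> np.ndarray:
    """Crop a size x size window centred on a map coordinate (north-up raster)."""
    x0, y0 = origin      # map coordinates of the top-left corner (assumed known)
    cx, cy = center_xy   # map coordinates of the inference centre
    col = int(round((cx - x0) / pixel_size))
    row = int(round((y0 - cy) / pixel_size))  # y decreases downwards
    half = size // 2
    h, w = ortho.shape[:2]
    # Keep the whole window inside the image by shifting it at the borders.
    r0 = min(max(row - half, 0), h - size)
    c0 = min(max(col - half, 0), w - size)
    return ortho[r0:r0 + size, c0:c0 + size]

# Usage: a 1000 x 1000 raster (250 m x 250 m) with its top-left corner at (0, 250).
ortho = np.arange(1000 * 1000).reshape(1000, 1000)
crop = crop_centered(ortho, origin=(0.0, 250.0), center_xy=(125.0, 125.0))
```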
To increase the size of the dataset and achieve class balance, data augmentation was performed. Seven geometric transformations were considered: flip left to right, flip top to bottom, 90° rotation, 180° rotation, 270° rotation, transpose, and transverse.
Figure 9 illustrates the proposed data augmentation applied to a False Positive sample, and Table 7 presents the resulting dataset used to train the ViT model.
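The seven transformations listed above are exactly the non-identity symmetries of a square (the dihedral group D4), so each sample yields up to seven new, distinct images. A NumPy sketch follows; the dictionary keys mirror the names in the text (they also correspond to the members of Pillow's `Image.Transpose` enum), but the tooling actually used for augmentation is not stated.

```python
import numpy as np

# The seven non-identity symmetries of a square H x W (x C) image.
AUGMENTATIONS = {
    "flip_left_right": lambda img: img[:, ::-1],
    "flip_top_bottom": lambda img: img[::-1, :],
    "rotate_90": lambda img: np.rot90(img, 1),
    "rotate_180": lambda img: np.rot90(img, 2),
    "rotate_270": lambda img: np.rot90(img, 3),
    "transpose": lambda img: img.swapaxes(0, 1),  # mirror on the main diagonal
    # Mirror on the anti-diagonal: transpose, then rotate by 180 degrees.
    "transverse": lambda img: np.rot90(img.swapaxes(0, 1), 2),
}

def augment(img: np.ndarray) -> list:
    """Return the seven augmented copies of one sample."""
    return [fn(img) for fn in AUGMENTATIONS.values()]

# Usage: a 3 x 3 array with distinct values gives 8 distinct images in total.
img = np.arange(9).reshape(3, 3)
variants = [img] + augment(img)
```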
A ViT model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224 × 224 [31] was fine-tuned with this dataset. The optimizer used was AdamW [36] (β₁ = 0.9, β₂ = 0.999, ε = 1 × 10⁻⁸) with a learning rate of 5 × 10⁻⁵. The image resolution was set to 224 × 224 and the batch size to 32, and the model was trained for 25 epochs, saving the best weights. Training was performed with an Nvidia GeForce RTX 3080 10 GB GDDR6X GPU and an AMD Ryzen 5 5600X 6-Core 3.7 GHz CPU (Santa Clara, CA, USA) and took 572 s. The best iteration achieved a validation accuracy of 0.91 and a validation loss of 0.43. Training may have been hindered by orthophotos in which burial mounds and False Positives are difficult to distinguish because they are obscured by dense vegetation.
Figure 10 illustrates some of these challenging samples.
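For reference, one AdamW update with the hyperparameters reported above (β₁ = 0.9, β₂ = 0.999, ε = 1 × 10⁻⁸, learning rate 5 × 10⁻⁵) can be sketched in NumPy. The weight decay coefficient is not reported, so it is exposed as a parameter defaulting to 0.0, which is an assumption; with decay switched off, the step reduces to Adam. In practice, a framework optimizer (e.g. PyTorch's `torch.optim.AdamW`) would be used rather than this hand-written version.

```python
import numpy as np

def adamw_step(param, grad, m, v, t, lr=5e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0):
    """One AdamW update on NumPy arrays; t is the 1-based step count."""
    # Decoupled weight decay: shrink the weights directly, not via the gradient.
    param = param - lr * weight_decay * param
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized moment estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Usage: the first step on a single weight.
p, m, v = adamw_step(np.array([1.0]), np.array([2.0]), np.zeros(1), np.zeros(1), t=1)
```

On the first step, bias correction makes the update approximately lr × sign(grad), so each weight moves by about 5 × 10⁻⁵ regardless of the gradient's magnitude.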
6. Discussion
The results obtained from the proposed inference pipeline are presented in Section 5. This study introduces a fieldwork validation methodology, detailed in Section 3, for validating burial mound inferences from our previous work [7]. The insights provided by the experts who conducted the fieldwork and the digital validation enabled the algorithm to be refined with a slope filter and a ViT model. This refinement led to a substantial increase in the F1 score in both study regions discussed in this manuscript, showing the importance of bridging the gap between archaeological expertise and machine learning to address the prevalent problem of False Positives in aerial imagery processing.
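The precision and F1 figures discussed here follow the standard definitions, with F1 = 2PR/(P + R), sketched below with hypothetical counts. Because the slope filter and the ViT model only remove inferences, they can raise precision (fewer False Positives) but can never raise recall, so F1 improves only when the drop in False Positives outweighs any True Positives lost.

```python
def detection_scores(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from True Positive, False Positive, False Negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Usage with hypothetical counts.
p, r, f1 = detection_scores(tp=3, fp=1, fn=0)
```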
Despite this improvement, the generalizability of this work must be considered. First, both studied regions share a similar topography, and the morphology of their burial mounds is also comparable. Thus, even though the YOLOv5 and ViT models were trained exclusively on data from Alto Minho, they generalized to data from Barbanza without fine-tuning. This may not hold if the models are applied to regions with different topographies and burial mound morphologies; in such a scenario, fine-tuning the models on the new data would be necessary, a step that was not required in this work. Second, despite the similar topography of Alto Minho and Barbanza, a significant disparity was observed in the slope of the areas that house burial mounds, as shown in Figure 11 and Figure 12. The areas housing burial mounds in Barbanza are flatter than those in Alto Minho, which makes it more difficult to set a single slope-filter threshold that works efficiently across regions.
Another notable point is that the post-processing of results can be optimized almost indefinitely, as illustrated in Figure 2. This diagram may expand further as new archaeological insights emerge. At some point, this growth becomes unsustainable for any Geographic Information System (GIS), as it would require handling numerous types of data and implementing systems for automated processing. Archaeological data are extensive and resource-intensive, which could cause storage and processing time issues. For further investigation in this field, a model-centric approach should be considered: upgrading models to the most suitable and up-to-date versions for the data type at hand and exploring methods to improve model performance, minimizing the need for extensive post-processing. Furthermore, leveraging archaeological expertise to enrich training datasets could better guide models in their search for archaeological sites. Exploring digital validation methodologies is also essential, particularly given the time and financial constraints of fieldwork validation. These methodologies must address these constraints while equipping machine learning models with the archaeological expertise required to detect archaeological sites effectively.
7. Conclusions
This study investigated the combination of AI and fieldwork validation in archaeology, focusing on the detection of burial mounds. This collaborative approach offers a promising solution to the well-documented issue of False Positives in AI-based detection of archaeological sites and features. The fieldwork validation detailed in Section 3 provided critical knowledge about the landscape context of both the True Positives and the False Positives produced by our previous work when detecting burial mounds [7]. Building on this empirical knowledge, we refined the algorithm proposed in our previous work. The data resulting from this validation, including information on True Positives and False Positives, enriched the datasets used to fine-tune the machine learning models for future archaeological surveys [9,38]. We used slope maps derived from LiDAR DTMs to eliminate inferences in high-slope areas. In addition, a Vision Transformer (ViT) model was trained on digital orthophotos of confirmed burial mounds and previously identified False Positives; this model acts as a final filter for removing False Positives. These refinements led to a substantial improvement in the algorithm's performance in both regions: precision in Alto Minho increased from 27% to 47%, and the F1 score in Barbanza increased from 31% to 37%. A similar study conducted in Galicia [6] obtained an F1 score of 29% when identifying burial mounds in the Barbanza region, 8 percentage points lower than the proposed approach.
The actual performance may be better than these figures suggest. In Alto Minho, potential burial mounds classified as Uncertain in Section 3 were not considered in the calculations, as they still require further validation. In Barbanza, every detection that did not coincide with a known burial mound was counted as a False Positive, although some promising inferences could, in fact, be burial mounds. Although this work offers valuable improvements, the field requires continuous research and development. The knowledge gained from fieldwork and from the digital validation of AI inferences can significantly enhance this iterative process. This collaborative approach has considerable potential to transform archaeological research and cultural heritage management by improving the accuracy of automatic site detection.