Locating Charcoal Production Sites in Sweden Using LiDAR, Hydrological Algorithms, and Deep Learning

: Over the past several centuries, the iron industry played a central role in the economy of Sweden and much of northern Europe. A crucial component of iron manufacturing was the production of charcoal, which was often created in charcoal piles. These features are visible in LiDAR (light detection and ranging) datasets. These charcoal piles vary in their morphology by region, and training data for some feature types are severely lacking. Here, we investigate the potential for machine automation to aid archaeologists in recording charcoal piles with limited training data availability in a forested region of Jönköping County, Sweden. We first use hydrological depression algorithms to conduct a preliminary assessment of the study region and compile suitable training data for charcoal production sites. Then, we use these datasets to train a series of RetinaNet deep learning models, which are less computationally expensive than many popular deep learning architectures (e.g., R-CNNs), allowing for greater usability. Together, our results demonstrate how charcoal piles can be automatically extracted from LiDAR datasets, which has great implications for improving our understanding of the long-term environmental impact of the iron industry across Northern Europe. Furthermore, our workflow for developing and implementing deep learning models for archaeological research can expand the use of such methods to regions that lack suitable training data.


Introduction
Over the last 2500 years, iron production has been a central economic industry in Sweden [1][2][3][4]. During the 17th century, there was a drastic increase in iron production that coincided with the immigration of Walloons, who brought new finery processes that produced a type of wrought iron known as iron rod, as well as new charcoal production and forestry techniques [5]. By the 16th and 17th centuries, production was accelerated by capitalist ventures and continued well into the 19th century [6], so much so that by the 18th century, iron production made up three quarters of Sweden's total exports [7]. During this period, the cost of charcoal has been estimated to have made up more than 50 percent of the production costs for wrought iron. As a result of major alterations in the structure of the iron industry, charcoal use decreased from the mid-19th century, when many of the small foundries were closed, and larger foundries were endorsed. Moreover, black coal was used to a higher degree, due to its higher energy content [5].
To understand the iron production industry in this region, knowledge of charcoal manufacturing is essential. To produce charcoal from wood, slow combustion and a controlled supply of oxygen is required. This is achieved through different types of constructions, either sunken (pits) or above ground (piles) [4]. These constructions are called charcoal pits, which date to between the late Iron Age through the Medieval Period, or charcoal piles that emerged in the post-Medieval Period around the 12th century [4, 5,8]. The exact morphological characteristics vary by local tradition, but most are defined by a pile of wood covered in turf and surrounded by post-holes where supporting structures . Historical image of resmila in use for coal production (credit: ref [19]). (B). Schematic drawing of resmila charcoal piles (credit: ref [5]). (C). Two examples of resmila in a LiDAR-derived hillshade map (indicated by yellow arrows). These features range in size (10-20 m) in terms of diameter.
Within Jönköping County, Sweden, the documentation of resmila in the National Database is limited, as only a handful were previously discovered in large part due to the difficulty in identifying them during ground surveys. As such, we developed an iterative approach for deploying advanced machine intelligence techniques in regions where information about these features is lacking ( Figure 2). Thus, our case study has implications for archaeologists around the world where automated site prospection is hampered by limited available training datasets. Particularly among studies of RCHs in Europe and North America, our methods can aid in the documentation of broad landscapes under tree canopies where these features are often abundant but poorly recognized on a landscape scale [13]. Using depression algorithms, we can (1) allow for relatively good initial assessment of certain archaeological features, identifying dozens to hundreds of these deposits. This then permits for (2) the detection of initial training datasets needed for the implementation of more sophisticated automation algorithms, such as deep learning. In what follows, we provide a case study from a 22 km 2 area of Jönköping County, Sweden, where charcoal piles were recently located by one of the authors (JL) by manually evaluating LiDAR data ( Figure 3).   . Historical image of resmila in use for coal production (credit: ref [19]). (B). Schematic drawing of resmila charcoal piles (credit: ref [5]). (C). Two examples of resmila in a LiDAR-derived hillshade map (indicated by yellow arrows). These features range in size (10-20 m) in terms of diameter.
Within Jönköping County, Sweden, the documentation of resmila in the National Database is limited, as only a handful were previously discovered in large part due to the difficulty in identifying them during ground surveys. As such, we developed an iterative approach for deploying advanced machine intelligence techniques in regions where information about these features is lacking ( Figure 2). Thus, our case study has implications for archaeologists around the world where automated site prospection is hampered by limited available training datasets. Particularly among studies of RCHs in Europe and North America, our methods can aid in the documentation of broad landscapes under tree canopies where these features are often abundant but poorly recognized on a landscape scale [13]. Using depression algorithms, we can (1) allow for relatively good initial assessment of certain archaeological features, identifying dozens to hundreds of these deposits. This then permits for (2) the detection of initial training datasets needed for the implementation of more sophisticated automation algorithms, such as deep learning. In what follows, we provide a case study from a 22 km 2 area of Jönköping County, Sweden, where charcoal piles were recently located by one of the authors (JL) by manually evaluating LiDAR data ( Figure 3).

Automation in Archaeological Remote Sensing
Machine learning and computer automation applications have advanced rapidly in the past two decades [20][21][22]. The methods applied for automated feature detection are widely varied, especially for landscape-scale datasets such as LiDAR and satellite imagery [23][24][25][26][27]. Among the many advances made by scholars in the past few years, the use of morphological characteristics and object-based classifiers has been widely applied, especially to LiDAR data (see [13]). More recently, developments in deep learning (DL) have made progress in improving the accuracy of automated prospection of archaeological materials (e.g., [17]).
DL is a branch of artificial intelligence (AI) that has been recently gaining popularity

Automation in Archaeological Remote Sensing
Machine learning and computer automation applications have advanced rapidly in the past two decades [20][21][22]. The methods applied for automated feature detection are widely varied, especially for landscape-scale datasets such as LiDAR and satellite imagery [23][24][25][26][27]. Among the many advances made by scholars in the past few years, the use of morphological characteristics and object-based classifiers has been widely applied, especially to LiDAR data (see [13]). More recently, developments in deep learning (DL) have made progress in improving the accuracy of automated prospection of archaeological materials (e.g., [17]).
DL is a branch of artificial intelligence (AI) that has been recently gaining popularity among computational archaeologists [24,[28][29][30][31]. DL has resulted in higher accuracy by increasing true positives while reducing false positive results compared to other machine learning and automation techniques [24,32,33]. Convolutional Neural Networks (CNN's), specifically, are a type of DL architecture that have led the charge in archaeological applications. CNNs input data from multidimensional matrices, which allow them to quantify patterns using neighboring pixels, which can affect the final identification [24,31]. As such, CNNs are often more sensitive to subtle patterns within datasets than other forms of machine learning.
DL applications, while now growing rapidly, have remained limited within archaeology, largely because of the computational power, amount of training data, and expertise required to successfully develop these models [34]. The need for large training datasets has been substantially reduced by transfer learning (e.g., [24]), which is a technique that uses previously trained DL models to train new (oftentimes unrelated) models [35]. Transfer learning has led to a breakthrough in archaeological research [32].
Within DL, there are two main architectural types for models: two-stage and one-stage. Two-stage detectors have been more popular among archaeologists, particularly those studying landscape scale datasets [29,36] because they often achieve greater accuracy than one-stage detectors. This stems from the ability of two-stage detectors to reduce false positives by using their two-stages to generate potential feature maps while simultaneously filtering out negative locations [37]. Two-stage detectors are computationally intensive, however, making their use more limited to those with high-computing power. Single stage detectors, in contrast, require less computational power but have historically struggled with reducing background noise, leading to lower performance. However, recent developments with one-stage DL architectures are yielding improved results. One example is the RetinaNet architecture [37,38]). Here, we specifically chose a RetinaNet architecture because it can be run relatively quickly on computers without high-performance GPUs (graphical processing units).
The learning curve associated with DL is also being reduced thanks to advances in GIS technology [31]. ArcGIS Pro [39], for example, has a suite of tools specifically designed for DL using remote sensing datasets. Here, we make use of hydrological algorithms-to generate the data needed to train a DL algorithm-and ArcGIS Pro's DL capabilities to design a RetinaNet model, which can be trained and implemented on computers without high-performance GPUs.

A Solution to Training Data Issues: Hydrological Depression Algorithms
Depression algorithms have been utilized by archaeologists for a range of purposes from crater detection to mound and shipwreck identification [27,[40][41][42][43]. In the past several years, researchers developed a depression algorithm designed for sinkhole detection that works using a contour-tree procedure [44] wherein a computer looks for changes in elevation contours to locate high and low points in a region. Specifically, the algorithm locates low-lying points and searches for subsequently higher elevations surrounding that point to identify sinks (or depressions) on a landscape [44]. Wu et al. [44] demonstrated that using a contour tree process results in an exponentially faster processing time than previous methods, such as the stochastic depression algorithm [45], which has been used by archaeologists (e.g., ref [19]). Recently, researchers used this contour-tree method to automatically detect shipwrecks in bathymetric datasets in North America [41], thereby demonstrating its utility for cultural heritage research.

Materials and Methods
Extensive historical production of charcoal leaves discernable topographic scars on the present landscape. The availability of LiDAR data from Lantmäteriet, a Swedish governmental agency, permits a detailed analysis of the extent of centuries of charcoal production activities throughout forested regions of Sweden. Advances in machine learning and automation, likewise, offer great potential to record these deposits with speed and precision throughout large areas. Although tens-of-thousands of charcoal piles are recorded in Sweden's National Heritage databases, local variations are not well documented. Therefore, we assessed the ability for a sinkhole detection algorithm to locate resmila charcoal piles because this method does not require training data. Then, we used the results of this analysis to compile training data to train a RetinaNet DL model.

Depression Algorithm
We used the Contour-Tree Depression Tool developed by Wu et al. [44,46] to detect small-circular depressions associated with charcoal production (see Figure 1c). The tool runs in ArcMap. We generated a Digital Elevation Model (DEM) with 1 m spatial resolution from LAS point data provided by Lantmäteriet with between 1 and 2 return points per square meter and used this DEM as input for the depression analysis. We ran the algorithm using a buffer size of 2 m and a minimum size of 5 m. Once the algorithm ran, we calculated the circularity of each detected depression using the equation: where C = circularity, A = area, and P = perimeter. Area and perimeter were calculated using the Calculate Geometry tool, and circularity was calculated using the field calculator tool in ArcGIS 10.8 [47]. Next, we calculated topographic roughness, which characterizes the topography of a given region, using the formula developed by Riley et al. [48]: where the neighborhood mean, minimum, and maximum values of the DEM are used to calculate roughness. Neighborhood values were generated using the focal statistics tool in ArcGIS 10.8 [47]. The values from the resulting raster were extracted for each location identified by the depression algorithm using zonal statistics. Finally, we reduced the results of the contour-tree depression analysis using circularity (>0.5) and topographic roughness values (mean > 60, range > 5).

Deep Learning with RetinaNet
RetinaNet architecture was predicated on a Feature Pyramid Network (FPN) and Focal Loss. FPNs are an architecture for processing data wherein low resolution data are combined with high-resolution data through a process of downsampling and upsampling (see Figure 4; ref [38]). Focal Loss alleviates the problem of having an overwhelming number of candidate objects, which often results in decreased accuracy for single-stage detectors [37].
RetinaNet architecture was predicated on a Feature Pyramid Network (FPN) and Focal Loss. FPNs are an architecture for processing data wherein low resolution data are combined with high-resolution data through a process of downsampling and upsampling (see Figure 4; ref [38]). Focal Loss alleviates the problem of having an overwhelming number of candidate objects, which often results in decreased accuracy for single-stage detectors [37].  RetinaNet operates on four main components ( Figure 4): a bottom-up pathway, a topdown pathway, a classification subnetwork, and a regression subnetwork. The bottom-up pathway calculates a set of features at specific locations (or feature maps) at multiple scales. The top-down pathway up-samples the coarser feature maps and merges them with the bottom-up feature maps of the same size. Next, the classification subnetwork calculates the probability of an object being present at a given location and belonging to a specific class (defined by training data). The regression subnetwork then creates a labeled object from that classification (For more information about RetinaNet, see Lin et al. [37,38]. More information on RetinaNet in ArcGIS Pro can also be found at https://developers.arcgis.com/python/ guide/how-retinanet-works/accessed on 25 October 2019).

Implementing RetinaNet CNN in ArcGIS Pro
We created a composite raster consisting of a DEM, hillshade, and slope map for the study region (following ref [31]) to create training data for the RetinaNet Model. ArcGIS Pro requires a multiband raster for deep learning applications. Using the pit features detected from the depression analysis, we labeled charcoal pit locations using the Label Training Data for Deep Learning Analysis tool in ArcGIS Pro 2.6 [39]. In total, we labeled 155 charcoal piles.
The number of documented charcoal piles was still somewhat limited, so we used augmentation procedures to increase our sample size. We augmented the training data using 90 • rotations, thereby creating synthetic data that increased our sample sizes by a factor of 4. Using augmentation, we created a total of 940 images of charcoal piles with a size of 200 × 200 pixels using the Export Training Data for Deep Learning tool in ArcGIS Pro [39]. Other image dimensions were tested (248 × 248, 100 × 100) but resulted in lower model performance.
Next, we trained four different RetinaNet models using a ResNet 34, ResNet 50, ResNet 101, and ResNet 151 backbone model ( Figure 5) with a batch size of 4, optimized learning rates, 10% validation, and 50 epochs. We also set the training to use an unfrozen model and to stop training if improvements stopped. We chose a batch size of 4 to ensure that the process could be implemented on computers without CUDA GPU processors, Remote Sens. 2021, 13, 3680 7 of 13 which are quite expensive and can limit the usability of this method. Models were trained on multiple computers, including a personal laptop with 8 GB of RAM, an Intel Core i7-4510U CPU @ 2.0 GHz with two cores and four logical processors to ensure our method can be utilized by researchers with a range of computational resource availability. ResNet (Residual Network, He et al. 2017) is a transfer learning architecture that is trained on the ImageNet dataset (consisting of over 1 million images) and contains different numbers of layers and helps to improve both the speed and accuracy of the model training process. Here, we used ResNet with 34, 50, 101, and 152 layers. We set up the training procedure to run each model 50 times (50 epochs) and to stop training when the model was no longer improving, in order to save time and processing power. To evaluate overfitting issues with the model, we withheld 10% of the training data for validation.
Next, we trained four different RetinaNet models using a ResNet 34, ResNet 50, Res-Net 101, and ResNet 151 backbone model ( Figure 5) with a batch size of 4, optimized learning rates, 10% validation, and 50 epochs. We also set the training to use an unfrozen model and to stop training if improvements stopped. We chose a batch size of 4 to ensure that the process could be implemented on computers without CUDA GPU processors, which are quite expensive and can limit the usability of this method. Models were trained on multiple computers, including a personal laptop with 8 GB of RAM, an Intel Core i7-4510U CPU @ 2.0 GHz with two cores and four logical processors to ensure our method can be utilized by researchers with a range of computational resource availability. ResNet (Residual Network, He et al. 2017) is a transfer learning architecture that is trained on the ImageNet dataset (consisting of over 1 million images) and contains different numbers of layers and helps to improve both the speed and accuracy of the model training process. Here, we used ResNet with 34, 50, 101, and 152 layers. We set up the training procedure to run each model 50 times (50 epochs) and to stop training when the model was no longer improving, in order to save time and processing power. To evaluate overfitting issues with the model, we withheld 10% of the training data for validation. Once the model was trained, we evaluated its performance by detecting objects within the boundaries of two LiDAR tiles from Jönköping County, Sweden, via the Detect Objects Using Deep Learning tool in ArcGIS Pro [39]. Newly detected charcoal piles were added to the training dataset, and we trained another RetinaNet Model using the best performing backbone architecture from the initial set of tests. Due to the ongoing COVID-19 pandemic, ground evaluation has not yet been conducted, but will constitute the focus of future research. For the purpose of this study, we are still able to demonstrate the utility of these techniques for expanding archaeological investigations on charcoal production in Northern Europe.

Depression Analysis Results
The Wu et al. [44] contour tree algorithm detected 768 possible depressions related to charcoal production. We checked each of these identifications manually by assessing morphology, location, and conformity with known examples of resmila. This manual eval- Once the model was trained, we evaluated its performance by detecting objects within the boundaries of two LiDAR tiles from Jönköping County, Sweden, via the Detect Objects Using Deep Learning tool in ArcGIS Pro [39]. Newly detected charcoal piles were added to the training dataset, and we trained another RetinaNet Model using the best performing backbone architecture from the initial set of tests. Due to the ongoing COVID-19 pandemic, ground evaluation has not yet been conducted, but will constitute the focus of future research. For the purpose of this study, we are still able to demonstrate the utility of these techniques for expanding archaeological investigations on charcoal production in Northern Europe.

Depression Analysis Results
The Wu et al. [44] contour tree algorithm detected 768 possible depressions related to charcoal production. We checked each of these identifications manually by assessing morphology, location, and conformity with known examples of resmila. This manual evaluation led to the confirmation of 243 depressions related to charcoal piles within the study area and 38 possibly related to charcoal production (accuracy = 33.3%). Future ground testing will confirm or reject these possible charcoal production locations. Among the 243 identified depressions were 180 charcoal pit sites. We used a selection of 155 of these detections to train a series of RetinaNet models (see Section 3.2). We excluded 25 depressions and retained them for validation of our RetinaNet models.

RetinaNet Results
Using a selection of the best-preserved charcoal pit features detected using the depression algorithm, we trained four different RetinaNet models using different transfer learning architectures. The results of these models are listed in Table 1. Using these models, we ran the ResNet50 and ResNet152 models on the same test area as the depression analysis to further evaluate each model's performance ( Figure 6).    When comparing the ResNet50 and ResNet152 models, both detected a similar number of features with minimal false positives (Figure 7). The model trained on ResNet152 identified three more training data locations than the ResNet50 model and both had similar numbers of false negatives. The ResNet50 model had fewer false positives than the ResNet152 model (n = 16 and n = 29, respectively). Following this analysis, we incorporated newly detected charcoal piles from the initial RetinaNet models as training data to improve the model trained on ResNet50. The new model was trained using a larger batch size of 8 because smaller batch sizes resulted in lower accuracy (<60%). This new model achieved a slightly higher accuracy of 64.15% Following this analysis, we incorporated newly detected charcoal piles from the initial RetinaNet models as training data to improve the model trained on ResNet50. The new model was trained using a larger batch size of 8 because smaller batch sizes resulted in lower accuracy (<60%). This new model achieved a slightly higher accuracy of 64.15% and lower loss values for both training and validation data (1.12 and 1.05, respectively), indicating greater utility across larger areas.

Discussion
We demonstrated how hydrological depression algorithms can be used to collect preliminary training data for DL models. The depression analysis resulted in many false positives, however, which require expert analysts to manually evaluate before choosing appropriate training datasets. Nevertheless, this procedure allowed us to generate more than 150 examples of charcoal piles within our study area which trained our initial DL model. This iterative detection process can aid in the identification of archaeological features at landscape scales with high degrees of accuracy and survey coverage, particularly in areas where available archaeological data are initially limited, or where knowledge about particular kinds of features are less well-established. In this study, our false positives decreased at each stage, from~67% with hydrological depression methods to less than 14% with our first ResNet50 model. False positives also appeared to decrease with each iteration but were present in all methods deployed (see Figure 7). Furthermore, by consistently reapplying DL models with new training data identified during previous model runs, we showed how automated detection can be utilized in an iterative manner to improve archaeological recording efforts (see Figure 8). This can greatly increase the utility of advanced automated methods of remote sensing analysis among archaeologists working in regions with limited landscape scale archaeological records.
with our first ResNet50 model. False positives also appeared to decrease with each iteration but were present in all methods deployed (see Figure 7). Furthermore, by consistently reapplying DL models with new training data identified during previous model runs, we showed how automated detection can be utilized in an iterative manner to improve archaeological recording efforts (see Figure 8). This can greatly increase the utility of advanced automated methods of remote sensing analysis among archaeologists working in regions with limited landscape scale archaeological records.  Despite expanding the utility of automated analysis methods to case studies with limited training data availability, one key limitation of many applications of DL is computational requirements. Most applications of DL for archaeological prospection utilize dual-stage model architectures (e.g., R-CNN). While these models result in high accuracy in many cases, they are also computationally expensive and require high-performance GPU processors to run effectively, placing their use outside of the realm of possibility for many researchers. Here, we made use of a single-stage DL model (RetinaNet) and demonstrated how these models can be trained without CUDA GPU processors. This opens the possibilities of DL applications to a greater number of archaeologists. In the future, we plan to expand our application of the DL models generated here to investigate the spatial extent and total quantity of resmila charcoal pit features across Sweden's forested landscape.
With continued analyses, the remains of charcoal piles can be placed in a comprehensive cultural-historical context with the foundry Järnbruket Hörle bruk, which is located nearby our study area. This means that there is likely to be a plethora of historical records that can be processed in order to reach conclusions on organization, as well as economic and social aspects of charcoal production. Moreover, spatial analyses of the locations of these charcoal piles remain an unexplored research topic, and having a more comprehensive record of the geographic distribution of charcoal piles could be achieved using the data generated from automated LiDAR analyses (e.g., [42,43,49,50]). Where clusters of charcoal pits remain, future work can also investigate if other constructions related to charcoal production, such as huts and storage piles, are located nearby, and attempts to automate the detection of these features can form future research objectives. Finally, with a systematic record of charcoal production for the area, future research can focus on paleoecological analysis based on pollen records to analyse the intensity of the production in a certain area based on ash deposits. Such work can also provide a better understanding of the development of vegetation on the landscape over time.
The identification and localization of these sites-especially if many prove to predate 1850-would result in them being registered as National Heritage Sites by the Swedish National Heritage Board, which would then protect them under Swedish law. Fieldwork conducted by one of the authors (JL) suggests that many of these remains would be overlooked by non-archaeologists because of their subtle appearance in a constantly changing landscape. Thus, the destructive nature of the forest industry may lead to many charcoal production features becoming damaged or completely unidentifiable in the future. The sheer number of these remains, along with the fact that they are overlooked, places them at risk of destruction. We hope that our methods for quickly identifying these structures in LiDAR assist in increasing attention towards these features and lead to many more charcoal piles being registered in archaeological site databases for future research.

Conclusions
This case study demonstrates the utility of single-stage RetinaNet DL models for identifying charcoal pit features in the forested regions of Sweden. Using a topographic depression analysis procedure, we collected the necessary training data to implement a DL model that successfully detects these charcoal features, demonstrating the power of an iterative approach to site prospection making use of multiple techniques. The ability to train single-stage DL models for archaeological site prospection will likely open the door for many more researchers to implement these methods because of their lower computational requirements when compared to most dual-stage methods commonly used within archaeology (e.g., R-CNN, Faster R-CNN, Mask R-CNN, etc.). Within Northern European and North American archaeology, generally, this research shows the potential for analyzing available LiDAR datasets in a systematic way to expedite cultural heritage monitoring and survey efforts, especially in the study of RCHs and other charcoal production methods. Furthermore, such techniques make it possible to examine more closely the history of anthropogenic landscape modifications in northern Europe and compare resource extraction between different areas.