Detecting Associations between Archaeological Site Distributions and Landscape Features: A Monte Carlo Simulation Approach for the R Environment

: Detecting association between archaeological sites and physical landscape elements like geological deposits, vegetation, drainage networks, or areas of modern disturbance like mines or quarries is a key goal of archaeological projects. This goal is complicated by the incomplete nature of the archaeological record, the high degree of uncertainty of typical point distribution patterns, and, in the case of deeply buried archaeological sites, the absence of reliable information about the ancient landscape itself. Standard statistical approaches may not be applicable (e.g., X 2 test) or are di ﬃ cult to apply correctly (regression analysis). Monte Carlo simulation, devised in the late 1940s by mathematical physicists, o ﬀ ers a way to approach this problem. In this paper, we apply a Monte Carlo approach to test for association between Lower and Middle Palaeolithic sites in Hampshire and Sussex, UK, and quarries recorded on historical maps. We code our approach in the popular ‘R’ software environment, describing our methods step-by-step and providing complete scripts so others can apply our method to their own cases. Association between sites and quarries is clearly shown. We suggest ways to develop the approach further, e.g., for detecting associations between sites or artefacts and remotely-sensed deposits or features, e.g., from aerial photographs or geophysical survey.


Introduction
Associating archaeological sites with elements of the physical landscape (geological deposits, vegetation, drainage networks) are key goals of archaeological projects. This information is desirable for a variety of reasons, e.g., to understand the relationship of past human settlements to agricultural land in their territory (e.g., [1]), to trace patterns of hominid colonisation (e.g., [2]) or to identify key landforms or landscape types for future research or preservation from development (e.g., [3]). However, the task is not an easy one and a wide range of difficulties are typically encountered. A key problem relates to the incompleteness of the archaeological record as a result of differential survival or detection, so apparent associations with particular landscape types or features may not be what they seem-for example, sites discovered by aerial photography are often strongly biased towards geological deposits where cropmarks and soilmarks are most easily formed (see, e.g., [4]), which may not relate to the past human pattern of exploitation of the landscape.
For older archaeological periods, the problems are particularly acute. The absence of lasting structural remains throughout most of Prehistory means that locating traces of Prehistoric archaeology in the landscape is typically challenging. Increasingly, archaeologists are turning to geoarchaeology, deposit modelling [5] and geophysics in order to ascertain what the relationships are between artefacts and the contexts in which they are found. The basic premise on which such an approach is formulated is that the sites/artefacts lie within, and are associated with, palaeolandscapes [6]. Today these palaeolandscapes only exist as fragments of once extant geographies. Consequently, an understanding of the palaeolandscape context of the archaeological remains allows our interpretive horizons for recovered material to be enhanced and developed. Insight into the likely locations in the landscape in which we may expect to find evidence for our earliest ancestors [7] is highlighted as an important analytical tool. However, undertaking such investigations is far from easy.
Besides the difficulty of reconstructing lost geographies from fragmentary archaeological remains lies the major fundamental issue of relating artefacts to sequences. Statistical analysis and modelling approaches like regression that explore the correlation between sites and explanatory variables are not intuitive to researchers inexperienced in statistics, and may feel like overkill for simple cases. Standard statistical tests of association (e.g., the X 2 test) do not seem suitable, as the expected frequency of samples in each class is a function of the proportional size of the class, rather than the proportion of landscape area occupied by the landscape features. A further problem, which frequently confounds analysis of archaeological site distribution data, is the low degree of accuracy and precision of the point data within a dataset. This is especially a problem with large, integrated databases, like those maintained under the Valetta Convention, for example, the UK's county-level Historic Environment Records (HER). This is usually due to the low degree of precision of the original recorded coordinates (e.g., UK six-figure grid references), which are often derived from record cards or paper maps, or simply from the lack of precise knowledge about the location. While these problems are well known, satisfactory solutions remain elusive.
In this paper, we propose an approach which addresses these difficulties using Monte Carlo simulation to determine the probability that the observed patterns can be replicated by chance. This involves generating large numbers of randomly located simulated sites incorporating the appropriate level of spatial uncertainty and computing the coincidence between the feature layer of interest and the simulated sites. Though we code the problem in the popular Free and Open Source "R" software [8], the procedure is straightforward enough to be carried out with a desktop GIS program and a spreadsheet. The scripts used for the analysis are provided in the appendices and as supplementary data to the paper so that interested readers can use them to carry out their own analyses.

Overview
Archaeological site distributions are significant because they can tell us about past human activity in the landscape. However, making robust inferences from archaeological site distribution data is not straightforward. The problems are discussed in classic texts (e.g., [9,10]). Of the wide range of potentially useful statistical approaches, few seem immediately applicable to detecting association between archaeological point patterns and landscape features-for example, permutation tests, while useful for intra-site spatial analysis [11], would seem to require pairs of points (e.g., two different types of artefact) rather than points and features with which they might be associated, though a customized test could no doubt be developed. Likewise, cluster analysis, both the simplest forms (Clark and Evans test, Quadrat analysis, see, e.g., [12]) and more advanced approaches capable of dealing with uncertainty at spatial scales like local density analysis [13] and kernel density analysis [14], is more suitable for developing or testing hypotheses about spatial organisation of archaeological occupation areas than for testing the relationship between the sites and particular deposits or features. What remains, once the range of applicable techniques has been carefully filtered, are very simple, classic tests of association like the X 2 test, which are not really appropriate for spatial distributions, and more complicated approaches, like regression analysis, which require experience and care in application in order not to fall into one of a number of well-documented traps. Monte Carlo simulation, once difficult to do before the advent of powerful computers, is something of an intermediate approach, as it allows a test of association of one spatial data set against another in a way that is fast, intuitive and robust, and does not require immersion in statistical modelling. Monte Carlo approaches have been variously applied in archaeological contexts (see, e.g., [15,16]) but are not as widely used as they might be, probably because the procedure is not integrated as standard in GIS software, and a straightforward description of the process with worked examples has so far not been published. In this paper, we seek to address this gap by showing the utility of the approach by application to a simple research question-whether the spatial distribution of Lower and Middle Palaeolithic sites in Hampshire and Sussex can be shown to be associated with 19th and earlier 20th-century quarries recorded on historical topographic maps.
The paper is structured as follows. In the next section, we discuss the origins of Monte Carlo simulation and explain its utility for investigation of the association of archaeological sites with physical landscape features. We then describe how the method can be applied by taking the reader carefully through the steps in the R software environment, which is increasingly popular for spatial analysis [8]. We then link these steps into an R script (see Appendices A-C) and apply the script to a case study example dataset, demonstrating a clear association between Palaeolithic sites and quarries. Finally, we discuss the potential of this approach to other cases, and finish with some simple recommendations to enable our method to be improved.

Monte Carlo Simulation
The origins of the Monte Carlo method in modern science can be found in the work of Metropolis and Ulam [17], and Ulam [18]. Key to the work of these authors was the observation that complex phenomena could not usefully be described using ordinary methods of mathematical analysis, as the resulting system of integral or integral-differential equations, with numerous interdependent variables, would quickly become too complicated to solve. Alternatively, overarching theoretical approaches, such as statistical mechanics, suitable for analysis of systems comprising enormous numbers of objects, were insufficiently detailed to be of general use for description of smaller systems. Ulam and his colleagues therefore proposed an alternative approach, called the Monte Carlo Method [17], in which the application of probability theory allowed the outcome of a very complex set of circumstances to be determined correctly with great probability [18].
Though the approach was initially proposed for problems of mathematical physics, Monte Carlo-type approaches are clearly applicable to geographical problems. Just as particle collisions inside cosmic ray showers would be expected to occur with greater or lesser probability dependent on particular sets of circumstances, so too will the intersection or coincidence of geographical or archaeological features expressed by map geometry inside a GIS (a fundamental operation of geographical analysis) be less or more likely, depending on the characteristics of the map space and the individual map objects. [19] provide examples of solutions to intersection probability problems for map object types such as lines, circles and rectangles, in which probabilities can be calculated through relatively simple equations. More complex feature types, such as the irregular polygon areas derived from map representations of natural or landscape formations like vegetation, soils or geology, are much more difficult to express through equations. In such cases, as these authors point out, Monte Carlo simulations-"playing a game of chance" in the words of [17]-offer the possibility of obtaining estimates for intersection probabilities.
Monte Carlo approaches have been used in archaeological site location analyses in various ways. [15] used a Monte Carlo simulation to compare the area of visible landscape, known as the viewshed, from Bronze Age Cairns on the Isle of Mull, Scotland with the viewsheds from randomly generated non-sites; these approaches are essential in order to confirm that the alleged superior visibility or intervisibility of archaeological sites cannot have arisen by chance alone [20]. In a highly innovative study, [16] used a Monte Carlo approach to test hypotheses about changes in mid-Holocene hunter-gatherer settlement patterns in Japan by comparing spatio-temporal patterns based on real data with large numbers of randomly generated hypothetical spatio-temporal patterns. [21] used a pair correlation function to compare clustering of archaeological finds in highly irregular spaces (a tomb complex) with clusters of finds that had been randomly assigned through a Monte Carlo procedure.
In addition to these highly developed approaches, Monte Carlo simulation seems ideally suited to the apparently simpler task of detecting associations between mapped landscape features like geology, land cover or vegetation types. It does not seem easily possible to directly calculate the probability that discrete circles representing archaeological site locations will intersect with the many irregular polygons resulting from mapping these kinds of landscape features. This is a significantly more complex problem than any of the examples given by [19], which deal entirely with probabilities of intersection of regular geometrical features such as lines, circles and rectangles, within other regular objects (such as map tile boundaries) in Cartesian space. Of course, any complex polygon in a vector dataset can be generalised as a series of smaller rectangles, circles or triangles, but to do this across the whole map area, without significant loss in accuracy (even before calculation of all individual intersections and production of summed probabilities), seems like a very time-consuming exercise. Instead, the approach we demonstrate in this paper is to discover the likelihood that a given set of archaeological sites will intersect the irregular map of features we are interested in by "playing a game of chance". The approach involves generating, at random, the same number of sites as there are real archaeological sites within the same study area, counting the number of times the sites intersect the features layer, and repeating the analysis multiple times, in order to obtain a satisfactory degree of confidence in our outcome. For example, by repeating the simulation 100 times, we obtain the probability that the coincidence frequency of the real sites could have been generated by random error, in the form of a p-value familiar to users of conventional statistical tests; i.e. if, after 100 runs, it is not possible to attain the same, or larger, number of sites coincident with the relevant landscape features as in the real dataset, we can conclude that our sites are likely to be associated with the features of interest at a significance level of p = 0.01 [22].

Materials and Methods
To apply the Monte Carlo approach described above to test the association of archaeological sites with particular landscape features or units, we need to answer two key questions: (1) How many archaeological sites would we expect to find per landscape unit, in the case that there is no association between the sites and the landscape element we wish to investigate? (2) Does our sample differ from such an expected distribution in a way that would lead us to conclude that our site is associated with or attracted to the landscape features of interest?

Study Area
In line with the above discussion, we demonstrate the application of our approach with reference to the Lower and Middle Palaeolithic findspots database compiled by the Palaeolithic Archaeology of the Sussex/Hampshire Coastal Corridor project (PASHCC; [23]) and quarries extant between 1866 and 1924, digitised from historic editions of the Ordnance Survey maps.The original PASHCC project area comprised the lowland coastal corridor zone between Southampton (Hampshire) and Brighton (East Sussex) (Figure 1a,b), UK. For the purposes of the investigation presented here, the study area comprised the western half of the PASHCC project area, the Eastern Solent Basin (Figure 1c), which contains the densest concentration of Palaeolithic sites in the PASHCC project area. Evans [24] recorded several Palaeolithic findspots from the Pleistocene terrace gravels of the drowned Solent River around Southampton and east towards Gosport ( [25], p. 26), and throughout the second half of the nineteenth and first part of the twentieth centuries, the activities of observant amateur antiquarians brought Lower and Middle Palaeolithic artefacts to light in increasing numbers from the then-hand-worked gravel quarries and brick pits of the Eastern Solent Basin. However, since around 1950, the recovery of artefacts and investigation of artefact-bearing sequences have dramatically declined (though see, e.g., [26]), something that is, in the opinion of most (e.g., [27]. p. 49), a direct result of the increasing mechanisation of extractive industry after ca.1930.
In the Eastern Solent study area investigated here, the PASHCC project documented 57 discrete Lower or Middle Palaeolithic sites (out of a total of 98 in the whole PASHCC study area) recorded in Museum collections and Historic Environment Records databases. Sites ranged from single lithic findspots with no accompanying information, to important sites like Warsash where large lithic assemblages have been recovered and stratigraphic sequences have lately been reconstructed [28] ( Figure 1d). For the purposes of the analysis conducted in this paper, the terms "site" and "findspot" are used interchangeably throughout the manuscript, and refer only to the presence of one or more lithic artefacts recorded in the PASHCC database. The whole lithic artifact database is freely available for download from the archaeology data service (see Supplementary Materials); sample sites for the study region and the polygon layer of digitized quarries are also freely available for download at the link provided by the authors (see Supplementary Materials).
Geosciences 2020, 10, x FOR PEER REVIEW 5 of 21 In the Eastern Solent study area investigated here, the PASHCC project documented 57 discrete Lower or Middle Palaeolithic sites (out of a total of 98 in the whole PASHCC study area) recorded in Museum collections and Historic Environment Records databases. Sites ranged from single lithic findspots with no accompanying information, to important sites like Warsash where large lithic assemblages have been recovered and stratigraphic sequences have lately been reconstructed [28] ( Figure 1d). For the purposes of the analysis conducted in this paper, the terms "site" and "findspot" are used interchangeably throughout the manuscript, and refer only to the presence of one or more lithic artefacts recorded in the PASHCC database. The whole lithic artifact database is freely available for download from the archaeology data service (see Supplementary Material); sample sites for the study region and the polygon layer of digitized quarries are also freely available for download at the link provided by the authors (see Supplementary Material).

Study Aims
Given that the sites recorded in the PASHCC database are known to be mostly derived from quarrying activities in the region prior to mechanisation of the industry, we would expect to find an association between known locations of quarries that were worked during the period when the artefacts were collected and the recorded locations of the artefacts. However, the expectation was that such a hypothesis would be difficult to prove using a simple test of spatial coincidence due to the large number of confounding factors involved. Briefly, these can be defined as follows: • The location of the Palaeolithic finds is, in many cases, uncertain.

•
Many quarries are small-the fairly low precision of the recording of findspots means that finds might not coincide with the mapped location of the quarry from which they probably came. • Not all quarrying activities are recorded on the maps.

•
The fourth edition map was only partially available in digital form at the time the digitizing work was undertaken. Many tiles and areas of this map were missing from the dataset used, so later quarries are likely to have been omitted.

•
Not all Palaeolithic sites are directly derived from quarries. Some have arrived in the hands of collectors as a result of natural processes, such as erosion.

•
Quarries have been digitised without consideration for their depth or purpose, as determining this information would have been unreasonably time-consuming. Consequently, the dataset includes a number of quarries which do not impact any Pleistocene deposits (e.g., hilltop chalk quarries).
In this sense, the Monte Carlo simulation approach provides a useful way of testing the widely shared assumption that the spatial distribution of Palaeolithic sites of this period in the PASHCC study area (Figure 1) is mostly due to the activities of collectors or fieldworkers in areas of quarrying, without amassing a vast amount of knowledge about the specific context of each site. If we find no association between known locations of quarries that were worked during the period when the artefacts were collected and the recorded locations of the artefacts, we can suspect that the provenance of the finds is less exact than expected, or that the PASHCC collection contains many artefacts recovered by other kinds of excavation than quarrying (river or coastal erosion, subsidence, urban development, etc.). However, if we do find a strong association, we provide good confirmation that, as a whole, the location of the recorded finds is broadly accurate, despite the clear confounding factors noted above.The aim of our investigation was, therefore, to test empirically this widely shared assumption about the spatial distribution of the Palaeolithic sites in the study area, and by doing so, to demonstrate the usefulness of our proposed method.
In line with conventional statistical approaches, we can formulate a null hypothesis, which our analysis can test. This hypothesis broadly states that Palaeolithic sites will coincide no more often with quarrying areas than the same number of randomly-located sites. In the next sections, we describe in detail the steps undertaken to test this hypothesis using Monte Carlo simulation in R software.

Workflow
The procedure comprises 5 stages: (1) The relevant features of interest are digitised in suitable GIS software, e.g., ArcGIS, QGIS or similar; (2) buffers are added to the points to reflect the uncertainty derived from the degree of precision associated with the coordinates of the points; (3) a set of random points is generated within the same study area; (4) the random points are overlain onto the features map and the number of overlapping points are counted, and (5) the simulation is repeated 100 times, each time using a different set of random points.
The analysis can be implemented in two ways, according to the user's preference, (i) by simply using the R software to generate the random sites and export them in a GIS-compatible format, and then carrying out the overlay operations manually in GIS, e.g., using Select By Location in ArcGIS or (for raster data) the r.coin module in GRASS software. This is repeated as many times as required, depending on the significance value chosen as a cutoff; (ii) by carrying out the entire analysis from start to finish inside R. While the second of these two options is very much quicker and more efficient, the first may be preferred by users of GIS who are unfamiliar with script programming or the R environment, and also serves to illustrate the procedure more clearly. For users who prefer to avoid R entirely, a prototype of the procedure described here programmed in Microsoft Windows' visual basic scripting language (vbscript) can be found in the Appendices of [29]. For advanced GIS users, scripting can be avoided entirely. For example, in ArcGIS Pro, the "Create Random Points (Data Management)" tool can be used to create random points within the study area boundary. The "Select by Location" tool can then be used to select the recorded Palaeolithic sites and the random sites that intersect with the quarries layer. The ModelBuilder can be used to build these two operations into a loop to run the Monte Carlo test the required number of times.

Georeferencing and Digitising of Quarries
This first phase of the analysis involved on-screen (heads-up) digitising of all identifiable quarries recorded on four editions of the Ordnance Survey map (Figure 1)-First or Old edition (1866-9), Second or New edition (1896-7), Third edition (1909-11), and Fourth edition (1924) at an approximate scale of 1:10000 (6" to 1 mile). Digitising was carried out in vector software and exported to raster format. We used ArcView GIS 3.2, an outdated package which nonetheless remains useful for simple GIS operations.

Adding Buffers to Account for Uncertainty
Uncertainty in location is a very common issue with archaeological data, and can be especially problematic when imprecise coordinate information is used at large spatial scales. Findspot locations in our study were recorded as standard Ordnance Survey of Great Britain (OSGB) six-figure grid references (e.g., #29, 645,065), expanded to enable them to be plotted in map space by adding the easting grid offset of 400,000 to the first three figures (giving 464,500) and the northing grid offset of 100,000 to the last three figures (giving 106,500). Six-figure grid references do not determine a point at all, rather, they define a square of 100 m sides within which the findspot in question must lie. In fact, the findspot must lie not only within this 100 m square, but inside a circle of 100 m diameter within the square, whose centre point is defined by the intersection of each 100 m gridline. The region must be a 100 m-diameter circle, as any diagonal of a 100 m square would fall within the interval 100 ≤ X ≤ 141.42, and the findspot cannot lie further than 100 m from the intersection of the gridlines.
Thus, to account for this in the analysis, 100 m diameter buffers were generated around the Palaeolithic sites ( Figure 2). Though this was done in R (Appendix A and B), it can also be easily be accomplished in any standard GIS software.

Random Sites Generation
Through the st_sample operation in the sf package [30], we generate the same number of random points as there are sites. The st_sample operation uses the uniform distribution (the generated pseudorandom values are uniformly, not normally, distributed), in line with the csr command from the splancs package [31] and standard reference texts on Monte Carlo simulation (e.g., [32]).To create our random sites with the same degree of uncertainty as our real sites, we add 100 m diameter buffers to the randomly generated points (Figure 2).  Observed sites, random sites and uncertainty buffers. Inner (100 m diameter) buffers were used in this analysis. The greater the uncertainty, the larger the buffer, and hence, the higher the probability of intersection between sites and quarries.

Overlay and Frequency Determination
The final part of the analysis involves overlaying the set of 100 m diameter buffers derived from the recorded sites onto the quarries layer and then counting the number of times the sites buffers intersect the quarries. The process is then repeated, but this time using a set of 100 m diameter buffers derived from the random sites generated as described above. To turn this procedure into a Monte Carlo simulation, it is now only necessary to build the random point generation procedure into a loop that simulates sites enough times for the appropriate statistical confidence level to be reached. In this case, we choose to run the analysis 100 times, equivalent to p = 0.01.

Script Programming
The above-described steps 3.3.1-3.3.4 were incorporated into two R Scripts which run successively and together comprise the MCSites tool. The tool, together with example data, can be accessed at https://drive.google.com/uc?export=download&id=1m-7KcIUCD-Zx3dQV_NCTBMGYT3Sr_mT4.
Copy and paste this link into the address bar of your browser and you will be able to download the "MCSites.zip" file. All that is needed to run the tool is the multi-platform software R, which can be installed from https://www.r-project.org/.
Once R is installed, the R software package sf will also need to be installed from the command line, as follows: install.packages("sf") If you copy and paste from this document, take care to replace the inverted commas with ones from your keyboard. The package tcltk is also needed, but is usually available as part of base R and does not need to be installed separately. Installed packages can be loaded with the command: library(sf) library(tcltk) However, the necessary packages, once installed as described above, will be automatically loaded by the script file. To run the analysis from the script file provided at the line described above, open a new R window (or instance of RStudio), set the R working directory to the MCSites directory and execute the command "source ("scripts/main.R")".
setwd("C:\\Users\\rh\\Documents\\MCSites")##windows system setwd("/home/oct/MCSites")##linux system source("scripts/main.R") The script will run giving a series of prompts to import the relevant data and carry out the Monte Carlo simulation. To execute the Monte Carlo analysis successfully, the user needs to do only what is described in this Section 3.3.5. More details are provided for advanced users or those interested in the process of code development in Appendix A.

Results
Given the level of uncertainty expected in the dataset (Section 3.3.2), the results are surprisingly unambiguous. Figure 3 shows clearly that the null hypothesis (Section 3.2) can be firmly rejected. For the 100-run test described in Section 3.3.4, the highest number of coincident random sites was 6, less than one third of the number of Palaeolithic sites that are coincident with the same map areas. On this basis, we can suggest with at least a 1% level of confidence (a result of this magnitude will not occur randomly more often than once in every 100 passes) that there is a significant association between Palaeolithic sites within the study area and quarries. To confirm this result, the test was repeated for 1000 runs on five separate occasions (Table 1). This had the effect of increasing the maximum number of hits in the random sites datasets to 8, but still did not approach the number of recorded sites coincident with the quarries layer. To show the effect on the test of the higher level of uncertainty in site location, the buffer diameter was increased from 100 to 200, and a further 3,1000 run tests were carried out (Table 1). Even allowing for a much-higher-than-expected uncertainty, the results still firmly suggest an association between recorded Palaeolithic sites and quarries.
Geosciences 2020, 10, x FOR PEER REVIEW 10 of 21 uncertainty in site location, the buffer diameter was increased from 100 to 200, and a further 3,1000 run tests were carried out (Table 1). Even allowing for a much-higher-than-expected uncertainty, the results still firmly suggest an association between recorded Palaeolithic sites and quarries.  Table 1, below). Test No. 1001 is the 57 recorded Palaeolithic sites. Site/quarries intersection counts for Monte Carlo simulation shown in blue, recorded sites in red. While we cannot prove for certain that the location of the quarries is the cause of the spatial pattern (correlation is not causation), we show clearly that it is not likely to be independent from it. This offers a useful starting point for more in-depth examination of the individual sites, as well as  Table 1, below). Test No. 1001 is the 57 recorded Palaeolithic sites. Site/quarries intersection counts for Monte Carlo simulation shown in blue, recorded sites in red. Table 1. Monte Carlo Simulation test results for 8 simulation passes of 1000 runs. Note how increasing the size of the uncertainty buffer increases the number of hits for both recorded sites and randomly simulated sites. While we cannot prove for certain that the location of the quarries is the cause of the spatial pattern (correlation is not causation), we show clearly that it is not likely to be independent from it. This offers a useful starting point for more in-depth examination of the individual sites, as well as offering a reminder that, for this ancient archaeological period, the location of archaeological finds is only very rarely an indicator of nearby hominin settlement activity.

Concluding Discussions
Though we have demonstrated our Monte Carlo testing approach with reference to quarries, the method is clearly applicable to any other kinds of landscape feature for which an association with archaeological sites could be postulated. This might include other archaeological features like ditches or boundaries (i.e. testing to see if finds are associated with them) and environmental variables like superficial geology, soil formations, or vegetation. It provides a bridge between somewhat antiquated aspatial techniques like the X 2 test, which are difficult to successfully apply in a spatial context, and more developed analyses involving various kinds of regression. For testing association between archaeological sites and multiple explanatory variables, regression approaches are likely to be superior to this simple Monte Carlo test. However, the need to obtain and process suitable data, and the absence of obviously applicable explanatory variables for archaeological periods where climate, vegetation and landforms are all unrecognisably different from today, means that the Monte Carlo simulation approach described here is useful in a very wide variety of analysis contexts. For example, one very simple application, analogous to the case presented here, might be to investigate the association between cropland and archaeological sites detected by aerial photography, e.g., in the UK (e.g., [33]). As hot, dry years are known to provide ideal conditions for detection of archaeological sites from cropmarks on arable land, a clear association between arable land and archaeological sites might be expected. On the other hand, this would be expected to show some variation, depending on the type of crop and the type of site. A very strong association between sites and arable land might indicate a significant under-representation of archaeological sites on different land cover types, e.g., pasture, allowing different kinds of aerial remote sensing techniques (e.g., LIDAR) to be targeted to these areas.
The approach described also offers a way to account for spatial uncertainty in the form of imprecision in site location. Clearly, this is a less-than-perfect approximation because, in the absence of more precise coordinate information, the site must be represented as a probability zone rather than a single point, which makes it more likely that it will coincide with the selected features. However, if the same method is applied to the random points, they are also more likely to coincide with the selected features, and the two effects would be expected to cancel each other out. Despite the imperfect nature of the proposed solution, we feel that this approach is more honest than simply ignoring the issue. It is especially relevant for modern digital datasets which superimpose many objects recorded at differing scales and degrees of precision, like the point patterns produced by plotting HER records held by county archaeological offices in the UK. Finally, the approach described accounts only for spatial uncertainty. However, other kinds of uncertainty in archaeological data, such as temporal uncertainty, are also amenable to Monte Carlo approaches [34], and the development of a hybrid method that simultaneously addresses both spatial and temporal uncertainty would be an interesting direction for future work.
GIS approaches are widely used by archaeologists, and the R environment is an increasingly popular analysis tool. However, the GIS capability of R, recently enhanced with the addition of the simple features (sf) package [30], which we make use of in this analysis, remains under-appreciated. R is continually expanding and improving, and operations that were once convoluted or complex have become much easier over time. For example, Orton [35], describing the difficulty in finding software for point pattern analysis, lamented being unable to use the splancs package [36], because it required the purchase of S+, then an expensive proprietary statistics package. S+ has since been entirely integrated into R, and an early version of the Monte Carlo script described here made use of the splancs package. Mostly, the barrier to entry for archaeologists remains the perception that R is difficult to use (see, e.g., [37]). We hope to have dispelled this impression with the simple step-by-step "routemap" approach given here. For readers who wish to explore the analysis of archaeological point distributions in more detail, we recommend the spatstat package [38].
A final consideration, of key relevance to the topic of this Special Issue, is the high potential applicability of the proposed method as a rapid analysis tool for detection of association between archaeological surface or near-surface finds (e.g., recorded by an HER database or recovered from topsoil by metal detector survey or fieldwalking), and geophysical survey data. In its simplest form, as demonstrated here, the approach allows for very rapid comparison of a finds distribution against a map of geophysical anomalies. Clearly, if we are looking to identify an important archaeological feature, for example, the vicus associated with a Roman fort, being able to show association between particular finds and geophysical survey results is an important first step in any analysis. However, a more sophisticated application of the approach might seek to match scatters of finds of different dates with different geophysical plots corresponding to different techniques (ground-penetrating radar, magnetometer, etc.), or different depths.
Though we do not claim great sophistication (for more advanced applications of Monte Carlo simulation in archaeology, see, e.g., [16,21]), the approach we have demonstrated in this paper is statistically robust, free of the complexities associated with regression analysis, and simple to apply. This paper, therefore, hopes to serve as a starting point for development of more detailed studies incorporating Monte Carlo testing approaches for analysing patterns of association between archaeological sites and landscapes. Its potential seems particularly strong where landscapes are deeply buried or multiple uncertain interpretations exist, e.g., from palaeo-environmental reconstructions or results of geophysical surveys. We call for this approach to be more widely used in these contexts. The difficulties identified in the introduction to this article, especially around the potential of regional archaeological data to elucidate patterns of human activity in the landscape given the incompleteness of the archaeological record, deserve closer attention. Future work might usefully concentrate on the development of a toolbox of statistical approaches to address these issues more broadly.

Funding:
The archaeological data used in this analysis were collected by the Palaeolithic Archaeology of the Sussex/Hampshire Coastal Corridor project, which ran between 2005 and 2007 and was funded by the Aggregates Levy Sustainability Fund (ALSF) as distributed by English Heritage (now Historic England). No funding was received for the research described in this paper, for the preparation of this manuscript for publication, or for the publication costs incurred.

Acknowledgments:
The author(s) are grateful to COST Action SAGA: The Soil Science &Archaeo-Geophysics Alliance-CA17131 (www.saga-cost.eu), supported by COST (European Cooperation in Science and Technology), for the opportunity to participate in the Special Issue in which this paper appears.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A Technical Description of the MCSites Tool
Appendix A.1 Data Import and Analysis Preparation in R Geographical vector data, e.g., in shape file or geopackage format, can be imported into R in several ways. Here, we choose to do it with the sf package, as this package has recently been developed with the aim of unifying the GIS functionality provided across multiple packages and is now widely recommended as a replacement for earlier packages like maptools. The sample dataset provided with this paper (see additional data) is an ESRI shapefile of points expressed as xy coordinates under the UK Ordnance Survey national grid coordinate system (OSGB36). Points representing archaeological sites are imported as follows: sites <-st_read("data/example_sites.shp") plot(st_geometry(sites),pch = 18,col = "red") The same process is repeated for the study area, and for the features that the sites will overlap, the quarries are digitised from historic Ordnance Survey mapping. study<-st_read ("data/esolent_studyarea.shp") feats<-st_read("data/all_clipped_quarries.shp") These operations are carried out through a series of dialogs ( Figure A1) so the user does not need to execute the code by hand.
now widely recommended as a replacement for earlier packages like maptools. The sample dataset provided with this paper (see additional data) is an ESRI shapefile of points expressed as xy coordinates under the UK Ordnance Survey national grid coordinate system (OSGB36). Points representing archaeological sites are imported as follows: sites <-st_read("data/example_sites.shp") plot(st_geometry(sites),pch = 18,col = "red") The same process is repeated for the study area, and for the features that the sites will overlap, the quarries are digitised from historic Ordnance Survey mapping. study<-st_read ("data/esolent_studyarea.shp") feats<-st_read("data/all_clipped_quarries.shp") These operations are carried out through a series of dialogs ( Figure A1) so the user does not need to execute the code by hand. Figure A1. "Open" dialog, with shape file selected.
Once each of the three necessary layers (study area boundary, features to intersect with archaeological sites, archaeological sites) has been loaded, the "main.R" script file will plot each of the layers together, with sites shown in red, and prompt the user to continue. If the "OK" Option is selected, R will plot the selected layers ( Figure A2) and proceed to the generation of random sites through the script file "rangen.R". If the "Cancel" Option is selected, the dialog will close. Once each of the three necessary layers (study area boundary, features to intersect with archaeological sites, archaeological sites) has been loaded, the "main.R" script file will plot each of the layers together, with sites shown in red, and prompt the user to continue. If the "OK" Option is selected, R will plot the selected layers ( Figure A2) and proceed to the generation of random sites through the script file "rangen.R". If the "Cancel" Option is selected, the dialog will close.

A.2. Generation of Random Points Inside Study Area
Next, we generate the same number of random points as there are sites inside the study area boundary. This is accomplished using the st_sample command from the sf package, which applies a uniform sampling strategy as follows: Figure A2. MCSites tool overlay maps and "Proceed" dialog.