Integrating Point Process Models, Evolutionary Ecology and Traditional Knowledge Improves Landscape Archaeology—A Case from Southwest Madagascar

Landscape archaeology has a long history of using predictive models to improve our knowledge of extant archaeological features around the world. Important advancements in spatial statistics, however, have been slow to enter archaeological predictive modeling. Point process models (PPMs), in particular, offer a powerful solution to explicitly model both firstand second-order properties of a point pattern. Here, we use PPMs to refine a recently developed remote sensing-based predictive algorithm applied to the archaeological record of Madagascar’s southwestern coast. This initial remote sensing model resulted in an 80% true positive rate, rapidly expanding our understanding of the archaeological record of this region. Despite the model’s success rate, it yielded a substantial number (~20%) of false positive results. In this paper, we develop a series of PPMs to improve the accuracy of this model in predicting the location of archaeological deposits in southwest Madagascar. We illustrate how PPMs, traditional ecological knowledge, remote sensing, and fieldwork can be used iteratively to improve the accuracy of predictive models and enhance interpretations of the archaeological record. We use an explicit behavioral ecology theoretical framework to formulate and test hypotheses utilizing spatial modeling methods. Our modeling process can be replicated by archaeologists around the world to assist in fieldwork logistics and planning.


Introduction
Predictive modeling has been a staple of landscape-scale archaeological investigations for decades [1][2][3][4][5][6]. These models are critical not only for the identification of archaeological features but also for understanding the processes underlying their distributional patterns. The methods employed for the development of predictive models in archaeology are varied and often integrate expert knowledge, geographic information systems (GIS), remote sensing analysis [2,7,8], linear regression [9][10][11][12], and other approaches (see References [6,13]). Other spatial statistical methods can also be used for developing such models, but, to-date, have been underutilized in archaeology. Predictive modeling in archaeology is also often limited by the lack of explicit theoretical frameworks to interpret patterns in the archaeological record [6,14]. As we demonstrate below, the integration of theory from human behavioral ecology (HBE) into predictive models allows us to test hypotheses about processes driving human settlement on a landscape and improve our interpretations of settlement patterns.
In southwest Madagascar, communities traditionally practice a mixed economy consisting of foraging, fishing, herding, and farming [41][42][43][44][45]. Cultural identity in the region is tied to ancestral clan affiliation as well as to dominant subsistence practices [44]. The coastal region is occupied by Vezo (who are primarily fishers), Masikoro (who are primarily agropastoralists), and Mikea (who primarily occupy and forage in the dry forests further inland) [43,44]. While there is evidence that foraging populations have lived on Madagascar for thousands of years [38,40,46], the exact timing of human arrival in the southwest and elsewhere on the island is still subject to intense debate among archaeologists [37,39]. Based upon limited evidence, it is likely that these early human populations were small foraging communities that occupied areas around river valleys [47] and karst landscapes along the island's coasts [46,48]. To better understand settlement patterns on the southwest coast, Davis and colleagues [49] developed an HBE-based remote sensing/GIS predictive model for locating Late Holocene archaeological deposits in coastal southwestern Madagascar and interpreting their distribution ( Figure 1). The model developed by Davis et al. [49] is rooted in HBE, specifically the ideal-free distribution (IFD) model, which assumes that the greatest overall habitat suitability (measured by the availability of important environmental resources) will reflect where a population will preferentially choose to settle [50][51][52]. Using the theoretical predictions of an IFD (i.e., communities will first choose to settle the most suitable habitats, identified on the basis of resource availability, followed by lower suitability regions) the team compiled a list of important ecological variables, which were recorded in ethnohistoric records [53]. The invaluable insight provided by such traditional ecological knowledge (TEK) has been well established [54][55][56]. In particular, TEK has increased archaeology's ability to inform the present by tracing feedback loops between human behavior and environmental change [57,58]. Davis and colleagues [49] then digitized these features using satellite imagery and automated image analysis procedures and developed a predictive model rooted in the assumption that the closer a point on the landscape is to these resources, the higher the likelihood of people The model developed by Davis et al. [49] is rooted in HBE, specifically the ideal-free distribution (IFD) model, which assumes that the greatest overall habitat suitability (measured by the availability of important environmental resources) will reflect where a population will preferentially choose to settle [50][51][52]. Using the theoretical predictions of an IFD (i.e., communities will first choose to settle the most suitable habitats, identified on the basis of resource availability, followed by lower suitability regions) the team compiled a list of important ecological variables, which were recorded in ethnohistoric records [53]. The invaluable insight provided by such traditional ecological knowledge (TEK) has been well established [54][55][56]. In particular, TEK has increased archaeology's ability to inform the present by tracing feedback loops between human behavior and environmental change [57,58]. Davis and colleagues [49] then digitized these features using satellite imagery and automated image analysis procedures and developed a predictive model rooted in the assumption that the closer a point on the landscape is to these resources, the higher the likelihood of people settling these locations.
During ground testing of the algorithm, more than 1000 individual artifacts were recovered from 74 locations throughout the study area in southwest Madagascar, with most of these materials coming from locations ranked as having a "high probability" of containing archaeological deposits.
Nonetheless, in 20% of instances, the predictive model ranked an area as "high probability" where no archaeological materials were recorded during pedestrian survey. The occurrence of false positives leaves room for improvement in the development of the model and also challenges the starting theoretical assumptions of IFD that framed the model. One overlooked factor is that these false positives may result from the omission of important variables in the construction of the model. Another possibility is that certain resources are more important to populations in this area than others, meaning that a uniformly weighted model may not fit as well as a model where some variables are more highly ranked. Site preservation in this region is quite poor and so false positives could also be related to the disappearance of archaeological sites or visibility issues due to dense vegetation (especially in regions further inland). It is also possible that second-order properties (e.g., settlement nucleation or spacing) are leading to greater densities of archaeological material in some areas of high probability but not in others, which suggests that first-order trends do not fully account for spatial patterns on this landscape. This would support the possibility raised by Davis et al. [49] of an Allee-effect, or positive density dependence (a second-order property), on the distribution of sites, whereby areas with mid-level suitability will attract greater population densities because of habitat modifications and community aggregation [51,59,60]. Such community aggregation in lower-suitability environments can also be caused by territoriality, as predicted by the ideal despotic distribution (IDD) model (another model centered on second-order properties) [51,[61][62][63]. Allee-effects and IDDs will result in second-order properties (i.e., clustering) not accounted for by a first-order trend and can be characterized with PPMs.
By iteratively developing and evaluating different models, we can improve their accuracy and be better positioned to explain emerging patterns in the archaeological record ( Figure 2). In this article, we demonstrate how archaeologists can test the importance of different factors in a model for accurately predicting the location of archaeological deposits. This process allows us to improve the Davis et al. [49] predictive model to increase the number of true positive detections while limiting false positives when directing field surveys.
Geosciences 2020, 10, x FOR PEER REVIEW 4 of 26 settling these locations. During ground testing of the algorithm, more than 1000 individual artifacts were recovered from 74 locations throughout the study area in southwest Madagascar, with most of these materials coming from locations ranked as having a "high probability" of containing archaeological deposits. Nonetheless, in 20% of instances, the predictive model ranked an area as "high probability" where no archaeological materials were recorded during pedestrian survey. The occurrence of false positives leaves room for improvement in the development of the model and also challenges the starting theoretical assumptions of IFD that framed the model. One overlooked factor is that these false positives may result from the omission of important variables in the construction of the model. Another possibility is that certain resources are more important to populations in this area than others, meaning that a uniformly weighted model may not fit as well as a model where some variables are more highly ranked. Site preservation in this region is quite poor and so false positives could also be related to the disappearance of archaeological sites or visibility issues due to dense vegetation (especially in regions further inland). It is also possible that second-order properties (e.g., settlement nucleation or spacing) are leading to greater densities of archaeological material in some areas of high probability but not in others, which suggests that first-order trends do not fully account for spatial patterns on this landscape. This would support the possibility raised by Davis et al. [49] of an Alleeeffect, or positive density dependence (a second-order property), on the distribution of sites, whereby areas with mid-level suitability will attract greater population densities because of habitat modifications and community aggregation [51,59,60]. Such community aggregation in lowersuitability environments can also be caused by territoriality, as predicted by the ideal despotic distribution (IDD) model (another model centered on second-order properties) [51,[61][62][63]. Alleeeffects and IDDs will result in second-order properties (i.e. clustering) not accounted for by a firstorder trend and can be characterized with PPMs.
By iteratively developing and evaluating different models, we can improve their accuracy and be better positioned to explain emerging patterns in the archaeological record ( Figure 2). In this article, we demonstrate how archaeologists can test the importance of different factors in a model for accurately predicting the location of archaeological deposits. This process allows us to improve the Davis et al. [49] predictive model to increase the number of true positive detections while limiting false positives when directing field surveys.  Building on this previous work by incorporating PPMs allows us to refine the current predictive algorithm for Madagascar and improve its utility for future archaeological investigations. In particular, we demonstrate how PPMs can be used to characterize different environmental variables that predict the first-order intensity of archaeological deposits and also to what degree second-order clustering or dispersion properties improve predictive accuracy. This approach thus aids in understanding the relative importance of different variables and processes responsible for spatial patterns in the archaeological record.

Materials and Methods
To re-evaluate the Davis et al. [49] predictive model, we take the previous set of variables incorporated into this model as well as new variables that were previously excluded ( Table 1). The environmental variables used by Davis et al. [49] were generated via automated remote sensing analysis and vegetative index calculations. The same data are used here and are freely available from the Davis et al. [49] publication's supplemental documents (see https://doi.org/10.26207/1a47-pw11). The Davis et al. [49] algorithm considered the distance from any given point on the landscape to vegetatively productive regions (quantified via soil adjusted vegetation index (SAVI) values [64]), paleodune features (where many recorded archaeological sites in this area are located), offshore islands (which were important refuges from warfare and banditry, as well as important fishing grounds) and coral reefs. In addition to these variables, field observations indicate that rocky outcrops along much of the coastline appear associated with the presence of archaeological deposits. This is potentially related to freshwater access [65] as well as defense and survival strategies. Oral histories from coastal dwelling Vezo communities in the southwest of Madagascar indicate that limestone outcrops often provide good locations for hiding from marauders [66]. Ethnographic and archaeological sources also suggest that freshwater availability is particularly important when selecting village locations [34,42,45,66,67]. The outcrops act as rainwater reservoirs and are still used by communities in the region today for gathering freshwater. Because freshwater source data is not available for the study region, we also use depth to bedrock as a proxy [68,69]. The shallower the bedrock depth, the closer to the surface and more accessible groundwater reservoirs are expected to be. Depth to bedrock and soil data used for our analysis [70] are freely available from the International Soil Reference and Information Centre (ISRIC) Data Hub (https://data.isric.org/geonetwork/srv/eng/catalog.search#/home). While groundwater hydrology is far more complicated than a metric of depth to bedrock [69,71,72], shallow bedrock has been the source of water exploitation for populations in other parts of the world [68,73]. The geology of southwest Madagascar is largely sedimentary, with geologically recent deposits comprised of limestone, sand and other alluvial soils [74,75].
In order to evaluate the overall importance of first and second-order properties (e.g., Allee effect clustering on the archaeological record, as suggested by Davis et al. [49]), we use exploratory point pattern analyses and then build a series of PPMs. Archaeological survey data collected over the past several years by the Morombe Archaeological Project (MAP) [76] are used as the basis of knowledge about the archaeological record. Archaeological points were recorded as artifact clusters (comprised of a mix of materials including ceramic sherds, marine shell tools, faunal remains, and charcoal) during survey of the Davis et al. [49] model and thus each point can represent multiple artifacts at a given location. If some data points represent a single artifact while others represent multiple artifacts, density calculations and the overall intensity of the point pattern can be affected. To ensure that spatial tests are not being skewed due to this fact, we ran all spatial tests twice, once by incorporating the total number of materials as a spatial weight and once without weighting, to see if artifact counts had any spatial effects on the point pattern. Survey locations were chosen using a stratified random strategy and surveyors were not given information about the model's predictions of site locations, in order to avoid survey bias.

Point Process Modeling for First-Order Properties
First, we explore first-order trends between the intensity of archaeological deposits in our study area and different covariates (i.e., environmental variables) using nonparametric smoothing (rho-hat function in the spatstat package) [15]. The rho-hat function plots the intensity of a point pattern as a function of a given covariate [15]. In other words, the rho-hat function quantifies the degree to which the density of archaeological deposits is related to a specific variable (e.g., how archaeological deposits are accounted for by vegetation).
Next, we investigate whether the settlement pattern in our study area exhibits significant degrees of clustering or dispersion (i.e., second-order interactions) by performing K-, G-and pair correlation (PC) function tests on the archaeological point pattern. Each of these summary functions assesses the degree of clustering or dispersion in a point pattern but are somewhat distinct. The K-function [77] calculates the average number of points within a specified radius and is standardized by the intensity of those points (i.e., the number of points is divided by the total area encompassed by point locations). The G-function calculates nearest neighbor distance distribution values for each data point [15]. The PC function assesses whether a point-pattern is significantly more clustered or dispersed than expected for a random pattern but is not cumulative like the other functions [15,78]. The significance of potential clustering or dispersion is assessed by comparing the empirical archaeological patterns to simulated realizations of a random null model. We assess the significance of second-order interaction in these functions using simulation envelopes of a null model of complete spatial randomness (CSR) with 39 Monte Carlo iterations (equivalent to p = 0.05). Significant clustering is inferred when the empirical function (i.e., the archaeological point pattern) is above the simulation envelope. Likewise, significant dispersion between points is inferred when the empirical function is below the simulation envelope. Areas of the empirical function within the envelope are consistent with CSR (i.e., not significantly different from a random pattern).
Following these exploratory analyses of first-and second-order properties we develop a series of PPMs, following similar recent applications [17,20,22]. Our initial PPMs focus on modeling first-order trends and consist of: a homogenous Poisson model (which models the archaeological point pattern as a random process (i.e., CSR)) and two inhomogeneous Poisson models (which model the point pattern as a function of environmental covariates): the Davis et al. [49] model and a new model combining all of the assessed environmental variables (see Table 1). We identify the best fitting model for the settlement pattern using multi-model selection tools, specifically the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) [79][80][81] and a stepwise selection procedure. These model selection tools evaluate how well a statistical model fits a dataset given a tradeoff between model fit and overall complexity of the data [80]. The lower the difference in information criteria score (e.g., ∆AIC and ∆BIC) and the higher the weight (W i ), the better the model. Stepwise selection determines the best fitting model by evaluating how each covariate affects the fit of the overall model and dropping covariates if they result in a higher AIC or BIC score [82].
We assess the fit between the best-fitting models and the empirical data in multiple ways. We first evaluate the fit by calculating the raw and kernel-smoothed residual values of our best fitting model and the original Davis et al. [49] model to quantify any deviations between the first-order trend of the fitted model and the empirical data. In other words, the residuals illustrate the goodness-of-fit between the actual locations of archaeological deposits and the locations predicted by the PPM. Raw residuals estimate the bias within a model and kernel-smoothed residuals use a geographically weighted average of the residuals to account for spatial variance in the raw residual values [83]. We then evaluate the second-order component of the best-fitting models using residual-K and G-functions. These functions assess the fit between the second-order component PPM and the empirical point pattern (i.e., archaeological material distributions) by comparing their K-and G-function estimates with a "compensator," which is an estimate of the mean value of the K-or G-function [84]. In simpler terms, residual K-and G-functions assess the goodness-of-fit between the predictions of a PPM and the archaeological data in terms of their interpoint relationships (i.e., clustering or dispersion between archaeological points). If the PPM is a good fit to the data, the residual-K and G-functions should lie within the simulation envelope. If the empirical function lies above or below the simulation envelope, this suggests that the empirical pattern is more clustered or dispersed than is accounted for in the model, respectively. In such cases of poor second-order fit, we can improve the models by explicitly including a second-order interaction clustering or dispersion parameter.

GIS Analysis
Next, using the model selection results of the best-fitting inhomogeneous model, we generate rasterized probability maps (i.e., a series of pixel values situated in geographic space that contain probability values) of places where archaeological sites are expected in ArcGIS 10.7 [85]. Using raster maps, we evaluate the probability scores of different regions within the study area directly with locations of identified archaeological deposits. These maps, in turn, can be used to direct further ground surveys in Madagascar. We create different probability maps by assigning differential weights to each covariate in order of their relative importance. We determine differential weighted values for each covariate based on the rho-hat intensity functions and coefficient estimates from the best-fitting PPM. The greater the intensity measurement and the stronger the coefficient estimates, the greater that covariate's effect on the archaeological point distribution. By evaluating the difference in intensity and coefficient estimates associated with each covariate we can establish orders of magnitude by which each covariate influences the resulting archaeological point pattern. These values can then be translated into associated rankings (or weights) for each variable.
For example, in this study we are concerned with the relationship of covariates that explain intensities of archaeological deposits at different distances. As such, if we look at three different variables (A, B and C) we can examine: (1) whether all three variables are negatively or positively correlated with archaeological points (i.e., as distance from each variable increases, does the number of archaeological materials decrease or increase, respectively) based on coefficient estimates; and (2) the intensity values of the archaeological points that each variable accounts for based on the rho-hat function and fitted values of the model coefficients. For example, if variable A has a negative relationship with the archaeological point pattern that is three times as strong as variable B and B is twice as negatively related with C, we can use this information as a guide to which variables are most important. Following evaluation of the fitted coefficients, if variable A accounts for 3-times the intensity of a point pattern at small-distances than B, and B accounts for 2-times the intensity of C, we can weight (or rank) variable A at three times that of B and B at twice that of C.
We create different weighted rasters using the raster calculator tool in ArcGIS. Following the creation of these probability rasters, we assess each model by comparing surveyed locations from 2019 (which were used to test the Davis et al. [49] model) with the probability values assigned by the model. If archaeological deposits are recorded in areas that were not predicted (or had low probability values) this indicates a poor fit for that predictive model. In contrast, if deposits are recorded in areas with high probability values, this indicates a better-fitting predictive model. The GIS analysis can be performed in open source platforms as well (e.g., QGIS [86]) and necessary code used for raster calculations is provided in the Supplemental R-markdown document.
Ethnographic data are also incorporated into weighting decisions when explicit information pertaining to the importance of resources is available. For example, ethnohistoric records indicate that coastal Vezo communities in southwest Madagascar place great importance on freshwater availability, access to coral reefs and fisheries, and that rock outcrops have served as a defensive strategy for hiding from foreign invaders [42,43,53,66]. Marine and freshwater resource availability and defensibility of the area are a common rationale for settling an area, and the lack of freshwater or defensibility are most frequently referenced for why people move [53]. As such, results from the prior intensity metrics and coefficient estimates are compared with these ethnographic data and where discrepancies arise (e.g., statistical tests suggest that marine resources are not strongly influencing archaeological points but ethnohistoric records indicate a strong connection), favor is given to the ethnohistoric data. Before creating the new probability models in ArcGIS 10.7 [85], the depth to bedrock data is resampled using nearest neighbor interpolation from its original 250m resolution to a resolution of 10m to match the other datasets derived from Sentinel-2 satellite imagery. We resample the dataset iteratively in ArcGIS 10.7 [85] using the resample tool to achieve the best results, from 250 m to 10 m (resampling was conducted four times, first from 250 m to 150 m, second from 150 m to 75 m, third from 75 m to 35 m and fourth from 35 m to 10 m.). When not resampled, the inclusion of this data into raster calculations makes resulting datasets too coarse to direct meaningful ground investigations ( Figure 3).
Geosciences 2020, 10, x FOR PEER REVIEW 8 of 26 and defensibility of the area are a common rationale for settling an area, and the lack of freshwater or defensibility are most frequently referenced for why people move [53]. As such, results from the prior intensity metrics and coefficient estimates are compared with these ethnographic data and where discrepancies arise (e.g., statistical tests suggest that marine resources are not strongly influencing archaeological points but ethnohistoric records indicate a strong connection), favor is given to the ethnohistoric data. Before creating the new probability models in ArcGIS 10.7 [85], the depth to bedrock data is resampled using nearest neighbor interpolation from its original 250m resolution to a resolution of 10m to match the other datasets derived from Sentinel-2 satellite imagery. We resample the dataset iteratively in ArcGIS 10.7 [85] using the resample tool to achieve the best results, from 250 m to 10 m (resampling was conducted four times, first from 250 m to 150 m, second from 150 m to 75 m, third from 75 m to 35 m and fourth from 35 m to 10 m.). When not resampled, the inclusion of this data into raster calculations makes resulting datasets too coarse to direct meaningful ground investigations ( Figure 3). For all models, we take the inverse of the distances from all variables from any given point within the study area (except for bedrock, which is the depth not the distance). This follows Davis et al.'s ( [49]) assumption (and the theoretical assumption of IFD) that the closer a point is to these variables the higher the likelihood of human occupation.
Finally, the probability maps are evaluated against each other and the original model published by Davis et al. [49] to determine the most accurate predictive model for archaeological prospection. Accuracy is calculated based on the same 2019 surveys used to ground-truth the original model (see Reference [49]). In all surveyed regions we calculate the probability value assigned by each raster. Because each raster provides only the probability of archaeological materials and not a definitive "yes" or "no" to the presence of cultural deposits, we used a natural breaks method [87] to define "true" or "false" negatives and positives based on the threshold between "high," "medium" and "low" probability locations. "True positives" are considered as all surveyed locations identified as having "high" likelihood of containing archaeological deposits that did indeed contain artifacts, For all models, we take the inverse of the distances from all variables from any given point within the study area (except for bedrock, which is the depth not the distance). This follows Davis et al.'s ( [49]) assumption (and the theoretical assumption of IFD) that the closer a point is to these variables the higher the likelihood of human occupation.
Finally, the probability maps are evaluated against each other and the original model published by Davis et al. [49] to determine the most accurate predictive model for archaeological prospection. Accuracy is calculated based on the same 2019 surveys used to ground-truth the original model (see Reference [49]). In all surveyed regions we calculate the probability value assigned by each raster. Because each raster provides only the probability of archaeological materials and not a definitive "yes" or "no" to the presence of cultural deposits, we used a natural breaks method [87] to define "true" or "false" negatives and positives based on the threshold between "high," "medium" and "low" probability locations. "True positives" are considered as all surveyed locations identified as having "high" likelihood of containing archaeological deposits that did indeed contain artifacts, while "false positives" are all surveyed areas identified as "high" likelihood of containing archaeological materials that did not contain any artifacts. Medium and low likelihood areas are not included in these assessments because the algorithm expects that you might find material but you might not, thus it cannot be counted as a "true" positive or negative. All spatial analyses are conducted in R [88] using the spatstat package [15] and the code is freely available (see Supplemental File). We also use MuMIn [89], maptools [90], raster [91], rgdal [92], rgeos [93], sp [94,95] and MASS [82] packages.

Point Process Modeling for Second-Order Properties
The best-performing weighted raster and the unweighted raster are imported into R and used to create two PPMs with these rasters as the sole covariates. The unweighted raster created in ArcGIS (representing the best-fitting PPM) is compared against the weighted model, rather than direct comparison with the best fitting PPM to prevent issues in model selection resulting from differences in the number of covariates. A model with six covariates is inherently more complex than a model with one covariate, which can result in higher AIC/BIC scores for the more complex model.
To assess whether the weighted model is a better fit to the data, we compare it with the unweighted PPM using the same multi-model selection approach detailed above [79][80][81]. We then assess second-order interactions by building new models that include an Area Interaction process, which accounts for clustering or dispersion at multiple scales [96]. For these area interaction models, the value of the irregular parameter r is selected by minimizing AIC. We construct three PPMs that add this second-order parameter to the original Davis et al. [49] model, the best-fitting unweighted model, and the best-fitting weighted model. Each of these new PPMs is compared to the other models without second-order properties using multi-model selection. If second-order processes are influencing the archaeological distributional pattern (as predicted by an Allee effect or IDD model), then we expect that the best fitting model will consist of both first-order properties (i.e., environmental variables) and second-order interactions (e.g., clustering).

Results
We first present the rho-hat intensity functions, which measure the first-order intensity of archaeological deposits in relation to specific environmental covariates. We examine these patterns using both weighted and unweighted datasets. Figures 4 and 5 show that certain covariates have a greater effect on the intensity of the archaeological point pattern. Specifically, bedrock and rocky outcrops have the strongest negative relationship with the intensity of archaeological points (meaning that intensities are highest at lower distances), while the distance to the ocean, dunes and vegetation values have a weaker negative intensity relationship with archaeological points. These relationships fluctuate, however, as marine resources seem to cycle between increased and decreased intensity over a distance of 250 m. When accounting for the number of artifacts present in each area using a spatial weight, both unweighted ( Figure 4) and weighted ( Figure 5) results present very similar patterns. This suggests that the effects of different covariates are not strongly influenced by artifact count.    Figure 6 shows the results of K-, G-and PC-functions that test for second-order interactions in the archaeological data. All tests indicate significant clustering between points (p = 0.05). To determine whether spatial intensity is influenced by the amount of material present in a location, we also run these tests using artifact counts as a weighting factor. The results appear identical between weighted and unweighted tests (using artifact counts as a weight), suggesting that population weights are not significantly influencing the results (also see Supplemental File). As such, we proceed with PPMs that weight each point evenly in terms of artifact counts.  Figure 6 shows the results of K-, G-and PC-functions that test for second-order interactions in the archaeological data. All tests indicate significant clustering between points (p = 0.05). To determine whether spatial intensity is influenced by the amount of material present in a location, we also run these tests using artifact counts as a weighting factor. The results appear identical between weighted and unweighted tests (using artifact counts as a weight), suggesting that population weights are not significantly influencing the results (also see Supplemental File). As such, we proceed with PPMs that weight each point evenly in terms of artifact counts. Figure 6. Results of testing for second-order interaction using K-, G-and PC-functions compared to 39 simulated realizations of complete spatial randomness (CSR). Black line is the empirical function for archaeological deposits, the red-dashed lines is the theoretical expectation under the null model of CSR, the grey shaded region is the simulation envelope (equivalent to p = 0.05).  (Figure 7). We find that PPM3 and PPM4 are very similar and both are better models for archaeological settlement distribution than the Davis et al. [49] model in terms of first-order properties.    (Figure 7). We find that PPM3 and PPM4 are very similar and both are better models for archaeological settlement distribution than the Davis et al. [49] model in terms of first-order properties.  Finally, we assess PPM3 and PPM4 with PPM1 in terms of second-order properties using residual K-and G-functions. Figure 8 shows that both PPM3 and PPM4 are negligible in their difference and both perform better (i.e., are a better fit to the archaeological data) than PPM1. Nonetheless, both PPM3 and PPM4 underestimate second-order clustering in the data. As such, we chose PPM3 because it is a simpler model (BIC selection chooses the simplest best performing model), with a ΔAIC = 1.13 and Wi = 0.256 and ΔBIC = 0 and Wi = 0.848. Table 3 shows the covariate estimates for PPM3. Finally, we assess PPM3 and PPM4 with PPM1 in terms of second-order properties using residual K-and G-functions. Figure 8 shows that both PPM3 and PPM4 are negligible in their difference and both perform better (i.e., are a better fit to the archaeological data) than PPM1. Nonetheless, both PPM3 and PPM4 underestimate second-order clustering in the data. As such, we chose PPM3 because it is a simpler model (BIC selection chooses the simplest best performing model), with a ∆AIC = 1.13 and W i = 0.256 and ∆BIC = 0 and W i = 0.848. Table 3 shows the covariate estimates for PPM3.

GIS Analysis
We re-create the exact PPM developed in R (PPM3) in ArcGIS, where all variables are evenly weighted. The following formula (1) is used in the raster calculator to create the Unweighted Model: where Parch = the archaeological probability value, DpthBR = depth to bedrock, DRO = distance to rock outcrops, Di = distance to offshore islands, Dw = distance to water/ocean, Dc = distance to coral.

GIS Analysis
We re-create the exact PPM developed in R (PPM3) in ArcGIS, where all variables are evenly weighted. The following formula (1) is used in the raster calculator to create the Unweighted Model: where P arch = the archaeological probability value, Dpth BR = depth to bedrock, D RO = distance to rock outcrops, D i = distance to offshore islands, D w = distance to water/ocean, D c = distance to coral.
Next, we create a predictive raster with weights for each of the variables (see Table 4). Because the unweighted raster produced fewer true positives than the original Davis et al. [49] model, we turn to ethnohistoric records and reincorporate paleodunes and vegetation (measured by SAVI), which are known to be important for defense and terrestrial resource acquisition [43,44,66]. Paleodunes were ranked the lowest by coefficient estimation and rho-hat intensity metrics and so we weighted all variables based on their relative difference in intensity and coefficient estimates compared to dunes (the lowest ranking variable). As such, paleodunes were weighted at 1 and bedrock was 2.5 (because its coefficient estimates are strongest and rho-hat estimations show that the maximum intensity of bedrock is 2.5-3 times greater than that of paleodunes).
We find that Weighted Model 3 (which has differentially weighted variables) yields more true positive detections than the unweighted model (PPM4) and the original Davis et al. [49] model (Table 4). We also find that by increasing paleodune weights (Weighted Model 4) the results do not change in terms of overall true and false positive detections when compared to Weighted Model 3.
We also assess the quantitative raster probability values assigned at each location of recorded archaeological deposits (n = 1030) to assess how well individual materials are predicted by the model. The better the model performs, the higher the probability values will be at locations of archaeological deposits. Table 5 shows that Weighted Models 1 and 3 achieve the highest predicted values in known locations of archaeological activity and therefore perform best.
Finally, we create a PPM using Weighted Model 3 as the sole covariate. Model selection indicates that the weighted model is a better fit than the best-fitting unweighted model (PPM3; Table 6). We use Weighted Model 3 because it resulted in the greatest number of true positives when assessing 2019 survey results and had the highest average point values (see Tables 4 and 5). Nonetheless, second-order properties are excluded from this model and exploratory analysis indicates that second-order properties are influencing the point pattern ( Figure 6).

Point Process Modeling of Second-Order Properties
Adding a second-order area interaction process into the models produces a substantially better fit ( Table 6). The best-fitting model combines Weighted Model 3 with an area interaction component (PPM8), with ∆AIC and ∆BIC of 0 and w i of 1. The unweighted model with area interaction (PPM7) is worse at predicting archaeological patterns than PPM8, with ∆AIC and ∆BIC of >500 and W i of 0. Weighted Model 3 without area interaction (PPM5) performs substantially worse than PPM8, with ∆AIC and ∆BIC of >2300 and W i of 0. The residual K-function indicates that the best fitting model (PPM8) still underestimates second-order clustering at distances ≥ 1500 m (Figure 9). Table 4. The formulas and respective covariate weights for each predictive model and the results of ground truthed results from the Davis et al. [49] study in relation to each modified algorithm. The best performing model (bolded) produces the most true positives and highest overall values in areas with confirmed archaeological deposits (Table 5). D v = distance to vegetation (measured by SAVI) and D d = distance to paleodunes.  Complete Spatial Randomness (CSR) 1 13,174.52 0 13,164.14 0 * Raster composed of the following variables (weight in parentheses): Bedrock (w = 2.5), rocky outcrops (w = 2), vegetation (SAVI) (w = 1.75), islands (w = 1.5), coral (w = 1.5), ocean (w = 1), paleodunes (w = 2). ** Raster composed of the following variables: coral, water, islands, rocky shoreline, depth to bedrock.  * Raster composed of the following variables (weight in parentheses): Bedrock (w = 2.5), rocky outcrops (w = 2), vegetation (SAVI) (w = 1.75), islands (w = 1.5), coral (w = 1.5), ocean (w = 1), paleodunes (w = 2). ** Raster composed of the following variables: coral, water, islands, rocky shoreline, depth to bedrock.

Figure 9.
Residual K-function test of the best-fitting unweighted PPM with an area interaction parameter and the best-fitting weighted PPM with an area interaction parameter. Both models performed better with area interaction than without and the weighted model yields the best results.

Discussion
Results of the exploratory analyses indicate that certain environmental variables (e.g., depth to bedrock and rocky outcrops) are strongly influencing the intensity of the archaeological point pattern (i.e., the distributional pattern of past populations) and that second-order clustering is also contributing to the distributional patterns of archaeological deposits. K-and G-tests show clustering Figure 9. Residual K-function test of the best-fitting unweighted PPM with an area interaction parameter and the best-fitting weighted PPM with an area interaction parameter. Both models performed better with area interaction than without and the weighted model yields the best results.

Discussion
Results of the exploratory analyses indicate that certain environmental variables (e.g., depth to bedrock and rocky outcrops) are strongly influencing the intensity of the archaeological point pattern (i.e., the distributional pattern of past populations) and that second-order clustering is also contributing to the distributional patterns of archaeological deposits. K-and G-tests show clustering at distances up to 800 m while the PC-function indicates clustering up to 100 m. This suggests that there may be multiple scales of clustering. The PPMs demonstrate that the inclusion of variables related to freshwater and defensive strategies (i.e., depth to bedrock and rocky outcrops) improve the Davis et al. [49] predictive model. Moreover, the incorporation of differential weights for environmental covariates (derived from exploratory spatial analyses and ethnohistoric records) results in a substantial improvement in the predictive accuracy. The weighted model yields more true positive results and is a better fit to the archaeological data (based on model selection criteria) than the unweighted model. This suggests that covariates do not equally influence "suitability" of locations in the study area. Instead, freshwater resources and defendable areas are highly important, while vegetation appears less important for settlement choice.
While these first-order models perform well, they underestimate the empirical distribution of archaeological deposits in some portion of the study region, suggesting the influence of some kind of second-order process on the settlement pattern. The addition of a second-order clustering process (i.e., area interaction) results in a substantial improvement to the models, decreasing model selection criteria values by thousands compared to models without second-order properties ( Table 6). Combined with differential weights for environmental covariates, the settlement pattern in southwest Madagascar is best explained by environmental variables related to freshwater availability and defense, followed by marine resource access and inter-point clustering between archaeological points at some scales ( Figure 9). This conclusion fits well with ethnohistoric data for coastal Vezo communities in southwest Madagascar [41][42][43]66] and provides further support of an Allee-effect distribution (as suggested by Reference [49]). Thus, we find that PPMs-and second-order properties specifically-are critical for developing and refining predictive models for landscape-scale archaeological research in this area. Nevertheless, the best fitting model is still not a perfect explanation for settlement patterns in this region, as the residual K-function shows higher rates of clustering between points at distances of ≥1500 m. This suggests that some additional factor (e.g., land governance, kinship or social networks, etc.) not accounted for by the best-fitting model is causing aggregation in archaeological materials at distances greater than 1500 m. One possibility is that foraging took place by some communities at larger distances from a primary residence, which has been demonstrated ethnographically in this region [42,43].
Our results also contribute, more specifically, to the interpretation of Madagascar's archaeological record. Results in Table 4 suggest that archaeological settlement conforms to an ideal-free distribution (either IFD or IFD-Allee) based on the proportion of artifacts recovered from high suitability areas relative to medium and low suitability locations. This supports similar conclusions by Davis et al. [49]. The IFD predicts that as population densities increase in a given habitat, the overall resource quality in that region will decrease [51]. This degraded habitat quality, in turn, lowers the suitability of the area for future populations. Davis et al. [49] found correspondence between the archaeological settlement of the study region and an alternative model known as IFD-Allee. This model predicts that individuals sometimes benefit by settling less suitable habitats, either socially by attracting others to follow (which offsets predation and increases chances of group survival), or ecologically due to an economy of scale by modifying the surrounding area to increase its resource abundance (e.g., agriculture) [59,60,97]. In an IFD-Allee distribution, population density can decrease in higher suitability locations and increase in mid-level suitability areas [59]. Assuming that the same distribution of artifacts throughout the landscape exists as documented in Davis et al. [49] (i.e., a possible IFD-Allee), a much greater number of materials and archaeological deposits should be detected when applying the predictive model(s) developed here to new surveys in this area. This in turn will help to record at-risk cultural heritage along the coasts of Madagascar and better understand the archaeology of coastal populations in this region. While our case study focuses on coastal Madagascar, our approach can be applied to other areas as well, as long as the particular covariates included in the model are adjusted for these new contexts.
In terms of the local populations within the study region, there may be very different settlement patterns that our current models do not capture. For example, the new unweighted model (PPM6) suggests that inland dwelling groups may prioritize different resources, creating a series of "higher" suitability zones in areas currently ranked as "low" suitability. This is evidenced by the drop in artifact counts in "medium" probability areas and an increase in artifact counts in "low" probability areas (Table 4). This could also be evidence of resource control by certain populations, forcing larger communities into less suitable locations (e.g., IDD [61][62][63]). Additionally, it has been suggested that where subsistence strategies change, environmental proxies for habitat suitability will require modification [30,98]. This is because the variables associated with "suitability" (and their relative importance) change as subsistence strategies shift (e.g., fishing communities will value marine resources more than pastoralists living inland [e.g. References, [42][43][44]). As such, the differential weights of covariates within a model are likely different across time and space and between scales of interaction. Because the southwest of Madagascar contains a diverse range of foraging strategies [42,, this pattern of artifact counts based on suitability could reflect the varied subsistence base in this region (and by extension the fluidity of how "suitability" is defined). This may help explain why the best fitting model (PPM8) still underestimates clustering at larger distances: other local contexts may differ significantly from the overall regional pattern because of fundamental differences in their social or subsistence strategies.
Therefore, defining what constitutes a "suitable" or "unsuitable" location for settlement requires additional information. One important way this can be achieved is by engaging with local histories and ethnographic research (e.g., References [76,[99][100][101]). Our current project provides a good illustration of how such anthropological research methods can improve our knowledge of the past by integrating local knowledge, human behavioral ecology, and spatially explicit modeling approaches. Additionally, such strategies permit for studies of human-environment interaction that go beyond correlation, whereby formalized hypotheses are tested to evaluate associations between different variables [17,22,[102][103][104].
One limitation of our study is that our model uses modern environmental data and does not currently include paleoenvironmental information. Numerous studies have demonstrated that environmental and climate conditions have fluctuated in Madagascar over the past several-thousand years [105][106][107][108]. As such, some of our variables (i.e., depth to bedrock), are likely to have been quite similar several thousand years ago while others (i.e., vegetative index values, paleodune features) were potentially quite different. Paleodunes contain many archaeological deposits today, and so they are a direct proxy for past cultural activity, but vegetative health is potentially more problematic. Future work should thus focus on better integration of paleoclimate proxies matching the temporal placement of the archaeological materials identified. In the meantime, however, our results still provide a successful method for improving detection rates of archaeological deposits, even in the absence of paleoenvironmental data.
A second limitation is that we currently lack temporal information pertaining to the archaeological materials used for our analysis. Work is ongoing to acquire chronological data for these areas but without this information we cannot extend our analysis to investigating changes in resource use over time. This will form the basis of our future work. Currently, we can specify that some of the earliest archaeological deposits in this region (that have associated radiocarbon data) date to~2000-3000 B.P. [48]. More information is needed, however, to fully understand settlement chronologies of this region.
A final limitation is that we make no distinctions between artifact classes within the point pattern. Different artifact types (e.g., ceramics versus shell tools) may alter the best fitting PPM and provide insight into how specific activities differ across the study area. In the same vein, differences in site function (e.g., permanent settlement, seasonal foraging camp, etc.) will likely affect results of spatial analyses, as peoples' considerations will differ based upon the nature of what they plan to do (and how long they plan to stay) in a given location. Despite these limitations, the method presented here proves useful for planning fieldwork with respect to choosing areas for archaeological survey. Furthermore, our study provides a template of how predictive modeling and landscape archaeology can be improved using an iterative process of investigating the past.

Conclusions
Predictive modeling has a long history in archaeology and has resulted in great improvements in archaeological settlement studies around the world [3][4][5]10,12,32,[109][110][111]. Spatial statistical models can improve predictive algorithms and aid in survey projects, whereby the most probable locations for discovering archaeological deposits can be targeted. PPMs are one such spatial method that can substantially improve predictive modeling efforts for archaeological fieldwork given their ability to more fully characterize the fundamental properties of point patterns.
Our results led to the creation of several new archaeological predictive models, some of which resulted in a reduction of false positives during field assessment. Depending upon the purpose of the research, these models may be preferred. Aerial recording of Madagascar's archaeological landscape remains limited and this results in researchers not having a firm understanding of where many archaeological materials are located [49]. For this reason, we chose the model that resulted in the greatest number of true positives. It stands to reason that a few more false detections are a good tradeoff for a large increase in true detections when the goal is to locate as many cultural materials as possible. In other research programs it very well might be more beneficial to reduce false positives at the expense of true positives and negatives, as false positives and negatives can assist in re-evaluating prior hypotheses pertaining to archaeological settlement patterns. Given the nature of our analysis, however, evaluating false negatives is difficult, as there are no definitive locations where the model predicts a total absence of archaeological materials. Nonetheless, researchers who utilize this method (or similar approaches) should investigate "low" probability locations with the same rigor as "high" probability areas to ensure that surveys are not biased based on initial model assumptions, which, as we demonstrate here, sometimes require adjustments.
Our study also demonstrates how the importance of different variables can be incorporated as weights within a predictive model using the results of spatial modeling procedures. By integrating ethnohistoric data with statistical model-selection, we improve our understanding of what constitutes a "suitable" environment (following HBE expectations from an IFD model) and increase the number of true positive identifications of archaeological materials during fieldwork. As such, we argue that archaeological survey procedures can be greatly enhanced by replicating the methods developed here. To facilitate this, all code required for performing the necessary spatial statistical tests are provided as Supplemental Files.
Ultimately, this study can serve as a template for future settlement pattern analyses using an iterative archaeological assessment. We combine robust statistical analyses, landscape scale data (via remote sensing) and traditional knowledge with explicit theoretical frameworks to account for fundamental aspects of archaeological settlement distributions. The use of PPMs allows researchers to investigate important second-order properties that are often ignored by other predictive modeling efforts. The procedure we advocate here (Figure 2) allows for a continuous learning process whereby archaeologists can evaluate different hypotheses and subsequently refine those hypotheses to improve our understanding of the past. We hope that archaeologists find this process useful for framing future landscape scale studies of past human behavior.