Predicting Near-Future Built-Settlement Expansion Using Relative Changes in Small Area Populations

Advances in the availability of multi-temporal, remote sensing-derived global built-/human-settlements datasets can now provide globally consistent definitions of “human-settlement” at unprecedented spatial fineness. Yet, these data only provide a time-series of past extents and urban growth/expansion models have not had parallel advances at high-spatial resolution. Here our goal was to present a globally applicable predictive modelling framework, as informed by a short, preceding time-series of built-settlement extents, capable of producing annual, near-future built-settlement extents. To do so, we integrated a random forest, dasymetric redistribution, and autoregressive temporal models with open and globally available subnational data, estimates of built-settlement population, and environmental covariates. Using this approach, we trained the model on a 11 year time-series (2000–2010) of European Space Agency (ESA) Climate Change Initiative (CCI) Land Cover “Urban Areas” class and predicted annual, 100m resolution, binary settlement extents five years beyond the last observations (2011–2015) within varying environmental, urban morphological, and data quality contexts. We found that our model framework performed consistently across all sampled countries and, when compared to time-specific imagery, demonstrated the capacity to capture human-settlement missed by the input time-series and the withheld validation settlement extents. When comparing manually delineated building footprints of small settlements to the modelled extents, we saw that the modelling framework had a 12 percent increase in accuracy compared to withheld validation settlement extents. However, how this framework performs when using different input definitions of “urban” or settlement remains unknown. While this model framework is predictive and not explanatory in nature, it shows that globally available “off-the-shelf” datasets and relative changes in subnational population can be sufficient for accurate prediction of future settlement expansion. Further, this framework shows promise for predicting near-future settlement extents and provides a foundation for forecasts further into the future.


Introduction
In 2018, 55 percent of the world's population lived in urbanized areas, but this is projected to increase to 68 percent by 2050, due to natural population growth, continued rural to urban migration, and the conversion of rural to urban land [1][2][3]. Most of this anticipated urban growth will be in lowand middle-income countries, specifically in small to medium sized settlements, where the majority of introduce such a modelling framework and validate its performance against withheld time-specific past RS-derived observations and time-specific manual delineations of BS.

Study Areas and Data
To test across a variety of BS morphologies, environmental contexts, and developmental contexts, in addition to countries with varying spatial details of the input census-based population data, we sample countries less present in previous spatial urban and BS modelling studies [24], including Switzerland, Panama, Uganda, and Vietnam (Table 1). Additionally, these countries were chosen to capture a variety of population magnitudes, densities, and distributions across space as well as socio-economic, urban morphological, topographical, and data quality (e.g. spatial fineness of subnational population data) contexts. Given that this extrapolative framework builds off the previously fit interpolative Built-Settlement Growth Model (BSGMi) [16], the same set of covariates were used as in [16] for either predicting transition probability in the random forest ( Table 2, superscript "c") or in the remainder of the disaggregative process. These covariates were selected based upon previous literature to give immediate environmental context and information regarding settlement connectivity and proximity [28,32,33], e.g. negative relationship between slope and likelihood of transition, positive relationship between likelihood of transition and distance to existing BS. Covariates were time specific or assumed to be temporally invariant (Table 2), and were pre-processed and appropriately resampled to 3 arc seconds (~100m at the Equator) as detailed in Lloyd et al. [34]. Table 1. Summary of built-settlement transition data by country and period. Areal units here are pixels (~100m) as that is the unit handled by the model, which looks at relative areal changes as opposed to absolute areal changes. Adapted from Nieves et al. [16].

Country
Average Spatial a Average spatial resolution is the square root of the average subnational area, in km, and can be thought of as analogous to pixel resolution with smaller values indicating finer areal data and vice versa [35] b Note: the Switzerland data suffered from disproportionate, relative to manually interpreted 30 cm true-color imagery, amounts of growth as indicated by the European Space Agency (ESA) Remote Sensing (RS)-derived extents between 2000-2005 and is thought by Nieves et al. [16] to be due to the 2003-2004 shift from delineating land cover changes at 300 m to using imagery to dilenate at 150 m, in conjunction with the highly variable terrain in Switzerland compounding classification attempts. Table 2. Data used for estimating the annual number of non-Built-Settlement (BS) to BS transitions at the unit level (i.e. demand quantification), predicting the pixel level probability surface of those transitions, and performing the spatial allocation procedures of the model. Adapted from Nieves et al. [16]. b Covariates involved in Demand Quantification were used to determine the demand for non-BS to BS transitions at the subnational unit level for every given year. Covariates involved in Spatial Allocation were either used as predictive covariates in the random forest calculated probabilities of transition (see c) or as a post-random forest year specific weight on those probabilities and the spatial allocation of transitions within each given unit area. Covariates used as restrictive masks prevented transitions from being allocated to these areas. c Used as predictive covariates in the random forest calculated probabilities of transition d See Nieves et al. [16] for details on the construction of weighted LAN

Built-Settlement Data
Our chosen representation of BS was the "Urban" class, number 190, of the annual European Space Agency Climate Change Initiative thematic land cover dataset (https://www.esa-landcover-cci.org/; hereafter, ESA). We selected the ESA RS-derived extents data for its annual coverage, at the time of the study, from 1992 to 2015. It has recently been extended to provide coverage for the years 2016-2018 [46]. While ESA RS-derived extents have moderate spatial resolution, 10 arc sec resolution (~300m at Equator), its annual temporal resolution allows for the withholding of years for validation. In our period of interest, 2000 to 2015, the ESA data begins with a Medium Resolution Imaging Spectrometer (MERIS) imagery derived baseline land cover map and detects thematic class changes from this map using 30 arc second (~1 km at the Equator) SPOT VGT imagery (1999-2013) and PROBA-V imagery (2014-2015) [47]. Any detected changes observed over two or more years are delineated at 30 arc second resolution, if prior to 2004, and, beginning with 2004, are further delineated at 10 arc second resolution using the higher resolution MERIS or PROBA-V imagery [47]. Specific to the "Urban" class, ESA incorporates the Global Human Settlement Layer (GHSL) [12,18] and Global Urban Footprint (GUF) [13] datasets to better define the class and integrate elements of two BS datasets within the overall thematic built-environment context. Initial validation efforts estimate the 2015 "Urban" class user and producer accuracies between 86-88 percent and 51-60 percent, respectively, but no information on the other years currently exist [47].
While ESA utilizes the term "urban", it is more correctly capturing aspects of the built environment. Given the integration of the GHSL and GUF data sets, which capture built-settlement, into the ESA "urban" class, we have reason to believe that the ESA "urban" class is more correctly operating on a functional definition of "built-settlement" or "built-settlement"-like, and refer to it as such. For a more detailed discussion on built-settlement and remote-sensing representations, readers are referred to Nieves et al. (2020).

Population Data
Annual subnational unit area (hereafter simply "unit,") population estimates, for 2000 through 2020, were based upon the Gridded Population of the World version 4 (GPWv4) input data [40] were produced by the Center for International Earth Science Information Network (CIESIN) and spatially harmonized as described in Lloyd et al. [34]. To clarify, we are not using the gridded GPW product, which has uniform population density within a given unit, but we are using the same tabular population count data and the associated unit areas. These counts are based upon censuses/official estimates, interpolated at the subnational level per [40] to obtain annual estimates. Each unit possesses a unique ID, referencing a globally consistent grid (3 arc seconds), with the unit areas having globally harmonized coastlines and international borders. It is worth noting that the population count data utilized here are not adjusted to the U.N. country total population estimates, which are used to account for potential biases and errors. Further, the two primary sources of uncertainty in this dataset are linked to the census figures/official estimates and the simple regression used to obtain the annual estimates with few assumptions.

OpenStreetMap Data
OpenStreetMap (OSM) is an open database of user-contributed, edited, and curated spatial data also known as. While OSM offers global extent, like other Volunteered Geographic Information (VGI) [48], its completeness varies across space, with particular gaps in low and middle income countries, and has data quality that can vary both within and between countries [49,50]. Contrastingly, in the best of cases, OSM can approach the quality of official datasets [51]. However, agreed upon means of assessing VGI data quality and accuracy varies and is still debated [52]. Nonetheless, OSM data are used to fill data gaps where official/commercial datasets do not exist or are not publicly accessible and have improved or produced useful analyses and derived datasets, (e.g. [13,34,[53][54][55][56][57] [58]. It contained 8,083 buildings manually delineated by OSM contributors, of which we contributed over an additional 1,700 buildings in an effort to have near 100 percent coverage of permanent vertical structures covered by the definition of BS. We inspected all building footprints in the area for accuracy and temporal coincidence with true colour imagery in 2015. The resource intensive nature of manually delineating and checking building footprints precluded us from carrying out more widespread validations of this nature during this study. The building footprints are provided in the linked data repository (https://data.mendeley.com/datasets/cm6bnzvzfj/1).

Overview
Here we take annual time-series of BS extents spanning 2000-2010 and estimated annual changes in BS population and unit-average BS population density changes to predict short-term (within five years) BS extents from 2011 through 2015. BS population is the population coincident with the BS extents and unit-average BS population density is the BS population of a unit divided by the BS area within the same unit. We refer to the set of years making each time series as TS where TS = {2000, 2001, . . . ,2010} and, expanding the notation from Nieves et al. [16], the first and last years of the input time series are referred to as t0 and t1, respectively. We test this extrapolative Built-Settlement Growth Model (BSGMe) framework using an annual time series of RS-based ESA BS extents from 2000-2010 (TS ESA ).
Similar to the BSGMi framework [16], the BSGMe framework has two primary components of "Demand Quantification" and "Spatial Allocation", shown here in Figure 1. We generalize the BSGMe framework with following steps:

1.
Create gridded population maps for each year in the input TS, following Stevens et al. [54].

2.
For all years in the TS, extract the unit-specific population sum that is coincident with the year's corresponding BS extents and derive the unit-average BS population density 3.
Independently for each unit, and using a rolling origin validation, select the single best fitting model for BS population and, separately, unit-average BS population density from three classes of models: • Auto-Regressive Integrated Moving Average (ARIMA), • Error, Trend, Seasonality (ETS), and • Generalized Linear Model (GLM) given log-transformed inputs.

4.
For each unit, use the final selected model for BS population and for unit-average BS population density to predict short-term annual BS population and annual unit-average BS population density starting with year t1+1 and ending with year t1+h, where in this case 1 ≤ h ≤ 5 and represents the projection horizon, in numbers of years.

5.
Use these estimates to derive the unit-specific annual quantity demand of non-BS-to-BS transitions by dividing the BS population by the BS population density. 6.
Create a transition probability surface using a Random Forest (RF) based upon the observed transitions between t0 and t1 of the input time-series and covariates corresponding to t0.

7.
Take the fit relationships between the occurrence of transitions and the predictive covariates, contained in the final RF model, and predict the future non-BS-to-BS transition probability surface using the same covariates, but corresponding to year t1, as the input. 8.
For each unit and iteratively for all years t1+1 through t1+h, spatially disaggregate the predicted annual unit-level transitions (steps 1-5) using the base transition probability surface (steps 5-6) and, if available, unit-relative weights derived from changes in lights-at-night brightness, similar to Nieves et al. [16].
These steps produce annual binary spatial predictions of BS extent in gridded format. All modelling and analyses were carried out using R 3.4.2 [59] and utilized the IRIDIS 4 high-performance computing cluster. All code is provided in the linked data repository (https://data.mendeley.com/ datasets/cm6bnzvzfj/1). Full process diagrams are provided in Appendix A, Figures A1 and A2.

Built-Settlement Population Estimation
To obtain a set of annual estimated population surfaces for our study areas, we used the method detailed by Stevens et al. [54] to dasymetrically disaggregate [60,61] the census-based population from the unit-level to 3 arc second (~100m at the Equator) pixels. We independently modelled each country and year utilizing time-specific and, assumed, time-invariant predictive covariates (see Appendix A, Table A1). We included the distance-to-nearest BS edge at the year 2000 and the distance-to-nearest BS edge for the given year as predictive covariates. This corresponded with our assumption that population relates to inner parts of BS agglomerations differently from the outer parts and to avoid exaggerated areas of low population density relative to previously modelled years [16,62]. Annually, for each unit, we extracted and summed the populations from pixels that were within year-specific BS extents and derived the annual unit-average BS population density. This resulted in annual time-series of BS population estimates and unit-average BS population densities for every unit in the study area, covering eleven years.

Time-Series Model Fitting and Built-Settlement Population Projections
Using these annual unit-level time-series, we predicted future unit BS population and unit-average BS population density using a single model fitting and selection process detailed in Figure 1. For each unit, this process fits three classes of models: ARIMA models, ETS models, and an identity-link GLM model with log-transformed input values, all using a rolling origin framework validation in the final, i.e. between-class, model selection process.
ARIMA models [63][64][65] and ETS models [64][65][66][67] are two autoregressive model classes often applied to time-series data, including population forecasts [68]. Both classes have dependent model terms based upon preceding values in the input time-series. ETS models are based upon the assumption of non-stationary, i.e. the mean and variance of the underlying process are not constant, and can approximate non-linear processes [64]. Conversely, ARIMA models assume stationarity and a linear correlation between the values of the time-series, but remain a standard in forecasting time-series [64,69]. The best model within the ARIMA class relies on an automated fitting procedure utilizing unit root tests, iterative step-wise parameter fitting, and the resultant lowest Akaike Information Criterion (AIC) value, as described in detail by Hydman and Khandakar [64]. ETS class models are selected in an automated fashion, as described in Hyndman et al. [65], utilizing maximum likelihood parameter estimation, the corrected AIC (AICc), and bootstrapping simulation. For the ARIMA and ETS model classes only the number of years since year t0 and temporally preceding values in the input time-series were available as predictive covariates.
Generalized Linear Models (GLMs) provide a single consistent framework for linking the linear-based systematic elements of regression-type models, associated with Normal, binomial, Poisson, gamma, and other statistical distributions, with their respective random components through an integrated fitting procedure based upon maximum likelihood [70]. Here, we utilized an identity link function, and provided log-transformed input data with the number of years since year t0 as the sole predictive covariate.
During the fitting of these model classes we utilized a rolling origin validation ( Figure 2) of each model class in anticipation of needing to determine the final model based upon a single metric of error across the different number of years predicted into the future. A rolling origin validation fits a selected model upon an iteratively changing sample size and an inversely changing number of future time steps, i.e. "the rolling origin" (Figure 2) [71][72][73]. We used the MeDian Absolute Percent Error (MDAPE) as our forecasting error metric as opposed to the more common Mean Absolute Percent Error (MAPE). The MAPE, compared to other metrics, has the advantage of avoiding large errors when the true value is near zero [74]. The MDAPE retains the advantages of the MAPE, but is less influenced by extreme values and is more robust than the MAPE [69,74]. It can be written as: whereŷ is the predicted outcome of interest and y is the withheld observed outcome. Given our short input time-series (N ts = 11) and our projection horizon between one and five years (1 ≤ h ≤ 5), we utilized a maximum horizon of five years in the model fitting too. This meant the model classes were iteratively fit with between six (i.e. 2000-2005) and ten (i.e. 2000-2009) input observations, with all other observations withheld, and then predicted between one and five years, respectively, forward of the last input year of the given iteration sample. Each iteration produced a set of annual absolute percent errors for the projected years, of which the median was recorded. The sum of MDAPE values across all iterations represents the total error of each model class for the given unit. Written mathematically, for a given unit i, maximum horizon length h, and a being the index of the given set of iterations, the MDAPE sum within the rolling origin framework can be written as where the sample training series for a given iteration can be written as n ts = t1 + a -h and the set of projected years within an iteration are calculated for each year k that takes on values between n ts+1 , . . . , n ts+h , e.g., for h = 3 and a = 3 the models are fit on years 1 to 8 with a set of predictions made for ŷ n ts +1 ,ŷ n ts +2 ,ŷ n ts +3 . After the rolling origin framework finished, for each unit, we selected the best model between model classes based upon the lowest MDAPE isum and fit the selected model class on the entire available time series. Normally, using the entire time series is cause for concern of model over fitting. However, our larger concern was that excluding later observations in the extremely short time series could lead to excluding important information late in the series. Therefore, we assumed that fitting only on a subset of the time-series would be as harmful, or more so, than potentially overfitting any given unit. After the refitting, and independently for each unit, we predicted the final outcome of interest through our projection horizon, in this case 2011-2015. Full process diagram of this sub-procedure is provided in Appendix A, Figure A2. This time-series model selection and prediction procedure was used twice in the demand quantification component of the BSGMe: once for predicting future BS population and once for predicting future unit-average BS population density (Figure 1). For predicting future BS population, we first transformed, and later back-transformed, BS population to an "BS/Non-BS Ratio" to ensure BS population never exceeded total population [1]. We then calculated the final yearand unit-specific number of projected non-BS-to-BS transitions by dividing the projected BS population by the corresponding projected BS population density.

Spatial Allocation
Projecting non-Built-Settlement (BS)-to-BS Transition Probabilities Surface After calculating annual unit-level demand for non-BS-to-BS transitions, we spatially allocated transitions to the pixel level, producing annual projected BS extents. First, we trained a RF on transitions observed between 2000 and 2010 with spatially coincident covariates corresponding to the year 2000 ( Table 2). This RF was created following the sampling and training procedures in Nieves et al. [16] where an iterative covariate selection procedure was employed, removing covariates that did not improve the accuracy of the RF model. In this scenario, we were assuming that relationships observed between transitions and the predictive covariates persist into the near future. Therefore, we projected forward to estimate the probability of transition surface after 2010 by using 2010 representative covariates as input covariates (Figure 1). The values of the resulting probability surface range from 0.00 to 1.00 and represent the posterior probability of a pixel being classified as transitioning between, originally 2000 and 2010, 2010 and 2015 [75]. We elected to use a RF due to its efficiency and scalability as well as its ability to model complex interactions and non-linear phenomenon using a non-parametric approach with minimal input [75]. Further, RFs have been shown in at least one study to outperform other machine learning type methods, such as support vector machines [76], and showed satisfactory performance in Nieves et al. [16].
Annually Adjusting non-BS-to-BS Transition Probabilities While many projections are "truly future" scenarios and no earth observation data would be available, here we are validating the framework within a scenario where the "future" projection period is one where the input BS extent dataset does not have coverage, i.e. as if ESA had stopped producing the dataset at 2010, and we have access to observed lights-at-night (LAN) data during our projection period (2011)(2012)(2013)(2014)(2015). With this, we follow the procedure in Nieves et al. [16] of using average annual unit-normalized lagged LAN brightness to modify the period probability produced by the RF to a more annual representation of the unit-specific probabilities of transition. The assumption behind this process is that pixels with larger unit-relative changes in annual LAN brightness correspond to a larger probability of non-BS-to-BS transition occurring at those location and vice versa. That is, if a relatively large increase, with respect to the given subnational unit, in the LAN brightness occurred between years and given that the area was not already BS, we would assume this corresponded to a higher probability of non-BS-to-BS transition having occurred.
Using these annually adjusted unit-relative probabilities, we followed the procedure in Nieves et al. [16] to spatially disaggregate the demand quantification component-derived projected annual transitions from the unit-level to the pixel-level (Figure 1). Differing from Nieves et al. [16], we did not restrict where the transitions can occur, excluding existing BS areas and bodies of water, as, being the "future", we did not know observed transition locations in the projection period. This iterative disaggregation began with the last observed extents in year t1 (2010) and, within each unit i, if we had n number of predicted transitions for our given projected year, we selected pixels in unit i with the n th highest annually adjusted probabilities, and transitioned them from non-BS-to-BS. This is in line with Nieves et al. [16], Tayyebi et al. [77], Linard et al. [28] and others where it is assumed that pixels with higher transition probabilities are more likely to transition than pixels with lower probabilities. We repeated this process for all years in the projection period, using the previously projected year as the prior BS extents to expand upon, and output the union of the prior extents and the new projected transition as the next year's BS extents (Figure 1). All resulting and derived data are provided in the linked data repository (https://data.mendeley.com/datasets/cm6bnzvzfj/1).

Validation and Comparison Metrics
We validated BSGMe projected extents against the withheld ESA extents for 2011, 2012, 2013, 2014, and 2015. The ESA data themselves are an imperfect reference, but our goal was to replicate the pattern of ESA's capture of BS relative to BS population and BS population density changes. Therefore, "True" in all of these validations represents agreement of the BSGMe projections with the temporally corresponding withheld ESA validation extents and "False," equally, represents disagreement. For every year, we classified every pixel in the study areas as either True Positive, False Positive, False Negative, or True Negative, TP, FP, FN, TN, respectively. Using these pixel-level designations, we calculated classification contingency-table metrics, listed in Table 3, at the unit-level. Table 3. Classification metrics used in assessing the model performance.

Metric Equation Range and Interpretation
Recall [78] TP TP+FN (no recall) -1 (perfect recall) Precision [78] TP TP+FP (no precision) -1 (perfect precision) The fact that most BS land cover and non-BS land cover is in agreement from any time A to near future time B is simply due to the fact that most land cover remains the same, i.e. persistence, causes issues when looking at classification metrics [79]. Some methods exist for accounting for this [79], but because our input datasets assume that "once BS, always BS", we cannot utilize these adjustments in our binary classification assessment. Hence, the best alternative is to compare all results to a null or naïve model [79]. We utilized a conservative naïve model where we assumed that the 2010 BS extents remained constant through 2015, i.e. lacking any other information we assumed the BS extents remain approximately the same over the short-term. In end user applications, when missing year-specific BS extents, the last available BS extents are commonly used as a substitute. We validated the 2010 extents following the same procedures to compare to the modelled extents.
We also visually compared the 2015 BSGMe modelled extents and the withheld ESA 2015 extents to 2015 true color imagery, available via Google Earth [80], to better understand areas of over/under prediction. We further carried out a quantitative classification validation of the 2015 BSGMe modelled extents and the withheld ESA RS-derived 2015 extents against the presence of OSM building footprints in Switzerland around the municipalities of Visp, Brig-Glis, Naters, and Ried-Brig, where relative model over prediction appeared to be exceptionally bad.

Results
Looking at the distribution of unit-level F1 scores in Figure 3, we show that all models decrease in performance as projection horizon increases, with Vietnam having the most rapid rate of decrease and largest net decrease. In all countries, it appears that the naïve model outperforms all other models to varying degrees, but not typically by much in all countries with the exception of Uganda (Figure 3). Further investigating the distributions of F1 scores, in Figure 4, we show that recall also decreases as the projection horizon increases with Vietnam again having the most rapid and largest net decrease in recall. This makes sense as, according to the ESA RS-derived extent datasets, Vietnam had the largest relative growth while Switzerland, whose recall distributions are near identical and perfect across all input series, had very little growth, i.e. recall is driven here in Switzerland largely by persistence ( Figure 4, Table 1). As expected, as the projection year increases, the recall of the BSGMe produced projections outperforms the naïve model by an increasing magnitude. Unexpectedly, considering Figure 4, Uganda had relatively high values of recall, although the variance of unit-level recall was the largest of our study countries (Figure 4). Looking at the distribution of precision values in Figure 5, precision values decrease as the projection year increases across all countries and input series, except the naïve model because false positive could not occur with the extents remaining static. The low and variable precision shown by Uganda ( Figure 5) potentially explains the observed variance of its F1 scores (Figure 3). Our best guess for the low precision here was that the ESA RS-derived extents were not as good as the population data in Uganda, i.e. leading to worse demand quantification and spatial allocation in the production of the time-series and propagating error through the BSGMe projections. Examining the predicted and observed extents of even a subset of projection years and areas within the study countries, Figure 6, gives some context for the findings in Figures 3-5. The same temporal trend of increases in "false positives", red in Figure 6, imply large areas of over-prediction relative to areas of agreement with ESA RS-derived extents. Areas of false negatives, blue in Figure 6, when examined against time-specific true color imagery [80] seem to consistently coincide with low to mid-density areas of BS intermixed with trees.
Of these examples, Kampala, Uganda appeared to have the greatest magnitude of "false positives" while the Visp and Brig area of Switzerland appeared to have the largest relative number of "false positives" to RS-based observed transitions ( Figure 6). This prompted us to investigate these areas more with time specific true-color imagery [80]. Looking at time-specific true color imagery in an area of west Kampala, Uganda, we overlaid the observed (ESA) and predicted (BSGMe) extents at 2015 (Figure 7). We see that all extents are missing areas of BS, with ESA RS-derived extents missing the more fragmented and less densely settled areas (Figure 7   To begin to approach estimating how much this overestimation of false positives might be, we decided to compare an area of what appeared to be extreme over prediction by the2015 BSGMe extents relative to the 2015 ESA RS-derived extents, around Visp and Brig, Switzerland, and validate both by using the corresponding manually delineated building footprints (OpenStreetMap Contributers, 2019). By 2015, the ESA RS-derived data said there was 1,477 pixels of BS while the BSGMe-derived extents predicted 2,557 pixels of BS ( Figure 8). When we compared these extents OSM building footprint data, corresponding to those present in 2015 and with near 100% coverage, across 11,966 3 arc-second pixels in the validation area, we showed that many of the areas are, in fact, not false positives (Figure 8). In fact, the observed ESA data only has a recall of 41.1% compared to the BSGMe performance of 57.9%, but the ESA extents do retain the highest precision of 84.1% (Figure 8). Considering both recall and precision simultaneously, we see that the BSGMe extents have a F1 score of 0.625 which represents approximately a 12% increase in F1 score to the ESA data (0.552) garnered by a 50% increase in recall, but at the expense of a 20% decrease in precision (Figure 8).

Discussion
We have shown that the BSGMe projects BS extents into the near future with, in many cases, large agreement with the input dataset's withheld observations for predicted years (Figures 3-5). Beyond this, we found support that the validation of the BSGMe predictions, relative to the ESA RS-based observations, could be underestimating the true accuracy (Figures 7 and 8). We displayed this visually for a large proportion of Kampala, Uganda, and other example areas (Figure 7). Further, we quantified it by comparing against manually delineated building footprints for smaller settlements of the Visp and Brig area in Switzerland, showing the BSGMe having a 40 percent increase in recall and a 12 percent increase in F1 score relative to the ESA RS-based data ( Figure 8).
Overall, there are inherent limits to the BSGMe approach. The framework is sensitive to the size and configuration of the subnational units used, per the Modifiable Areal Unit Problem (MAUP) [81]. We would expect that less certainty in the spatial allocation would accompany larger unit area, but the effect of unit size on demand quantification is less clear; although Nieves et al. [16] found that smaller unit size was associated with higher overall unit interpolative accuracy. Additionally, we believe that the framework would be highly sensitive to the input projected population data, yet this characteristic could have potential utility for exploring deterministic outcomes of various input urban population projection scenarios. To further clarify, while here we utilized the Stevens et al. (2015) method for producing unit level estimates of BS population, any unit-level estimates of BS population and BS population density can be provided to the BSGMe modelling framework.
Given the dasymetric nature of the BSGMe framework, measures of uncertainty that would otherwise be generated by the RF, ARIMA, ETS, and GLM models within the framework cannot be propagated to the end predicted BS extents [54,60,61,82]. This uncertainty propagation limit is similar to and was noted in the interpolative settlement modelling framework of Nieves et al. [16]. However, in general, it would be expected that the accuracy of BSGMe extrapolative predictions would have a positive relationship with any errors associated with its input datasets. For instance, if the user-selected representation of BS or estimates of BS population were relatively inaccurate, it would be logical to suppose that the framework would be tasked with sorting out noisier relationships between relative population change and BS extent expansion, and likely have poorer framework performance. In light of the framework relationships to input data error/uncertainty and the limits of propagating and quantifying this uncertainty, it is recommended that any user of this framework compare modelled outputs to the input data layers as well as the uncertainty metrics of the individual framework modelling components, which are recorded in tabular format by the framework code (see code in linked data repository https://data.mendeley.com/datasets/cm6bnzvzfj/1).
Due to persistence, the future BS projections with the highest agreement was the naïve model ( Figure 3). Ignoring the actual ground truth, the model comparisons by metrics without potential end-user context are an oversimplification, with metrics like accuracy and F1 score treating a false-positive disagreement equally bad as a false-negative disagreement. It is more useful to interpret the results with a user's defined loss function in mind [83]. Should the user want to have few disagreements of any type, then the naïve model extents would be logical. However, if the user-defined cost of missing new BS extents would be a greater loss than the alternative cost of additional false-positives, the user would likely avoid the naïve approach in favor of one of the BSGMe predicted extents. Combined with the fact the false-positives of the BSGMe validations are likely inflated (Figures 3-8), it is likely that the difference in precision performance from that of the naïve model is smaller than presented here.
It is important to note that these validation findings are specific to the input ESA RS-derived extents data and the spatial scale of the input representation of BS (originally 10 arc second and then resampled to 3 arc seconds). More generally, the model framework presented here can accept any binary input of urban/built-environment/built-settlement. Although, given the framework's strong reliance on relative changes in population being indicative of relative changes in urban/built-environment/built-settlement, a functional definition of urban/built-environment/built-settlement that corresponds with aspects of the built-environment more likely to be spatially coincident with populations would be most appropriate. Whether our assumptions of population being usable as a proxy for the underlying drivers of BS expansion holds at other spatial scales of BS representation, e.g. 30m Landsat-derived, 12.5m radar-derived, or 500m Moderate Resolution Imaging Spectroradiometer (MODIS)-derived, remains unclear. Supplemental findings for a city-based area, from Nieves et al. [16], observed decreased interpolative agreement when applied to a 1 arc second radar-optical dataset, rescaled to 3 arc seconds. Theoretically, we would expect individual agency, local planning conditions, micro-economic level decision-making, and other "intangibles" from a country, to a global-extent application standpoint, to have a much larger role in the siting of BS at the average individual building scale (~10m-30m). However, most of this type of data, if it exists, remains unavailable across large extents and across time when working in low-to middle-income contexts.
The utilization of land cover to estimate a continuous population surface, using that population surface to estimate BS population, aggregate the BS population to the unit level to then estimate non-BS-to-BS transition demand at the unit level naturally raises a concern of circular reasoning or endogeneity. From a modelling perspective, the larger more important question is, "Why is the model being developed and what questions does it attempt to address?". Our purpose here was to develop a modelling framework capable of accurately predicting near future built-settlement expansion and to answer the question of whether this could be done by looking at subnational changes in population counts corresponding to BS areas. With this in mind, we do not believe circular reasoning, or "endogeneity", is a significant issue for the following reasons. First, our modelling framework is not an explanatory model [84] and endogeneity is, by definition, an issue of causality. Our framework falls somewhere between predictive and descriptive in nature [85] and makes no attempts at statistical inference of causation in any of its components; even the random forest is algorithmic in nature [84,86]. We were interested in utilizing the correlations in our framework to create the best predictions possible, not to infer anything on the causal linkages. Secondly, there is precedent for using the hierarchical structure of population data in this manner; other model frameworks have used changes in population, at a spatial scales coarser than the scale of prediction, to quantify demand for urban area expansion [16,[26][27][28][29]77,87], with one even using pixel level population to drive pixel level transitions [88]. Further, Angel et al. [3] also used geospatial and remotely sensed data to determine estimates of "urban" population and population densities that were subsequently aggregated and then used to predict future urban areas. However, this does raise the issue of fitness for purpose, similar to the discussion in Leyk et al. [11], where end users interested in causal questions and wishing to utilize datasets produced with certain covariates should assess how it was created to avoid the issue of endogeneity.
As expected, we observed that as the time from the last observation increased, the BSGMe projection decreased in agreement with the withheld ESA RS-derived validation extents. This positive association between time from last observation and projection agreement/accuracy is inherent to extrapolative models but could likely be reduced by using longer input time series, should data allow. While the automatic fitting procedure for ARIMA and ETS class models has been shown to have consistently good performance in the short-term (5-6 time steps) [64], this is predicated upon substantially longer time series (20 to 144 observations in the cited M3 competition series data [89]) than are typical with current BS or urban based population datasets at subnational unit level and with large or global extent. Due to the growing uncertainty that accompanies longer projections, we do not recommend extending this framework past the short-term without longer input time series and without further assessment. Another reason we do not currently recommend using the framework for longer term predictions is the lack of including other causal aspects of non-BS -to-BS transition, e.g. economic and planning/zoning information. We excluded such data from the framework because it is typically not available globally, for multiple time points, and at subnational resolutions.
We save unit-and year-specific 95% confidence intervals produced, via bootstrapping, by the ARIMA and ETS models [64], but we did not produce similar intervals for the GLM models (see linked data repository https://data.mendeley.com/datasets/cm6bnzvzfj/1). This was because we were only utilizing the GLMs to capture the general linear trend and not inferring the true value bounds, due to an inability check for the necessary corresponding inferential assumptions for every subnational unit in an automatic, efficient, and robust manner.

Conclusions
Here, we have shown the BSGMe model framework to be flexible and automatable across several environmental, urban morphological and input-data quality contexts while maintaining acceptable agreement with validation data and even surpassing the performance of the input dataset's withheld observations when compared to manually interpreted conditions in time-specific true-color imagery. This framework is novel in that it is globally applicable, with no need for user or expert input parameters, and relies largely on relative changes in subnational population to determine the timing and magnitude of changes. While validated across four countries, this framework is scalable to producing global extents across different periods and with different input BS and population datasets. Proof in point, the WorldPop Programme (www.worldpop.org) adopted this modelling framework to produce global annual BS extents at 100m resolution from 2015 through 2020, using input time-series from 2000-2014 based upon observed and BSGMi interpolated extents (https://doi.org/10.5258/SOTON/WP00649) derived from Global Human Settlement Layer, ESA urban land cover class, and Global Urban Footprint [34].
Being able to produce annual datasets of near future BS extents, and the intermediate BS populations, have a variety of end user applications where investigating potential impacts of BS population changes and BS spatial expansion can have impact, such as public health, sustainability, planning and infrastructure, and transportation management. However, as seen in this study, users should utilize auxiliary data in conjunction with their expert and or local knowledge of the application/study area to assess whether the modelled extents are suitable for their applications and needs. Additionally, this framework and its open-source code can be used as a platform for further investigating deterministic relationships between population, population densities, and BS expansion.
The extent predictions of the BSGMe framework can also be utilized, in a setup similar to this study, by producers of future BS and urban feature data sets to re-investigate areas of disagreement between the BSGMe and their extraction algorithm, knowing that there is a heightened probability of BS being truly present (Figures 7 and 8).
As the temporal resolution of global BS and urban feature data sets catch up to their high spatial resolution, further investigations of this framework will become more accessible and feasible as well as have reduced uncertainty in their conclusions. However, as evidenced in this study, there is a continued need for an independent multi-temporal data set of urban features with global extent that can be used for training and or validation. While OSM offers global extent, it has its own biases in completeness [50,51] and, more significantly, lacks any temporal attributes. One potential solution would be for the producers of urban feature data sets to make their manually identified training and validation points, footprints, and sample grid cells publicly available, e.g. by some research collaboration akin to POPGRID Data Collaborative (www.popgrid.org) with agreed upon documentation, data attribute, and definitional standards. Until such a time, large scale ground truthing, much less temporal ground truthing, of BS or urban features will likely be limited and often surpass the resources of many studies with large or global extent.
Future work should investigate the robustness of this framework with different spatial scale representations of BS as inputs and differing lengths of input time-series. Additional experimentation with the demand forecasting methods is also a large area that remains to be explored. Further validation of more areas should also be prioritized, particularly in areas where urban feature datasets are known to have extraction issues, e.g. arid regions, in order to understand how such error may propagate through this framework into the resulting extents. Other desirable work would involve examination of the applied utility of the BS outputs produced by both the interpolative and extrapolative BSGM frameworks. Acknowledgments: Many of the spatial covariates (doi:10.5258/SOTON/WP00644) used here are the product of the "Global High Resolution Population Denominators Project" funded by the Bill and Melinda Gates Foundation (OPP1134076). The authors acknowledge the use of the IRIDIS High Performance Computing Facility, and associated support services at the University of Southampton, in the completion of this work.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript, or in the decision to publish the results. Appendix A Figure A1. Full process diagram for the Built-Settlement Growth Model-extrapolation (BSGMe) as broken down into the "Demand Quantification" procedure and the "Spatial Allocation Procedure". For details on the "Spatial Transition Disaggregation Procedure", readers are referred to Nieves et al. [16]. For details on the "Subnational Temporal Model Fitting and Prediction Procedure", readers are referred to Appendix A, Figure A2.