Satellite-Based Human Settlement Datasets Inadequately Detect Refugee Settlements: A Critical Assessment at Thirty Refugee Settlements in Uganda

Abstract: Satellite-based broad-scale (i.e., global and continental) human settlement data are essential for diverse applications spanning climate hazard mitigation, sustainable development monitoring, spatial epidemiology and demographic modeling. Many human settlement products report exceptional detection accuracies above 85%, but there is a substantial blind spot in that product validation typically focuses on large urban areas and excludes rural, small-scale settlements that are home to 3.4 billion people around the world. In this study, we make use of a data-rich sample of 30 refugee settlements in Uganda to assess the small-scale settlement detection by four human settlement products, namely, Geo-Referenced Infrastructure and Demographic Data for Development settlement extent data (GRID3-SE), Global Human Settlements Built-Up Sentinel-2 (GHS-BUILT-S2), High Resolution Settlement Layer (HRSL) and World Settlement Footprint (WSF). We measured each product’s areal coverage within refugee settlement boundaries, assessed detection of 317,416 building footprints and examined spatial agreement among products. For settlements established before 2016, products had low median probability of detection and F1-score of 0.26 and 0.24, respectively, a high median false alarm rate of 0.59 and tended to only agree in regions with the highest building density. Individually, GRID3-SE offered more than five-fold the coverage of other products, GHS-BUILT-S2 underestimated the building footprint area by a median 50% and HRSL slightly underestimated the footprint area by a median 7%, while WSF entirely overlooked 8 of the 30 study refugee settlements. The variable rates of coverage and detection partly result from GRID3-SE and HRSL being based on much higher resolution imagery, compared to GHS-BUILT-S2 and WSF.
Earlier established settlements were generally better detected than recently established settlements, showing that the timing of satellite image acquisition with respect to refugee settlement establishment also influenced detection results. Nonetheless, settlements established in the 1960s and 1980s were inconsistently detected by settlement products. These findings show that human settlement products have far to go in capturing small-scale refugee settlements and would benefit from incorporating refugee settlements in training and validating human settlement detection approaches.

Much human settlement mapping research to date has focused on improving measures of urbanization [7,8,22,23], but there has been very little formal examination of the inclusion of small-scale settlements in broad-scale human settlement products (with exceptions [24,25]). As a result, it remains unclear whether rural, small-scale settlements that are home to approximately 3.4 billion people, including over half of Africa's population [26], are included in satellite-based human settlement datasets.
Refugee settlements managed by the United Nations High Commissioner for Refugees (UNHCR) offer an ideal test case for gauging small-scale settlement detection by satellite-based human settlement datasets. Refugee settlements (sometimes referred to as "camps") span 132 countries and are home to approximately one-third of the 21 million global refugee population as of late 2020 who have been forcibly displaced across national borders due to violence, persecution, or intimidation [27]; the remaining two-thirds of the global refugee population reside in urban areas outside of UNHCR-managed settlements. The average stay in a UNHCR-managed refugee settlement was 10 years as of 2015 [28], though many settlements become intergenerational homes through a "protracted refugee scenario", in which refugee populations greater than twenty-five thousand people are displaced for more than five years [29]. The location, setting and timing of refugee settlements are broadly documented with openly accessible geospatial data on building footprints, settlement boundaries, as well as ancillary information on settlement population, refugee arrival and duration of habitation, much of which has been collected in support of rapid or prolonged humanitarian support [30].
The diversity of spatio-temporal data on refugee settlements is beneficial for a targeted assessment of their inclusion in broad-scale human settlement products, yet there has never been a formal assessment of the detection of refugee settlements. There are, by contrast, many analyses of individual refugee settlements using high or moderate resolution (e.g., Sentinel and Landsat) satellite imagery or derived products to estimate settlement area [31,32], enumerate dwellings [33][34][35][36][37], model refugee populations [38], guide the delivery of aid and relief [39,40], assess environmental conditions [41][42][43][44], map land cover/use change [45][46][47] and quantify economic development [48].
Several characteristics of refugee settlements likely challenge systematic detection of refugee settlements. First, many refugee settlements consist of small-scale dwellings and structures that are diffusely distributed and interspersed with vegetation or bare earth [41,46,49], leading to mixed pixels in moderate resolution satellite imagery (i.e., Sentinel-1/2 and Landsat). Second, building materials in refugee settlements may include plastic tarp, wood fiber, thatching and mud, which have distinct spectral signatures from typical building materials used outside of humanitarian settings [50] and may also offer less spectral separability [51] from their immediate surroundings. Third, refugee settlements tend to be rapidly built and settled, sometimes only over the course of weeks to months, but may also intermittently grow in size or number of dwellings with new refugee arrivals over time.
The goal of this study is to assess how well broad-scale satellite image-based human settlement products capture individual refugee settlements. With a case study of 30 UNHCR refugee settlements in Uganda that were settled between 1960 and 2018, we use four human settlement datasets, namely the Global Human Settlement Layer from Sentinel-2 (GHS-BUILT-S2), the World Settlement Footprint (WSF), the High Resolution Settlement Layer (HRSL) and Geo-Referenced Infrastructure and Demographic Data for Development settlement extent data (GRID3-SE), as well as two building footprint datasets (OpenStreetMap and Microsoft) and georeferenced refugee settlement boundary data. We had three specific objectives: (1) measure the areal coverage of satellite-derived human settlement products within refugee settlement boundaries; (2) measure the detection of building footprints within refugee settlements; (3) assess multi-product agreement (i.e., regions of mutual detection) among settlement datasets. For each objective, we summarize results across the 30 refugee settlements and offer qualitative and quantitative assessments of factors potentially affecting refugee settlement coverage and detection, such as the spatial resolution and timing of satellite imagery used to generate the settlement products. This study's findings help bridge the divide between humanitarian and human settlement monitoring efforts and contribute to an ongoing discussion of achieving a more inclusive representation of refugees in geospatial Big Data (e.g., [52,53]).

Study Area
As of writing, Uganda has the third largest refugee population in the world with 1.4 million refugees (nearly 3% of Uganda's total population) under UNHCR protection [54]. Uganda's earliest refugees settled in the 1960s, but the population increased almost ten-fold between 2012 and 2017 with the arrival of refugees from South Sudan, the Democratic Republic of the Congo (DRC) and Burundi (Figure 1). Given that many of the same conflicts that displaced refugees in the first place have persisted, many refugees do not intend to return to their home country and are likely to remain in Uganda for years to come [55]. Approximately 94% of refugees live in 30 UNHCR-managed settlements in the Northern and Western Regions of Uganda (Figure 2) and the remaining 6% of refugees live in the capital city of Kampala [56]. All study refugee settlements have been persistently inhabited after their establishment and none have been closed or abandoned. The abundance and persistence of refugee settlements, the diversity of settlement morphologies and density and the availability of settlement boundary and building footprint data, described below, make Uganda an ideal case study to evaluate the detection of refugee settlements by broad-scale human settlement products. The 30 study refugee settlements have a median boundary area of 4 km² and range from 0.2 km² (Mireyi) to 790 km² (Bidi Bidi). Refugee settlements tend to grow rapidly after establishment with the construction of dwellings and infrastructure to accommodate incoming refugees. Refugee settlements have a median population of 16,782 people (mean, 47,619; standard deviation, 58,093) as of September 2020 (Table 1). Settlement layouts adhere to predefined UNHCR settlement planning protocols [57], but are uniquely designed to fit site-specific considerations and vary from a grid-like organization of dwellings and other structures delineated by roadways to clustered agglomerations of dwellings.
Settlements are broadly self-contained with housing, water and latrine infrastructure (WASH), markets, financial services and educational and healthcare facilities on site, as well as buildings for refugee response (i.e., processing and registering arriving refugees) and administration (i.e., settlement management, aid provision, inter-sector coordination, etc.). Each family is allocated a 30 m by 30 m plot for housing and agricultural cultivation and individual dwellings tend to be small (median area of 25 m²). Typical materials used for dwellings at the time of refugee arrival are plastic tents, tarps, or grass thatching, which may be replaced with more durable building materials over time such as brick or tin roofing [43].

UNHCR Refugee Settlement Boundary Data
Refugee settlement boundaries (Figure 3) demarcate land for settlement planning and management purposes and are established before or during refugee arrival by the UNHCR in agreement with the Government of Uganda. The boundary does not represent an absolute barrier to refugee settlement land use, as structures, infrastructure and agricultural plots are commonly established alongside or beyond the boundary. It is also common for refugee populations to only occupy a portion of the area demarcated by the settlement boundary. Several of the larger settlements (Bidi Bidi, Rhino Camp and Imvepi) include sub-settlement boundaries (i.e., zones, villages, blocks) that delineate regions of refugee dwellings from those of non-refugee populations who were in place before the settlement was established [57]. UNHCR refugee settlement boundaries were published in 2020 and are available at https://data2.unhcr.org/en/documents/details/74116 (accessed on 1 December 2020).

Human Settlement Datasets
We examined four satellite-derived broad-scale human settlement products at the 30 study refugee settlements. The Global Human Settlement Layer with Sentinel-2 (GHS-BUILT-S2; [2]) offers 10-m resolution global-scale coverage of probabilistic settlement presence based on a convolutional neural net (CNN) classification of Sentinel-2 Level 1C imagery from January 2017 to December 2018. GHS-BUILT-S2 was publicly released in November 2020. We initially examined settlement detection using >1% and >50% probability thresholds but adopted recommendations from [2] to use a 20% probability threshold for settlement detection in rural, low-density settlement regions. GHS-BUILT-S2 has a nominal accuracy of 85% when using a detection probability threshold of greater than 20% in Africa, but there is no explicit mention of detection accuracy for refugee settlements. GHS-BUILT-S2 is available at https://ghsl.jrc.ec.europa.eu/ghs_bu_s2_2018.php (accessed on 1 December 2020).
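The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not the processing chain used in the study: it assumes the probability layer has already been read into a nested list of values on a 0-100 scale, and the function name `threshold_probability` is hypothetical.

```python
def threshold_probability(prob_grid, threshold=20):
    """Convert a probability grid (assumed 0-100 scale) to a binary grid.

    A pixel counts as settlement only when its probability is strictly
    greater than the threshold, matching the "greater than 20%" rule.
    """
    return [[1 if p > threshold else 0 for p in row] for row in prob_grid]


# Toy 2 x 3 probability grid (made-up values for illustration only).
probabilities = [[0, 15, 20],
                 [21, 60, 100]]
binary = threshold_probability(probabilities)
print(binary)
```

Lowering the threshold to 1 or raising it to 50 reproduces the other two cut-offs that were initially examined.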
The World Settlement Footprint (WSF; [3]) is a 10-m resolution global-scale binary human settlement dataset based on multi-temporal Sentinel-1 Synthetic Aperture Radar (SAR, 10 m) and Landsat 8 (30 m) imagery from 2014-2015 classified using a support vector machine (SVM). WSF offers global-scale coverage with a nominal settlement detection accuracy of 86% and was publicly released in July 2020. WSF producers note that smaller structures made with building materials commonly found in refugee settlements (e.g., mud bricks and straw) were not consistently detected with WSF, but there is no quantitative assessment available to support this observation. WSF is available at https://figshare.com/articles/dataset/World_Settlement_Footprint_WSF_2015/10048412 (accessed on 1 December 2020).
The High Resolution Settlement Layer (HRSL; [4]) is a 30-m resolution population and settlement dataset with coverage across 140 countries and was released in December 2017. HRSL's binary settlement data are based on a CNN classification of buildings using 50 cm resolution Maxar imagery from 2011-2015. HRSL has an average precision and recall of 95% and 91%, respectively, and 99% and 93% for Uganda, respectively. There is no mention of refugee settlements being considered in the development of the settlement detection approach or accuracy assessment. HRSL data are available at https://www.ciesin.columbia.edu/data/hrsl/ (accessed on 1 December 2020).
The Geo-Referenced Infrastructure and Demographic Data for Development settlement extent data (GRID3-SE; [5]) offers wall-to-wall coverage for 51 countries in sub-Saharan Africa and was made publicly available in April 2020. GRID3-SE is a vector (i.e., polygon) product based on Ecopia building footprints derived from 50 cm resolution Maxar imagery. Approximately 77% of imagery input to GRID3-SE for Uganda were acquired between 2016 and 2020, with 23% from 2015 or earlier. Settlement extents are created by processing building footprint centroids to a 3 arc-second raster grid of building densities. Based on the underlying building density, polygons were classified as built-up area (BUA; high density), small settlement area (SSA; moderate density), or hamlet (low density), following the classification approach described in [58]. To capture any indication of settlement presence within GRID3-SE, we merged all three settlement classes and refer to this combined coverage as GRID3-SE henceforth. As with other products, there is no mention of refugee settlements being considered in GRID3-SE's development or validation, but refugee settlements and other settlements with temporary structures are recognized as being difficult to detect. GRID3-SE data for Uganda are available at https://academiccommons.columbia.edu/doi/10.7916/d8-s1yg-pc20 (accessed on 1 December 2020).
In addition to the four products above, we visually assessed coverage of the Landsat-based Global Human Settlement Layer (GHS-BUILT; https://ghsl.jrc.ec.europa.eu/ghs_bu2019.php, accessed on 1 December 2020), Global Urban Footprint (GUF; [59]) and the Global Artificial Impervious Area (GAIA; [60]) within study settlement boundaries. Since these products had zero or negligible coverage within study settlements, we removed them from further analysis.

Refugee Settlement Building Footprint Data
Two different kinds of building footprint data were used for assessing the areal coverage and detection of the four settlement products. The open-source OpenStreetMap (OSM) building footprint dataset for Uganda includes 304,482 individual building footprints, covers a total area of 7.26 km² across the 30 study settlements and was created between 2017 and 2020, according to footprint timestamp data. There is no explicit accuracy assessment for OSM footprint data but their accuracy, completeness and geo-precision are recognized to depend on settlement morphology, building density and rooftop architecture [61]. The OSM footprint dataset is available at https://data.humdata.org/dataset/hotosm_uga_buildings (accessed on 1 December 2020).
We also used building footprints created through the Microsoft (MS) AI for Humanitarian Action program in partnership with the Humanitarian OpenStreetMap Team. This MS dataset spans Uganda and Tanzania and was generated using a deep neural net model trained with 1.2 million labeled buildings and Maxar very-high-resolution satellite imagery collected between 2018 and 2019 [62]. The MS dataset includes 58,544 individual building footprints that span a total area of 2.16 km² across 27 study settlements; Ayilo II, Boroli II and Pagirinya lack MS footprint data. The reported precision and recall for the MS footprint dataset are 95% and 62%, respectively, but vary between urban and rural settings [63]. The MS building footprint dataset is available at https://github.com/microsoft/Uganda-Tanzania-Building-Footprints (accessed on 1 December 2020).
We created a fused building footprint product by merging the complementary OSM and MS building footprint datasets (Figure 4). The OSM-MS fused dataset includes 317,416 individual building footprints across 8.47 km² over all 30 refugee settlements; only 0.94 km² was recorded in both OSM and MS footprint datasets. Per settlement, OSM-MS building footprint coverage has a median area of 0.60 km² (mean, 2.05; min, 0.04; max, 14.93; std, 3.41) and there are a median 589 OSM-MS building footprints per km² within a refugee settlement's UNHCR boundary (mean, 860; min, 68; max, 4213; std, 926). Even with the fused OSM-MS building footprint product, there are notable omission errors (e.g., Figure 4b-d), which we discuss below.
To support a pixel-wise comparison with each settlement product, we prepared a 10-m resolution raster georeferenced to the WSF product. We rasterized the fused OSM-MS building footprints to a binary settlement/non-settlement raster. In an effort to increase the likelihood of agreement with human settlement products, we considered any pixel intersecting an OSM-MS building footprint as a settlement pixel; any pixel lacking an OSM-MS building footprint was considered a non-settlement pixel. Rasterized building footprint coverage spans from 0.04 to 14.93 km² across the 30 study settlements (Table 1) and all mentions below of the OSM-MS building footprint dataset refer to the 10-m resolution raster product rather than the input polygons.
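The "any intersecting pixel counts" rule can be illustrated with a simplified sketch. It is not the rasterization tool used in the study: footprints are assumed to be axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)` in metres, the grid origin is assumed to sit at (0, 0) with rows increasing along y, and the function name `rasterize_all_touched` is hypothetical.

```python
import math


def rasterize_all_touched(footprints, n_rows, n_cols, pixel_size=10.0):
    """Burn rectangular footprints into a binary grid.

    Every pixel whose extent overlaps a footprint (by any non-zero area)
    is set to 1; all other pixels stay 0.
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for xmin, ymin, xmax, ymax in footprints:
        if xmax <= xmin or ymax <= ymin:
            continue  # skip degenerate rectangles with no area
        # First and last pixel indices overlapped along each axis.
        c0 = max(0, math.floor(xmin / pixel_size))
        c1 = min(n_cols - 1, math.ceil(xmax / pixel_size) - 1)
        r0 = max(0, math.floor(ymin / pixel_size))
        r1 = min(n_rows - 1, math.ceil(ymax / pixel_size) - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] = 1
    return grid


# A 5 m x 5 m dwelling (25 m^2) that straddles four 10-m pixels.
grid = rasterize_all_touched([(8, 8, 13, 13)], n_rows=3, n_cols=3)
settled_area_m2 = sum(map(sum, grid)) * 10 * 10
print(grid, settled_area_m2)
```

In this toy case a 25 m² dwelling is represented by 400 m² of settlement pixels, which shows how the rule deliberately favors agreement with the settlement products at the cost of inflating the rasterized footprint area.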

Methods
As stated above, this study's goal is to assess how well broad-scale satellite image-based human settlement products capture individual refugee settlements and the study objectives are to measure the areal coverage and detection of refugee settlements and assess agreement among human settlement products (Figure 5). To support intercomparison among settlement products and building footprint data, the OSM-MS, HRSL and GHS-BUILT-S2 data were resampled and georeferenced to the 10-m resolution WSF product and GRID3-SE vector data were similarly rasterized at 10-m resolution and georeferenced to WSF. More difficult to address was the fact that settlement products and building footprint datasets were based on satellite imagery acquired at different times and in some cases before refugee settlements had been established (Figure 6). WSF and HRSL are predominantly based on imagery from before 2016 that are unlikely to be relevant for detecting settlements established in 2016 or later. Meanwhile, GRID3-SE, GHS-BUILT-S2, OSM and MS were predominantly based on imagery acquired in 2016 or later, after the majority of study refugee settlements had been established.
To help account for the different periods of satellite imagery used to create the settlement products and building footprint datasets, we report areal coverage, detection rates and agreement results for pre-2016 refugee settlements (i.e., settlements established before 2016) separately from post-2016 refugee settlements (i.e., settlements established in 2016 or later). We acknowledge that changes in refugee settlement design or construction at or around 2016 could contribute a bias in the comparison of pre- and post-2016 results; however, we found no indication of changes to refugee settlement composition at or around 2016. Note that we include the comparison of GRID3-SE to OSM-MS building footprint data even though GRID3-SE is meant to represent settlement extents that are inclusive of buildings. While we expect GRID3-SE coverage to exceed building footprint coverage, quantifying the differences in coverage (Objective 1) and detection (Objective 2) provides relevant insight into the utility of GRID3-SE for refugee settlement mapping.

Objective 1: Measure Areal Coverage within Refugee Settlements
To measure potential under- or over-representation of refugee settlement area, we measured the areal coverage of each human settlement product at study refugee settlements at two scales. First, we measured the ratio of settlement product area to the UNHCR refugee settlement boundary area, which, as mentioned above, may include regions of unpopulated land, as well as non-refugee population settlements that pre-dated the establishment of the refugee settlement. Second, we compared each human settlement product's area of coverage to the total area of OSM-MS building footprints within each settlement's UNHCR boundary.
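The two coverage measures can be sketched as follows. This is an illustrative example under stated assumptions rather than the study's code: all inputs are assumed to be co-registered binary grids at 10-m resolution (nested lists of 0/1), and the names `count_pixels` and `coverage_summary` are hypothetical.

```python
PIXEL_AREA_KM2 = (10 * 10) / 1_000_000  # area of one 10 m x 10 m pixel in km^2


def count_pixels(mask, boundary):
    """Count pixels that are set in `mask` and lie inside `boundary`."""
    return sum(
        1
        for mask_row, boundary_row in zip(mask, boundary)
        for m, b in zip(mask_row, boundary_row)
        if m and b
    )


def coverage_summary(product, footprints, boundary):
    """Return the two areal coverage measures for one settlement."""
    n_boundary = count_pixels(boundary, boundary)
    if n_boundary == 0:
        raise ValueError("boundary mask contains no pixels")
    n_product = count_pixels(product, boundary)
    n_footprint = count_pixels(footprints, boundary)
    return {
        # Scale 1: share of the boundary area covered by the product.
        "boundary_ratio": n_product / n_boundary,
        # Scale 2: product area minus footprint area (positive = overestimate).
        "footprint_difference_km2": (n_product - n_footprint) * PIXEL_AREA_KM2,
    }


# Toy 2 x 3 settlement: the product covers 3 of 6 boundary pixels,
# while building footprints occupy 2 pixels.
boundary = [[1, 1, 1], [1, 1, 1]]
product = [[1, 1, 0], [1, 0, 0]]
footprints = [[1, 0, 0], [1, 0, 0]]
summary = coverage_summary(product, footprints, boundary)
print(summary)
```

Restricting every count to the boundary mask mirrors the choice to ignore product coverage that extends beyond the UNHCR boundary.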

Objective 2: Measure Detection of Building Footprints within Refugee Settlements
We measured the detection of refugee settlements by human settlement products through comparison with the 10-m resolution OSM-MS building footprint data. We measured the total area of "hit" (true positive, TP), "false alarm" (false positive, FP), "miss" (false negative, FN) and "none" (true negative, TN) pixels for each product within each settlement's UNHCR boundary (Table 2). With these values in hand, we calculated commonly used detection metrics, namely, probability of detection (POD, also known as recall), critical success index (CSI; [64]), F1-score (F1) and false alarm rate (FAR, also known as the false positive ratio) (Table 3). POD indicates the proportion of building footprint area represented in a settlement's product coverage. CSI accounts for potential over- and under-detection of building footprint area by the settlement products. F1 provides additional sensitivity to detection of sparse features. Lastly, FAR indicates over-detection of building footprint area. All metrics range from 0 to 1.
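A compact sketch of this pixel-wise assessment is given below. The POD, CSI and F1 formulas are the standard definitions; the FAR formula, FP / (TP + FP), is an assumption taken from the forecast-verification usage that accompanies CSI, since Table 3 is not reproduced here (an alternative convention is FP / (FP + TN)). Function names are hypothetical and inputs are assumed to be co-registered binary grids.

```python
def confusion_counts(product, reference, boundary):
    """Count TP, FP, FN and TN pixels inside the settlement boundary."""
    tp = fp = fn = tn = 0
    for p_row, r_row, b_row in zip(product, reference, boundary):
        for p, r, b in zip(p_row, r_row, b_row):
            if not b:
                continue  # ignore pixels outside the boundary
            if p and r:
                tp += 1  # hit
            elif p and not r:
                fp += 1  # false alarm
            elif r:
                fn += 1  # miss
            else:
                tn += 1  # none
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    """Divide safely; an undefined metric is reported as NaN."""
    return numerator / denominator if denominator else float("nan")


def detection_metrics(tp, fp, fn):
    """Compute POD, CSI, F1 and FAR from pixel counts."""
    return {
        "POD": _ratio(tp, tp + fn),
        "CSI": _ratio(tp, tp + fp + fn),
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),
        "FAR": _ratio(fp, tp + fp),  # assumed definition, see lead-in
    }


# Toy 1 x 5 settlement strip with made-up values.
product = [[1, 1, 0, 0, 0]]
reference = [[1, 0, 1, 1, 0]]
boundary = [[1, 1, 1, 1, 1]]
tp, fp, fn, tn = confusion_counts(product, reference, boundary)
metrics = detection_metrics(tp, fp, fn)
print((tp, fp, fn, tn), metrics)
```

Because every pixel has the same area, ratios of pixel counts equal ratios of areas, so the metrics can be computed from counts directly.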

Objective 3: Assess Agreement among Settlement Products
We next examined the agreement among GHS-BUILT-S2, WSF, HRSL and GRID3-SE in their detection of pre- and post-2016 refugee settlements. We made a multi-product agreement raster at 10-m spatial resolution that represented the number of products with coverage per pixel within each settlement boundary. We measured the area detected by a single human settlement product (i.e., unique detection without agreement from another human settlement product), as well as the area detected by two, three, or four products (the last being common detection by all human settlement products) within boundaries and over OSM-MS building footprints. We also measured the respective contributions of each human settlement product to each level of agreement (e.g., 1-4) to identify product-level differences in unique or multi-product agreements. To gauge how detection changes with increased agreement, we calculated detection metrics from Objective 2 for the multi-product agreement raster relative to the OSM-MS building footprint dataset. Since the values in the multi-product agreement raster are mutually exclusive over space, we could not independently assess each value's detection fairly without being penalized for the footprints detected by other values in the multi-product agreement raster. We therefore measured detection metrics cumulatively by including single (unique) coverage in the assessment of the two-product agreement coverage, including one, two and three-product coverage in the assessment of the three-product agreement coverage, etc.
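The multi-product agreement raster amounts to a per-pixel sum of the binary product grids. The sketch below illustrates that construction and the tally of pixels per agreement level; it assumes co-registered binary grids, uses made-up toy values, and the function names are hypothetical. The cumulative metric calculation is not reproduced here.

```python
def agreement_raster(products):
    """Sum binary product grids pixel by pixel.

    `products` is a list of equally sized binary grids; the result holds,
    for each pixel, the number of products with coverage (0..len(products)).
    """
    return [
        [sum(values) for values in zip(*rows)]
        for rows in zip(*products)
    ]


def pixels_by_level(agreement, boundary, n_products):
    """Count boundary pixels at each agreement level from 0 to n_products."""
    counts = {level: 0 for level in range(n_products + 1)}
    for a_row, b_row in zip(agreement, boundary):
        for level, inside in zip(a_row, b_row):
            if inside:
                counts[level] += 1
    return counts


# Toy 1 x 4 strip for the four products (values are illustrative only).
ghs = [[0, 0, 1, 0]]
wsf = [[0, 0, 1, 0]]
hrsl = [[0, 1, 1, 0]]
grid3 = [[1, 1, 1, 0]]
boundary = [[1, 1, 1, 1]]

agreement = agreement_raster([ghs, wsf, hrsl, grid3])
counts = pixels_by_level(agreement, boundary, n_products=4)
print(agreement, counts)
```

Multiplying each count by the 100 m² pixel area gives the area detected at each level of agreement within a settlement boundary.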

Results

Objective 1: Measure Areal Coverage within Refugee Settlements
We found large differences in coverage among the four human settlement products within UNHCR refugee settlement boundaries (Figure 7). GHS-BUILT-S2, HRSL and GRID3-SE had coverage across all 30 settlements, but WSF did not detect a single pixel at eight refugee settlements that were established between 1992 and 2016. GRID3-SE offered more than five-fold the coverage of other products within UNHCR settlement boundaries and detected a median 65% (min, 9%; max, 100%; std dev, 26%) of the settlement boundary area for pre-2016 settlements and a median 28% (min, 11%; max, 72%; std dev, 18%) of the boundary area for post-2016 settlements. Note that contiguous GRID3-SE coverage often went beyond the UNHCR settlement boundary but only the coverage within the boundary is considered here. HRSL had the second largest median coverage with 13% (min, 1%; max, 49%; std dev, 13%) of the UNHCR boundary area for pre-2016 settlements and 1% (min, 0%; max, 26%; std dev, 8%) for post-2016 settlements. The median coverages of GHS-BUILT-S2 and WSF did not exceed 2% for pre- or post-2016 settlements and the maximum areas covered by GHS-BUILT-S2 and WSF across all settlements were only 12% (i.e., Boroli I, established 2014) and 25% (i.e., Olua II, established 2012), respectively. The non-zero coverage of post-2016 settlements by HRSL and WSF, which do not use post-2016 source imagery, most likely resulted from detection of non-refugee settlements that predated the refugee settlement establishment.
We next measured human settlement product coverage compared to OSM-MS building footprint coverage within each settlement's boundary. We found that GHS-BUILT-S2 and WSF underestimated the OSM-MS building footprint area by a median 0.30 km² (min, 0.53 km² overestimate; max, 11.80 km²; std dev, 2.50 km²) and 0.60 km² (min, 0.61 km² overestimate; max, 10.86 km²; std dev, 2.49 km²), respectively (Figure 8).
HRSL offered a closer approximation of building footprint area with a slight median underestimation of 0.04 km² (min, 0.68 km²; max, 4.01 km²; std dev, 1.21 km²), while GRID3-SE overestimated the OSM-MS footprint area by a median 1.69 km² (min, 129.09 km²; max, 0.07 km²; std dev, 32.95 km²); the overestimation by GRID3-SE was expected given that GRID3-SE is meant to represent a settlement extent inclusive of building footprints. Thus, GRID3-SE covered more of the settlement boundary area, but HRSL offered the best agreement with building footprint coverage. Pre-2016 settlements had lower RMSE than post-2016 settlements across all products, a consequence of pre-2016 settlements predating most of the satellite imagery used by the settlement products.

Objective 2: Measure Detection of Building Footprints within Refugee Settlements
We found that the four settlement products only partially and inconsistently detected building footprints within study refugee settlements. For pre-2016 settlements, the settlement products offered a median probability of detection (POD) of 0.26, critical success index (CSI) of 0.13, F1-score (F1) of 0.26 and false alarm rate (FAR) of 0.59. Settlement products fared worse for post-2016 settlements, with a median POD of 0.17, CSI of 0.09, F1 of 0.16 and FAR of 0.73. Detection rates were expected to be higher for pre-2016 settlements given their establishment prior to source image acquisition; yet, even pre-2016 detection rates were far below each settlement product's nominal detection accuracy, which ranges from 85 to 99%. Of individual products (Figure 9), GRID3-SE had the overall highest detection of pre-2016 refugee settlements with a median POD of 0.97. GRID3-SE had rather low median CSI of 0.21 and F1 of 0.38 with a high median FAR of 0.75, but these metrics were affected by "false alarm" sites in between and surrounding building footprints. HRSL offered the second highest detection across all metrics and was only marginally below GRID3-SE for all metrics other than POD. GHS-BUILT-S2 and WSF tended to have the lowest detection rates at less than 0.10 for all metrics other than F1. Detection varied substantially across settlements (see Appendix A Figure A1), but revealed a general under-detection of settlement footprint area, as depicted by the OSM-MS dataset, by all products except for GRID3-SE, as well as an over-detection (i.e., "false alarms") of settlement area by all products. Since human settlement products and building footprint datasets were based on satellite imagery collected at different periods (Figure 6), it is distinctly valuable to consider the detection rates of the earliest settlements that were broadly stable throughout the study period.
Mireyi, for example, is a small refugee settlement demarcated by a 0.20 km² boundary with 0.11 km² of building footprint coverage. Mireyi was established in 1994, well before settlement product satellite imagery were collected; yet, the four products offered disparate and sometimes limited detection of the settlement (Figure 10). GRID3-SE had complete coverage of all OSM-MS building footprints (POD = 1.0), while HRSL tended to only capture the central region of the settlement (POD = 0.51) and overlooked settled regions near the boundary (FAR = 0.40). WSF and GHS-BUILT-S2 offered progressively less coverage but managed to capture some settled areas that were not depicted in HRSL. Considering the oldest settlements of Kyangwali (1960) and Oruchinga (1961) (see Appendix A Figure A2) that were established 50 years before the earliest satellite imagery used in HRSL, products still tended to offer only partial detection. Moving several decades later to the next established settlements of Rhino Camp and Elema, established in 1980 and 1992, respectively, HRSL excelled in all detection metrics other than POD, which GRID3-SE once again led. These are exceptions, however, since settlements established in the mid-1990s through the most recently established settlements (such as Boroli I and Ayilo I, below) were best detected by GRID3-SE across detection metrics.

Objective 3: Assess Agreement among Settlement Products
Given the occasionally overlapping coverage among the four human settlement products, we next examined the areal coverage and detection of multi-product agreement. For pre-2016 settlements, we found that nearly half (45%) of the median coverage within a settlement boundary was unique to a single settlement product. This prevalence of unique detection and the rapidly decreasing coverage of two- (17%), three- (3%) and four-product (<1%) agreement (Figure 11) suggest high disagreement among the four settlement products. A similar trend in declining coverage with greater levels of agreement was measured in post-2016 settlements and points to an overall low agreement among products in their detection of refugee settlements. GRID3-SE was almost entirely responsible for unique detection within refugee settlements by providing 99% of single product coverage (Table 4). GRID3-SE was also present at all locations of detection by other products; in effect, there was not a single pixel detected by GHS-BUILT-S2, HRSL, or WSF that was not also detected by GRID3-SE. After GRID3-SE, HRSL contributed zero unique detection but offered the second most coverage for sites detected by two or three products, while GHS-BUILT-S2 and WSF offered zero unique detection and contributed the least to two- or three-product agreement coverage.
A total of 21 of the 30 study settlements had some coverage by all four products, 8 settlements had coverage by three settlement products and a single settlement (Agojo) only had coverage by two products (Appendix A Figure A3). The majority of multi-product agreement occurred in built-up regions with densely arranged, large and highly reflective structures (Figure 12). The multi-product agreement is likely associated with refugee settlement administration or UNHCR coordination, while individual, small-scale refugee dwellings were rarely detected by products other than GRID3-SE. The absence of agreement across much of the largest refugee settlements (e.g., Bidi Bidi, Imvepi and Kiryandongo) reflects the openness and broad absence of structures. Multi-product agreement outside of settlement boundaries was not uncommon (e.g., Boroli II, Elema, Mungula I, etc.) and may be associated with dwelling spillover (e.g., Maaji I), market structure construction (e.g., Pagirinya) and, less commonly, nearby non-refugee settlements (e.g., Lobule). The prevalence of multi-product agreement outside the UNHCR settlement boundaries suggests that the space directly influenced by refugees and settlement-level land use is not always encapsulated by refugee settlement boundaries, a finding echoed in [46].
We found that increasing agreement among settlement products also improved POD, CSI and F1 and decreased FAR (Figure 13; Appendix A Figure A4). The largest gain in detection occurred with the transition from unique detection by a single product (i.e., GRID3-SE, as explained above) to agreement by two products (usually HRSL and GRID3-SE) with little subsequent improvement in detection with the transition from two-product agreement to three- or four-product agreement.
POD showed the greatest overall increase from unique detection by a single product to multi-product agreement-likely because it overlooked "false alarm" sites-while CSI, F1 and FAR were much more constrained in their incremental changes between agreement levels. Detection of pre-2016 settlements benefited more from increasing agreement than post-2016 settlements, which was expected, given the overall higher detection rates at pre-2016 settlements. Figure 12. Maps of agreement among four settlement products at (a) Mireyi, (b) Boroli I and (c) Ayilo I: "1 product" indicates unique coverage by a single product; "2 products", "3 products" and "4 products" indicate the number of settlement products that share coverage; "None" indicates zero coverage. Note that the spatial scale of settlement maps varies. Figure 13. Violin plots of detection metrics, probability of detection (POD), critical success index (CSI), F1-score (F1) and false alarm rate (FAR) for multi-product agreement sites in comparison to OSM-MS building footprint coverage in settlements established before and in or after 2016: "1 product" indicates unique coverage by a single product; "2 products", "3 products" and "4 products" indicate the number of settlement products that share coverage. Black horizontal lines in violin plots represent individual settlement observations. POD, probability of detection; CSI, critical success index; F1, F1-score; FAR, false alarm rate.
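For readers less familiar with the four detection metrics, the sketch below computes them from a contingency table of hits, misses and false alarms. It assumes the standard contingency-table definitions, with FAR computed as false alarms divided by all detections; the study's exact formulas are given in its methods, which are not reproduced here, so the function and the example counts should be read as illustrative only.

```python
def detection_metrics(hits, misses, false_alarms):
    """Return POD, FAR, CSI and F1 from contingency counts.

    Assumed (standard) definitions:
      POD = hits / (hits + misses)
      FAR = false_alarms / (hits + false_alarms)
      CSI = hits / (hits + misses + false_alarms)
      F1  = 2 * hits / (2 * hits + misses + false_alarms)
    A metric is NaN when its denominator is zero.
    """
    nan = float("nan")
    pod = hits / (hits + misses) if (hits + misses) else nan
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    f1 = 2 * hits / (2 * hits + misses + false_alarms) if total else nan
    return {"POD": pod, "FAR": far, "CSI": csi, "F1": f1}

# Hypothetical counts, not taken from the study.
metrics = detection_metrics(hits=30, misses=70, false_alarms=30)
print(metrics)
```

Under these definitions POD contains no false-alarm term, which is consistent with POD responding most strongly to increasing agreement, while CSI and F1 penalize both misses and false alarms.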

Discussion
This study is the first to systematically examine how well refugee settlements are captured by leading satellite data-based human settlement products, GHS-BUILT-S2, WSF, HRSL and GRID3-SE. Overall, we found generally low coverage within refugee settlements, which resulted in low detection rates and high FAR for products often in excess of 0.50. GRID3-SE tended to provide the most coverage within a settlement's boundary with consistently high rates of building footprint detection (POD, CSI and F1). HRSL captured far less area than GRID3-SE but better approximated building footprint coverage within refugee settlement boundaries and often had comparable CSI and F1 to GRID3-SE. GHS-BUILT-S2 and WSF tended to capture much less area within refugee settlement boundaries, underestimated building footprint coverage and had the lowest detection accuracies, albeit with the lowest FARs. Multiple products found common detection in regions with dense arrangements of buildings or with exceptionally large buildings, with similar results to [6,33], but there were few gains in detection rates when combining more than two different products.
It is likely that several characteristics of refugee settlements, human settlement products and building footprint datasets combined to influence the areal coverage, detection and agreement results. While a building-level examination is beyond the scope of this study, the small size of buildings in study refugee settlements likely poses a central challenge to detection. This is suggested by the higher detection rates of GRID3-SE and HRSL, achieved with 50 cm resolution source imagery that is capable of resolving small buildings. Meanwhile, the 10 m resolution imagery used by GHS-BUILT-S2 and the 10-30 m resolution imagery used by WSF increase the likelihood of capturing mixed pixels that may include buildings, as well as surrounding vegetation, soil and, less so, infrastructure. GHS-BUILT-S2 and WSF tended to overlook small buildings typical of family dwellings but did capture larger buildings, which tended to be administrative in function. Settlement morphology can also affect detection rates, as GHS-BUILT-S2 and WSF captured regions of densely clustered buildings that effectively offer a spatially contiguous settlement signal. For many settlements, WSF and GHS-BUILT-S2 exclusively detected structures in densely built-up regions. Such densely arranged buildings contributed to settlement detection rates and underlay the pattern of three- and four-product agreement (Figure 12; Appendix A Figure A3). Morphology also influences FAR, since false alarms most often occurred in open, vegetated lands immediately adjacent to buildings rather than infrastructure.
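The mixed-pixel argument can be made concrete with simple arithmetic. The sketch below gives an upper bound on the fraction of a single pixel that one building can fill, assuming the building falls entirely within that pixel; the 20 m² dwelling footprint is a hypothetical value chosen for illustration, not a measurement from the study settlements.

```python
def max_pixel_fill(building_area_m2, pixel_size_m):
    """Upper bound on the fraction of one square pixel a building can cover.

    Best case: the whole footprint lies inside a single pixel. The bound is
    capped at 1.0 when the building is larger than the pixel.
    """
    pixel_area_m2 = pixel_size_m ** 2
    return min(1.0, building_area_m2 / pixel_area_m2)

dwelling_m2 = 20.0  # hypothetical small dwelling footprint
for pixel_size in (0.5, 10.0, 30.0):
    fill = max_pixel_fill(dwelling_m2, pixel_size)
    print(f"{pixel_size:>4} m pixel: at most {fill:.1%} of the pixel is building")
```

With these assumed numbers, a small dwelling occupies at most a fifth of a 10 m pixel and a few percent of a 30 m pixel, so most of the pixel's signal comes from surrounding vegetation or soil, whereas at 50 cm resolution the same dwelling spans many pure building pixels.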
The timing of settlement establishment can also contribute to divergent areal coverage and detection. We found that settlements established before 2016 were better detected than settlements established in 2016 or later, which was mainly a consequence of the pre-2016 acquisition of satellite imagery used to generate the settlement products. However, we found that even the earliest and most populated refugee settlements of Kyangwali (1960), Oruchinga (1961) and Rhino Camp (1980) were poorly detected by the settlement products, despite these settlements being persistently inhabited for decades before the satellite imagery used by the settlement products was collected. It is also possible that seasonal and phenological conditions at the time of image acquisition were not favorable for refugee settlement detection [49,65]; however, the specific dates of the imagery used to generate the human settlement product coverage over Uganda were not available in the metadata of the settlement datasets.
Since new construction of dwellings or other developments may come with the arrival of refugees years after settlement establishment, different settlement products may capture different stages of a given settlement's growth (Figure 6). Settlement growth could lead to disparate assessments among products. For example, if new dwellings were constructed in a settlement between 2011 and 2017, satellite imagery from 2011 used by HRSL would likely contribute to an undercount of building footprint coverage, compared to imagery from 2017 used by GRID3-SE. Settlement growth could also lead to a widening disagreement among products and building footprint datasets, which are based on imagery from 2017-2020. For example, any buildings constructed after the imagery of a settlement dataset was collected would be recorded as a "miss", in comparison with the building footprint dataset based on later imagery. Conversely, the removal or destruction of buildings would be recorded as a "false alarm"; however, there is no indication of declining building density over time, nor were any study refugee settlements closed or decommissioned.
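The effect of mismatched image dates on recorded misses can be sketched as follows. The example assumes a hypothetical list of building construction years and a product that perfectly detects every building standing at its image date, so that all remaining misses are due to timing alone; none of the values come from the study.

```python
def timing_induced_misses(build_years, product_year, reference_year):
    """Count reference buildings that a product could not have seen.

    build_years: construction year of each building (hypothetical data).
    product_year: year of the imagery behind the settlement product.
    reference_year: year of the imagery behind the building footprint data.
    Returns (misses_due_to_timing, buildings_in_reference).
    """
    # Buildings standing when the reference footprints were mapped.
    in_reference = [y for y in build_years if y <= reference_year]
    # Of those, buildings constructed after the product's imagery was acquired.
    missed = [y for y in in_reference if y > product_year]
    return len(missed), len(in_reference)

build_years = [1995, 2005, 2010, 2013, 2015, 2016, 2019]  # hypothetical
missed, total = timing_induced_misses(build_years, product_year=2011,
                                      reference_year=2017)
pod_ceiling = (total - missed) / total  # best achievable POD from timing alone
print(missed, total, pod_ceiling)
```

In this toy case half of the reference buildings postdate the product imagery, so even a flawless detector would record a POD of 0.5, which illustrates why timing effects are hard to separate from genuine detection differences.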
It was impossible to disambiguate the effects of misaligned image dates used by the human settlement products and building footprint datasets from legitimate differences in coverage and detection among products. This was compounded by the lack of ground truth data and potential inaccuracies (i.e., undercounting) in the OSM-MS building footprint data that would bias results. Examining coverage at settlements established before the earliest acquired imagery in 2011 helped address this uncertainty, but having a georeferenced metadata product labeling the specific image dates used in each product would have completely clarified the ambiguous product timing at settlements. While we cannot characterize the effects of refugee settlement growth on study results without additional information on the timing and manner of settlement dynamics, we can have more confidence in findings associated with long-established settlements that are more likely to be stable by 2011.
Similarly difficult to evaluate was the effect of discrepancies between characteristics of buildings within refugee settlements and those used to train and validate human settlement detection approaches. Image classification approaches for remote sensing-derived global settlement products are commonly trained and evaluated on densely populated human settlements [6,24,66]. If a training dataset did not include refugee or other informal settlements that have construction materials and morphologies distinct from cities and towns that are typically used to train and validate a detection approach, it is likely that settlement detection approaches would have struggled to capture study refugee settlements.
Identifying paths toward improved detection of refugee and other small-scale settlements requires recognizing the compounding effects of several factors: refugee settlement composition, layout and change; the variable spatial resolution and timing of source satellite imagery across settlement product and building footprint datasets; and the vagaries of satellite image processing, classification and validation of human settlement products. There are ample opportunities for future work to build on this study's findings. Foremost, refugee settlement locations and areas should be included in the training and validation of remote sensing-based human settlement product generation and explicit accuracies should be reported; this would help clarify the relevance of broad-scale human settlement products for detection of refugee and other small-scale settlements. While the study was set in Uganda to benefit from the wealth of ancillary data on refugee settlement boundaries and building footprints, it is worth noting that these study refugee settlements may be easier to detect than others with lower building density and smaller structures that lack durable roofing materials. Therefore, including additional detail on the size, type and construction material of structures being detected or excluded by settlement products would help target improvements in future human settlement products. Similarly, characterizing refugee settlement detection in terms of spectral or textural conditions or quantifying the influence of building size or building density on detection success would offer tailored suggestions for improved detection of small-scale settlements. Finally, in order to capture new refugee settlements and not only document the growth of existing settlements, remote sensing-based human settlement datasets need timely updates at least every year and cannot linger as static, outdated snapshots of where people live, refugees or otherwise.

Conclusions
This study presents the first systematic analysis of refugee settlement detection in satellite-based broad-scale human settlement products by comparing coverage, detection and agreement of four human settlement products, namely, GHS-BUILT-S2, WSF, HRSL and GRID3-SE. GRID3-SE offered the greatest coverage within settlement boundaries and led in detection of building footprints alongside HRSL; however, all products struggled to capture refugee settlements regardless of size or age. Such inadequate detection of refugee settlements raises concerns about the utility of human settlement products for documenting locations and extent of small-scale settlements. Formally incorporating refugee settlements in the training and testing of human settlement products, reporting detection accuracy at refugee settlements and providing annual updates to human settlement datasets that keep pace with the establishment and growth of new refugee settlements would add much needed transparency and reliability to future efforts to detect refugee and other small-scale settlements.

Data Availability Statement: OSM (OpenStreetMap) refugee settlement boundary data for Uganda are available at https://data2.unhcr.org/en/documents/details/74116 (accessed on 1 December 2020). GHS-BUILT-S2 data are available at https://ghsl.jrc.ec.europa.eu/ghs_bu_s2_2018.php (accessed on 1 December 2020). WSF data are available at https://figshare.com/articles/dataset/World_Settlement_Footprint_WSF_2015/10048412 (accessed on 1 December 2020). HRSL data are available at https://www.ciesin.columbia.edu/data/hrsl/ (accessed on 1 December 2020). GRID3-SE data for Uganda are available at https://academiccommons.columbia.edu/doi/10.7916/d8-s1yg-pc20 (accessed on 1 December 2020). OSM footprint data for Uganda are available at https://data.humdata.org/dataset/hotosm_uga_buildings (accessed on 1 December 2020). MS building footprint data are available at https://github.com/microsoft/Uganda-Tanzania-Building-Footprints (accessed on 1 December 2020).

Acknowledgments:
The authors thank Anna Ballasiotes for their contribution to a preliminary assessment of refugee settlement detection. We thank members of the Human Planet Group on Earth Observations community for productive discussions around human settlement mapping in and outside of refugee settings. We acknowledge the Ampinefu ("Mary's River") band of the Kalapuya people, who are the original inhabitants of the land now occupied by Oregon State University.

Figure A3. Maps of agreement among four settlement products at refugee settlements: "1 product" indicates unique coverage by a single product; "2 products", "3 products" and "4 products" indicate the number of settlement products that share coverage; "None" indicates zero coverage. Note that the spatial scale of settlement maps varies.
Figure A4. Lollipop plots of probability of detection (POD), critical success index (CSI), F1-score (F1) and false alarm rate (FAR) for multi-product agreement sites in comparison to OSM-MS building footprint coverage across all refugee settlements: "1 product" indicates unique coverage by a single product; "2 products", "3 products" and "4 products" indicate the number of settlement products that share coverage.