Earth Observation for Settlement Mapping of Amazonian Indigenous Populations to Support SDG7

Indigenous communities in the Amazon suffer from lack of access to basic services, such as electricity. Due to their isolation and difficult access it is challenging to acquire data on their location, numbers and needs, which would enable adequate development plans. Earth observation (EO), in combination with participatory mapping can support the creation of settlement maps as a basis for creating spatially explicit models of needs of basic services. Combining Landsat time series with SkySat and PlanetScope imagery, we have mapped the location and size of these settlements and modelled the number and densities of their houses. Additionally, we have projected settlement growth by 2030 in order to assess a demand of services that will be valid in the near future. We conducted surveys in 49 communities in the Ecuadorian Amazon to acquire information on the peoples’ living conditions and needs, and validated our model based on the findings. The number of buildings per cleared land had a strong linear relationship with the communities surveyed (adjusted R2 0.8). We used this linear relationship to model the number of buildings for the complete study area as well as for the 2030 settlement projection. Combining this information with data on the living conditions of indigenous communities, we can efficiently estimate the needs of basic services for larger territories and prompt development plans according to indigenous peoples’ needs and wishes.


Introduction
The Amazon region includes nine countries in South America, all characterized by a highly culturally diverse population. However, a large number of indigenous groups are facing high levels of poverty, strengthened by their isolation and lack of access to basic services, such as electricity. These indigenous communities, which, paradoxically, would benefit the most from an increase in the provision of basic services, are often left behind national development plans [1][2][3].
In order to provide such services and contribute to the Agenda 2030 and related Sustainable Development Goals (SDG), governments and development agencies need information on their numbers, distributions and specific needs. At the moment, this spatial information is missing in the Amazon region. Access is often expensive and restricted, and frequently only possible by air or via rivers, which hampers traditional methods of survey. Since early 90's, the initiative AmazonGISnet (http://www.amazongisnet.net/), supported by the University San Francisco de Quito with community leaders and local technicians that we have engaged and trained through several community and capacity building workshops. The trainings were designed so that the workflows for rural electrification planning can be replicated by the indigenous technicians themselves in other areas. These included several informative workshops about the project goals with different communities, EO and GIS-based analysis, and field data collection about socioeconomic variables using tablets.
The goal of our investigation is to develop and test a methodology which is simple and accurate enough to identify and quantify small settlements in the Amazonia. This is done with the objective to generate a spatially explicit model of the electricity demand in off-grid settlements. For that, we made use of geolocated survey data, the Google Earth Engine (GEE) cloud computing platform [19] and freely available Landsat imagery, and a small subset of VHR imagery.

Study Area
While electricity grids reach 99% of the Ecuadorian population, according to the national Census data, the Ecuadorian Amazon has the lowest electrification rates (73%) [10]. Their main economic activity is rotation agriculture, where rotation cultivation is practiced, leaving the forest to regenerate after a few years of cultivation while moving onto nearby plots. The construction materials (wood, leaves, or concrete) can vary from one community and nationality to another, as well as the amount of non-residential buildings (health posts, meeting buildings, schools, etc.) and the distribution of households (sparse or concentrated).
We selected two larger study areas ( Figure 1) covering seven out of the ten nationalities of Ecuador (i.e., Kichwa, Shuar, Achuar, Siona, Secoya, Cofan and Sapara). Figure 1 shows the nighttime lights based on the Visible Infrared Imaging Radiometer Suite (VIIRS) as well as the two study areas. Some sources of light within the northern study area are visible. However, some communities nearby still do not have access to electricity. Wide rivers and orography still pose a barrier to extend the grid, and some of those lights are caused by gas flares from oil platforms [20].

Field Data
We trained eight indigenous technicians who surveyed a total of 49 communities and 725 households from March to June 2019. Data were collected using the app Geofarmer (http://app.geofarmer.org) [21]. The app allowed technicians to store the survey data, geolocate it, and upload it to the server once internet connection was reached. Structured interviews with each community leader were conducted to gather information on environmental changes about the living Figure 1. Nighttime lights map of Ecuador at 1 km resolution based on the median composite of the Visible Infrared Imaging Radiometer Suite (VIIRS) for the year 2018. Yellow boxes show the areas analyzed. Note: some of the light spots to the east correspond to gas flares from oil platforms, rather than electric light [20].

Field Data
We trained eight indigenous technicians who surveyed a total of 49 communities and 725 households from March to June 2019. Data were collected using the app Geofarmer (http://app. geofarmer.org) [21]. The app allowed technicians to store the survey data, geolocate it, and upload it to the server once internet connection was reached. Structured interviews with each community leader were conducted to gather information on environmental changes about the living conditions, but also about needs and wishes of the community-e.g., number of people and population distribution, number of households, number and type of infrastructure built and to be built and prices of different types of fuels. Besides, another survey was carried out in approximately 10% of the households of the communities with information related to the habits of consumption and needs of energy-e.g., expenses incurred in fuel (gas, gasoline, diesel), number of batteries and candles used or type of electrical appliances they would like to have. These questions have been used to calculate a demand of electricity per household and to develop the electricity model, which will be presented in a separate paper, currently under preparation. The technicians also prepared conceptual maps of each community visited that were used to assess the results of our maps and to better interpret the VHR images ( Figure 2).

Image Processing
We used the cleared land as a proxy for the settlement size using Landsat imagery. Estimating population sizes by calculating the cleared area from a plane or satellite image is an approach already employed by organizations working with uncontacted people in order to minimize the disturbances to them [22]. From a remote sensing perspective, cleared land means pixels with significantly less vegetation than the matrix of forested pixels. This lack of vegetation is directly related to the presence of people through their activities (mainly construction and agriculture). Landsat-8 offers good acquisition rates over the Amazonia at 30 m resolution, and the fmask [23] makes it possible to remove the clouds and cloud shadows to create high quality cloudless composites.
One option to obtain usable data from the satellite images is to classify the landscape into categorical classes and extract the class of interest. However, this method needs accurate and welldistributed training and validation data of all the classes present. Obtaining such information often demands extensive field sampling campaigns or the use of substantial amounts of VHR data. Performing an accurate classification also requires a relatively high knowledge of remote sensing and machine learning systems. In turn, this requires complex workflows whose transferability to non highly specialized personnel is challenging. Instead, we focused our analysis on extracting the land cover class we needed (cleared land) using methods as automatic as possible. We used the Tasseled Cap component of Wetness (TCW) [24] to locate and delineate settlements. Lower values in this component matched the settlement areas even better than the Greenness component, or the high values of the Brightness component. Landsat TCW has been used before to track deforestation in tropical regions because it reacts better than vegetation indices to the loss of vegetation cover [25][26][27]. We computed the median TCW of all Landsat-8 images for the period 2013-2018 via GEE and applied the fmask [23]. We also used the Global Surface Water (GSW) layer [28] to mask out water bodies, as

Image Processing
We used the cleared land as a proxy for the settlement size using Landsat imagery. Estimating population sizes by calculating the cleared area from a plane or satellite image is an approach already employed by organizations working with uncontacted people in order to minimize the disturbances to them [22]. From a remote sensing perspective, cleared land means pixels with significantly less vegetation than the matrix of forested pixels. This lack of vegetation is directly related to the presence of people through their activities (mainly construction and agriculture). Landsat-8 offers good acquisition rates over the Amazonia at 30 m resolution, and the fmask [23] makes it possible to remove the clouds and cloud shadows to create high quality cloudless composites.
One option to obtain usable data from the satellite images is to classify the landscape into categorical classes and extract the class of interest. However, this method needs accurate and well-distributed training and validation data of all the classes present. Obtaining such information often demands extensive field sampling campaigns or the use of substantial amounts of VHR data. Performing an accurate classification also requires a relatively high knowledge of remote sensing and machine learning systems. In turn, this requires complex workflows whose transferability to non highly specialized personnel is challenging. Instead, we focused our analysis on extracting the land cover class we needed (cleared land) using methods as automatic as possible. We used the Tasseled Cap component of Wetness (TCW) [24] to locate and delineate settlements. Lower values in this component matched the settlement areas even better than the Greenness component, or the high values of the Brightness component. Landsat TCW has been used before to track deforestation in tropical regions because it reacts better than vegetation indices to the loss of vegetation cover [25][26][27]. We computed the median TCW of all Landsat-8 images for the period 2013-2018 via GEE and applied the fmask [23]. We also used the Global Surface Water (GSW) layer [28] to mask out water bodies, as well as beach and sand bar areas that had low TCW median values.
In the next step, we established a threshold that separated the lowest TCW values corresponding to the settlements. Histogram thresholding methods can be used to separate different types of classes of values according to their frequency. We used two of such methods. One is the Otsu (1979) [29] thresholding method. It maximizes the inter-class variance based on the observed distribution of pixel values. It has been used before to separate water bodies (with high TCW values) from land masses (lower TCW values) [30]. Because we masked out the water pixels using the GSW layer, the threshold that maximizes the interclass variance now is the one between settlements (low TCW values) and forest mass (high TCW values). While this method is appropriate for bimodal distributions, sometimes images can show a unimodal distribution. In this case, the Rosin (2001) [31] thresholding method is more appropriate. It traces a line between the peak of the histogram to the higher end of the histogram ( Figure 3). The threshold point is the one that maximizes the perpendicular distance between the line and the histogram. We also applied this method to our TCW image, inverting first the histogram in order to comply with the assumption of this method. The method assumes that there is a dominant class that produces a peak in the lower end of the histogram (forests) and a secondary population at the higher end (cleared land) ( Figure 3). After retrieving the threshold value, the image histogram was reverted to its original distribution.
Resources 2020, 9, x FOR PEER REVIEW 6 of 19 method assumes that there is a dominant class that produces a peak in the lower end of the histogram (forests) and a secondary population at the higher end (cleared land) ( Figure 3). After retrieving the threshold value, the image histogram was reverted to its original distribution.  [31] thresholding method in the inverted histogram of the TCW.

Settlement Modelling
Even though cleared land data in the Amazonia can show the location and extent of settlements, we still need to transform this information into building or population estimates. Using a set of VHR SkySat (1 m) and PlanetScope (3 m) imagery for 2019 of the surveyed communities we manually digitized all the buildings in our study areas. We analyzed the correlation between the number of cleared land pixels and the number of buildings visible in the VHR images. We limited this analysis to a radius of 200 m around the location of each of the household surveys. Patches of cleared land within 60 m (two Landsat pixels) of another were aggregated, and each digitized building was assigned to its closest patch of cleared land within another 60 m radius. This was necessary because on some occasions, the buildings were not inside the cleared land patch, but rather adjacent to it.
We also studied the relation between the digitized buildings against the number of buildings declared in the field survey by the community leader. For this, we accounted for all the cleared land pixels and digitized buildings in a 500 m radius around each one of the 49 communities surveyed. This second analysis was performed in order to compare results against the survey data; number of people, number of houses, and number of infrastructures per community.

Validation
In order to validate the model, we applied the relation between cleared land pixels and buildings found in larger un-surveyed areas and tested the correlation between the predicted and real number

Settlement Modelling
Even though cleared land data in the Amazonia can show the location and extent of settlements, we still need to transform this information into building or population estimates. Using a set of VHR SkySat (1 m) and PlanetScope (3 m) imagery for 2019 of the surveyed communities we manually digitized all the buildings in our study areas. We analyzed the correlation between the number of cleared land pixels and the number of buildings visible in the VHR images. We limited this analysis to a radius of 200 m around the location of each of the household surveys. Patches of cleared land within 60 m (two Landsat pixels) of another were aggregated, and each digitized building was assigned to its closest patch of cleared land within another 60 m radius. This was necessary because on some occasions, the buildings were not inside the cleared land patch, but rather adjacent to it.
We also studied the relation between the digitized buildings against the number of buildings declared in the field survey by the community leader. For this, we accounted for all the cleared land pixels and digitized buildings in a 500 m radius around each one of the 49 communities surveyed. This second analysis was performed in order to compare results against the survey data; number of people, number of houses, and number of infrastructures per community.

Validation
In order to validate the model, we applied the relation between cleared land pixels and buildings found in larger un-surveyed areas and tested the correlation between the predicted and real number of buildings.

Settlement Growth Model: SLEUTH
Indigenous settlements are not static. Houses are taken down and rebuilt, rotation agriculture is practiced, some new settlements are formed, and others have to be moved because of the advancement of the river. Additionally, population numbers change through migration and through improvements in health conditions and birth rates. Thus, in order to estimate a demand for electricity that is still valid in the mid-term, it is possible to project population spatial distribution using urban growth models, such as SLEUTH [32]. We mapped the settlement areas for 1990 and 2000, in addition to 2018 using the same methodology as in Section 2.3. The only exception was that we could not apply the Otsu nor Rosin methods because the results were not consistent through time. Instead, we used manual thresholding using visual interpretation for the 1990 and 2000 periods (Table 1). Cellular Automata (CA) are one of the most popular simulation tools in geography. Their handling is simple and, at the same time, they are able to capture complex patterns of development [33,34]. In Clarke's Urban Growth Model [35], more commonly known as SLEUTH, urbanization is seen as a diffusion process of complex urban patterns. It has been applied in urban growth studies all over the world [36][37][38][39]. SLEUTH is an acronym of its initial input factors: slope, land use, exclusion (in our case water bodies), transport, and hillshade. Five growth coefficients (dispersion, breed, spread, slope, road gravity) define four growth rules-spontaneous growth, reflecting the random emergence of new urban areas, new spreading center growth, edge growth depicting urban sprawl, and road-influenced growth. The last one is SLEUTH-specific and relocates a temporary cell along the road to its final position ( Figure 4). Space and time are treated discretely in the CA. One growth cycle represents one year of urban growth and consists of the four aforementioned subsequent growth rules. Each selected new urban cell is compared to the local slope and exclusion information as well as a random value. The growth coefficients are defined during the calibration process. Every parameter combination of the particular growth coefficients between values of 0 to 100 is tested until the optimal balance is assessed. In his thesis, Goetzke (2011) [40] modified the urban growth model and implemented it into XULU (eXtendable Unified Land use Modeling Platform), a modeling environment developed at the University of Bonn [41]. The standard calibration evaluation method has been replaced by the multiple resolution validation (MRV) [42]. Thus, the urban land-use input is reduced from five to two.

Settlement Modelling
The TCW values under the Rosin threshold managed to locate and map different types of settlements; from large and electrified, to isolated small communities ( Figure 5). This gave information about their location and size. .

Settlement Modelling
The TCW values under the Rosin threshold managed to locate and map different types of settlements; from large and electrified, to isolated small communities ( Figure 5). This gave information about their location and size.

Settlement Modelling
The TCW values under the Rosin threshold managed to locate and map different types of settlements; from large and electrified, to isolated small communities ( Figure 5). This gave information about their location and size.
. Next, we studied the relation between the size of each settlement, given by the number of cleared land pixels, and their actual number of buildings, that were digitized from the VHR imagery. This was done for all communities within 200 m around the 725 surveyed households. It resulted in a correlation of r = 0.85 and an adjusted R 2 = 0.72 ( Figure 6). Next, we studied the relation between the size of each settlement, given by the number of cleared land pixels, and their actual number of buildings, that were digitized from the VHR imagery. This was done for all communities within 200 m around the 725 surveyed households. It resulted in a correlation of r = 0.85 and an adjusted R 2 = 0.72 ( Figure 6).   Next, we studied the relation between the size of each settlement, given by the number of cleared land pixels, and their actual number of buildings, that were digitized from the VHR imagery. This was done for all communities within 200 m around the 725 surveyed households. It resulted in a correlation of r = 0.85 and an adjusted R 2 = 0.72 (Figure 6).
.   The number of digitized buildings within the 500 m radius showed a high correlation with the number of cleared land pixels (r = 0.84, adjusted R 2 = 0.70, Table 1). According to the survey data, population numbers were correlated with the number of buildings (r = 0.82, adjusted R 2 = 0.67). Note that these two pairs of datasets have been derived from either satellite imagery or survey data. When analyzing in combination, the survey and satellite data, correlation values were lower (Table 1). For instance, the correlation between the cleared land pixels and the number of houses declared in the surveys was of only r = 0.58, adjusted R 2 = 0.32. The correlation between the cleared land pixels and the number of inhabitants was even lower (r = 0.30, adjusted R 2 = 0.07). Correlation values increased when including the number of infrastructures (Table 2). This is expected, since the digitized number of buildings included living spaces as well as infrastructures, because it was not possible to distinguish them in satellite images.

Validation
In order to transform the cleared land into the number of buildings, we applied the linear relationship derived from Figure 6 (Equation (1)) between the number of buildings and cleared land pixels, which is the one with a higher adjusted R 2 and higher number of samples (n = 149): where y is the number of buildings and x is the number of cleared land pixels. This provided a spatially explicit estimation of the demand of services. Besides, it is possible to apply this same relationship to the results of models that forecast settlement growth in order to generate a demand that will still be valid in the future. In order to validate the model, we applied equation 1 to a larger set of 1726 identified clusters of potential settlements in an un-surveyed area. The correlation between the predicted and the real number of houses, digitized from the VHR imagery, resulted in a moderate value of r = 0.69. Table 3 presents the results of the settlement simulation. The model has been calibrated by using the settlements of 1990 as the start year and 2000 as the final year. The validation was performed using the settlements for 2018 (Table 1). Calibration was carried out with the implemented multi resolution validation [42]. The Ft value represents the mean overall agreement weighted over different resolutions. It is a very good measurement for balancing the error of quantity and error of location. The error of location is regularly higher when it comes to land use modeling since the change rate is very often lower than the persistence signal. However, both values-calibration and validation-show very high accuracies. In order to obtain a better impression of the model's performance, we also measured the hits, misses, false alerts, and correct rejections. The specificity outperformed the sensitivity, meaning the prediction performance of no-change is very high (0.96/1.0), while the prediction performance of change is just average (0.54/0.57), shown in Table 3. The Kappa indices reveal accuracies better than random indices, but the location error is too high compared to the error of quantity. Therefore, Fuzzy Kappa is a good validation index because it considers the fuzziness of urban growth simulation. When not hitting settlement growth correctly pixel-by-pixel, Fuzzy Kappa mirrors how good the model hits the right neighborhood at least. The null model validation again depicts how good the model performs in comparison to a so-called null-model, simulating no-change at all. Due to the very high accuracy from the highest resolution-i.e., 30 m, and due to a low change signal-the null resolution-i.e., the resolution where the model outperforms the null model-is reached for south and north study areas at a points eight-times the original resolution-i.e., 240 m ( Figure 8). Table 3. Area of settlements observed and simulated for the northern and southern study areas. Additionally, the accuracies and coefficients for both models are depicted.

SLEUTH Results
North South  In total, for the northern part of the region of interest, we project a growth of settlement area of 59% for the year 2030 and for the northern part of 63%. Figure 9 depicts the settlement growth for a part of the southern study area from 1990 to 2030, along with the probability of increase. Results showed a steady increase in the settlement area (34,866 ha/year in the north and 2901 ha/year in the south). The edge growth directed by the spread coefficient (Table 3) determines the visual impression. Since CA, as with SLEUTH, are stochastic models resulting in a slightly different patterns with every model run, one should perform several iterations in order to get a reliable spatial impression. Accordingly, a map showing the outcome of 100 Monte Carlo Iterations depicts the probability of urban growth (Figure 9). In total, for the northern part of the region of interest, we project a growth of settlement area of 59% for the year 2030 and for the northern part of 63%. Figure 9 depicts the settlement growth for a part of the southern study area from 1990 to 2030, along with the probability of increase. Results showed a steady increase in the settlement area (34,866 ha/year in the north and 2901 ha/year in the south). The edge growth directed by the spread coefficient (Table 3) determines the visual impression. Since CA, as with SLEUTH, are stochastic models resulting in a slightly different patterns with every model run, one should perform several iterations in order to get a reliable spatial impression. Accordingly, a map showing the outcome of 100 Monte Carlo Iterations depicts the probability of urban growth (Figure 9). In total, for the northern part of the region of interest, we project a growth of settlement area of 59% for the year 2030 and for the northern part of 63%. Figure 9 depicts the settlement growth for a part of the southern study area from 1990 to 2030, along with the probability of increase. Results showed a steady increase in the settlement area (34,866 ha/year in the north and 2901 ha/year in the south). The edge growth directed by the spread coefficient (Table 3) determines the visual impression. Since CA, as with SLEUTH, are stochastic models resulting in a slightly different patterns with every model run, one should perform several iterations in order to get a reliable spatial impression. Accordingly, a map showing the outcome of 100 Monte Carlo Iterations depicts the probability of urban growth (Figure 9).

Settlement Modelling
The use of the TCW to identify settlements worked in a specific environment, such as the Amazon rainforest, because water indices are sensitive to the removal of vegetation and the water it contains [25][26][27]. Low TCW values were also found high up in the Andean mountains where vegetation cover is very low. These were not part of our study area but could be masked out using altitude criteria.

Settlement Modelling
The use of the TCW to identify settlements worked in a specific environment, such as the Amazon rainforest, because water indices are sensitive to the removal of vegetation and the water it contains [25][26][27]. Low TCW values were also found high up in the Andean mountains where vegetation cover is very low. These were not part of our study area but could be masked out using altitude criteria.
We conducted tests using Sentinel-2 composites, but the lack of thermal bands made it challenging to efficiently remove the noise produced by persistent cloud cover [44]. Similar analyses were conducted using Sentinel-1 multitemporal metrics, but many buildings were too small and too embedded with the vegetation to cause the characteristic high backscatter values of built-up surfaces.
The two histogram thresholding methods we used (Otsu, 1979 [29]; Rosin, 2001 [31]) yielded the same value for the 2018 time stamp, and very similar values for the 1990 and 2000 time stamps, indicating a good separability between the values corresponding to the settlements and the rest. An additional advantage of using a continuous dataset, such as the TCW, rather than discrete classification values, is that it can locate settlements even within mixed pixels caused by the small size of these settlements. The TCW is thus a good proxy for settlement monitoring in forested tropical regions. It revealed a large number of communities that are not represented in any of the global urban products existent to date, such as the Global Human Settlement Layer [45] or Global Urban Footprint [46]. If working on dryland or savannah environments, the TCW will not be a good proxy for settlement detection. In this case, higher values of vegetation indices produced by orchards surrounding communities could be used as proxies for settlement size. In our case, the greenness TC component had a higher contamination of cloud cover, returning many false positives-i.e., unmasked clouds were labeled as low vegetation areas, and thus mixed up with settlements.
While limiting the analysis to the areas within 500 m from where the community survey took place allowed us to use the community survey data in the analysis (e.g., number of inhabitants, infrastructure, etc.), expanding the analysis to 200 m around where all household surveys took place allowed us to increase the sample size from 49 (community surveys) to 149 (clusters of buildings), but without survey data.
We got high correlation values and coefficients of determination between cleared land and number of buildings observed in the VHR imagery (Figures 6 and 7). However, these values were lower when comparing the cleared land pixels 500 m around the center of the community against the number of buildings reported by the survey (Table 2). This is partly caused by some houses that belong to the same community (and reported in the community surveys) but which are further away than the 500 m radius established ( Figure 10). Enlarging this radius further would cause an overlapping of adjacent communities.
regions. It revealed a large number of communities that are not represented in any of the global urban products existent to date, such as the Global Human Settlement Layer [45] or Global Urban Footprint [46]. If working on dryland or savannah environments, the TCW will not be a good proxy for settlement detection. In this case, higher values of vegetation indices produced by orchards surrounding communities could be used as proxies for settlement size. In our case, the greenness TC component had a higher contamination of cloud cover, returning many false positives-i.e., unmasked clouds were labeled as low vegetation areas, and thus mixed up with settlements.
While limiting the analysis to the areas within 500 m from where the community survey took place allowed us to use the community survey data in the analysis (e.g., number of inhabitants, infrastructure, etc.), expanding the analysis to 200 m around where all household surveys took place allowed us to increase the sample size from 49 (community surveys) to 149 (clusters of buildings), but without survey data.
We got high correlation values and coefficients of determination between cleared land and number of buildings observed in the VHR imagery (Figures 6 and 7). However, these values were lower when comparing the cleared land pixels 500 m around the center of the community against the number of buildings reported by the survey (Table 2). This is partly caused by some houses that belong to the same community (and reported in the community surveys) but which are further away than the 500 m radius established ( Figure 10). Enlarging this radius further would cause an overlapping of adjacent communities. Figure 10. Community surveys were conducted in the center of the community (center of blue circle). However, on some occasions, some parts of the community were further away. In others, two communities were too close to each other to enlarge this radius.
The correlation between inhabitants and buildings declared in the surveys was high (r = 0.82), which suggests that the survey data are consistent. Another factor contributing to the low correlation between survey and satellite-derived data is that the correlations between inhabitants and infrastructure (r = 0.46, adjusted R 2 = 0.20), and houses and infrastructure (r = 0.55, adjusted R 2 = 0.29) Figure 10. Community surveys were conducted in the center of the community (center of blue circle). However, on some occasions, some parts of the community were further away. In others, two communities were too close to each other to enlarge this radius.
The correlation between inhabitants and buildings declared in the surveys was high (r = 0.82), which suggests that the survey data are consistent. Another factor contributing to the low correlation between survey and satellite-derived data is that the correlations between inhabitants and infrastructure (r = 0.46, adjusted R 2 = 0.20), and houses and infrastructure (r = 0.55, adjusted R 2 = 0.29) were relatively low (Table 2)-i.e., some communities have more built-up infrastructures (visible with satellite) per inhabitant (not visible with satellite) than others. This will affect the relation between cleared land pixels (created by infrastructure as well as by houses) and inhabitants. Regardless, the correlation between cleared land and buildings is high and solid (Figures 6 and 7), and it is a good proxy to estimate settlement distribution in Amazonian tropical forests.
We also investigated the relation between settlement size and the distance to the grid for the off-grid settlements, but the size of the settlements was not influenced by it (adjusted R 2 = 0.0003).

Validation
The area used for validation was larger than the one used for calibration (27.71 km 2 of cleared land vs. 3.45 km 2 , respectively) and in consequence more diverse in terms of the spatial distribution of settlements (e.g., size, concentration . . . ). We observed a higher rate of false positives (cleared land but no building) around the settlements, particularly in areas where cattle and mining activities are present, which also cause larger patches of cleared land. These are precisely areas where indigenous population could be more likely to increase for being closer to economic activities, but also be more affected by the environmental impacts these activities cause [5].
Out of the 5749 buildings digitized, only 4818 could be assigned to the cleared land pixels. All others, (false negatives) corresponded mostly to isolated single buildings away from large cleared areas. Although the correlation between predicted and real number of buildings was moderate (r = 0.65) and many false positives occurred (Figure 6), the tangible advantage of our results is to know the exact location, spatial distribution, and quantitative estimations of these marginalized communities for which there are little to no data available.

Settlement Growth Model: SLEUTH
The application of the SLEUTH model in this setting faced several limitations. First, patterns such as rotation agriculture and the large heterogeneity of the area made it difficult to identify areas of permanent settlements, which the SLEUTH model needs for calibration. Since many indigenous communities live within protected areas, this dataset commonly used to clamp settlement growth was not appropriate in our case. Additionally, the road network in the eastern communities is nonexistent, or it can only be used on foot, and many of these communities are only reachable by plane or boat. However, the 2030 projection gives an idea of the communities that might grow more, and the linear relationship modelled from Figures 6 and 7 can easily be applied to this data set. Another limitation is that SLEUTH is a pure growth model. A decrease or a perforation in settlement patterns is not possible. Additionally, one has to remember that SLEUTH originally has been conceptualized and implemented for metropolitan areas. Although there are studies addressing the rural neighborhoods of metropolitan areas [47], this is the first time that SLEUTH has been successfully implemented to a truthfully rural area in a region nearly untouched by the globalized world.

Implementation and Limitations
Global products and methodologies that map populations without access to electricity already exist. Some of these studies are at global or continental scales [17,18,48], and have a resolution not fine enough for implementing decentralized rural electrification plans. Other studies have made use of the VIIRS nighttime lights dataset [16,49]. However, some electrified rural areas that emit intermittent light might not be captured by this sensor [16]. In turn, gas flares from oil platforms can be recorded as light even though nearby settlements lack electricity. This was the case for some of our surveyed communities in the northern study area (Figure 1). Other studies make use of more accurate spatial and census data [50,51], which were not available in our case; a shortage common in regions of difficult access.
Gathering spatial information from indigenous communities is not new. Other initiatives have made use of participatory mapping techniques [52] for digitizing different features created by indigenous peoples [53] or surveying their preferences [54]. In this sense, the training we conducted with the indigenous technicians are key for the long-term success of the project and its replication in other areas [55,56]. Some initiatives have already failed due to lack of maintenance of the infrastructure, or loss of interest and mistrust from the local populations towards industry because of the environmental, health and social problems they have caused in their territories [1,5]. Thus, the active participation of indigenous people in performing surveys in their local language ensures that household information is accurate and realistic, reducing the risk of creating standard but unrealistic solutions from planning offices. In addition, some planners can prioritize villages with the highest need or with a higher demand and potential for development [12].
In this context, the SE4All approach fits well since it encourages the use of distributed renewable energy systems to increase access to electricity, and with that, the sustainable development of vulnerable communities [57]. By transferring this methodology to them, their communities can assume a larger control of their own development without relying on foreign actors. Still, support from the AmazonGISnet is necessary, for instance, to implement more complex modeling tools such as SLEUTH. Making these methods replicable by the indigenous communities implied to be constrained to using accessible satellite imagery without making it excessively complex. Thus, despite land cover classifications being a common way of producing urban area estimates (e.g., [58]), using a single proxy, such as the TCW and an automated threshold method, was a much more straightforward approach. Although we used statistical methods to separate the TCW values corresponding to the settlements from the rest, it is also possible to set the threshold using visual interpretation and high resolution imagery. Understandably, different interpreters might give slightly different threshold values. However, this should not affect the positive correlation between cleared pixels and number of buildings, nor will it affect the results as long as consistency is kept-i.e., using the same threshold for a whole homogenous image.
These spatially explicit models greatly help to estimate real costs of rural electrification plans such as grid extension as well as decentralized solutions-i.e., individual solar systems for sparse settlements with low demand, or solar mini-grids for more concentrated settlements with a higher demand. Moreover, this information helps to specify and accelerate decentralized rural electrification plans by providing accurate and affordable estimates of demand for electricity companies and governments in the Amazon region, which usually have limited resources [8].

Conclusions
Earth Observation is key to enable development action plans to reach the Sustainable Development Goals in marginalized communities and the Global Development Agenda 2030. Indigenous communities in the Amazon are often left out national development plans. Additionally, their location, size, or spatial arrangement is mostly unknown or misinterpreted as they live in very remote areas. By using remote sensing derived proxies, such as the Tasseled Cap Wetness component we can map the location, size and distribution of settlements at sufficient resolutions to identify even small rural villages. Combining this information with data on living conditions, we can efficiently estimate the needs of basic services, such as electricity for larger territories, and especially for the regions that need it the most. Besides, the simplicity of the methodology we have developed makes it possible to transfer it to trained indigenous technicians so that they can replicate it in their territories and have more autonomy in the development of their own communities. Moreover, integrating the indigenous communities directly using a participatory mapping approach helps to get demographic insights relevant to set up development plans. Additionally, their involvement helps to give a voice to rural communities living in remote areas and to understand their views and needs. The approach developed in the project SE4Amazonian can support renewable energy companies to draw rural electrification plans in a few months compared to the years it would take by carrying out field inspections on their own. Furthermore, the here presented methods could be embedded in other attempts to foster sustainability.