Large infrastructure projects, such as industrial mining projects, act as a strong pull factor for migration in low- and middle-income countries [1]. The main driver of in-migration into project areas is often the large workforce required, particularly during the construction phase [3]. In addition, multiplier effects on local employment, including petty traders and small-scale service providers, mean that far more people benefit from the mine than its direct employees alone [4]. As a result, sparsely populated remote areas can be transformed into busy semi-urban environments within a few years [5].
In these areas, the rapid influx of migrants can strain local health systems, food and water supplies, sanitation and waste management systems, and other public services such as education, and can thus cause a diverse set of environmental, social and health impacts [3]. It is therefore crucial for policy makers to understand the spatial and temporal patterns of population growth within their constituencies for adequate resource allocation, development planning or disaster management [7].
In sub-Saharan Africa, migration and population growth are usually tracked through censuses [10]. Because censuses are costly to implement, they are usually conducted only once per decade [10]. This temporal resolution is, however, insufficient to capture the fast-paced migratory patterns associated with large infrastructure developments.
In the absence of reliable population data, remote sensing applications have the potential to help trace settlement changes in remote mining areas of sub-Saharan Africa [8]. The opening of the Landsat archive in 2008, together with freely available software, has created opportunities for researchers and public institutions in resource-poor settings to use remote sensing techniques for population tracking [13].
Indeed, over the last few decades, Landsat imagery has been increasingly used for land use classification [15]. By combining Landsat imagery with auxiliary data, different approaches have been developed to trace urban growth at high temporal resolution. For example, Gong and colleagues combined Landsat imagery with night-time light data to produce annual settlement maps of China over a 40-year period [16]. While they achieved high accuracy in the urban coastal regions, accuracy in the sparsely populated hinterland was considerably lower [16]. Other approaches use zonal plans, very high-resolution satellite imagery, aerial photographs or ground-truth information from field visits as auxiliary data [8]. However, in rural sub-Saharan Africa, these data are either unsuitable for land use classification or unavailable at a larger scale. Alternatively, expert visual interpretation of Landsat imagery can serve as training data for land use classification [12]. Yet, at Landsat's 30 m pixel size, this is hardly feasible in areas with scattered settlements lacking tarred roads or large building complexes, which are typical of many rural places in sub-Saharan Africa.
Historic Google Earth imagery could serve as a cheap and widely available source of information for deriving multi-annual training datasets. Different studies have successfully incorporated this data source to produce land use maps [12]. Most prominently, Gong et al. [21] used Google Earth imagery to generate training datasets for a global land cover product at 30 m resolution. Further, Schneider identified stable land uses for studying land use changes around major Chinese cities [18]. However, the vast majority of existing studies have either focused on densely populated urban and peri-urban areas [20], produced land use classifications at lower temporal resolution [24], or relied on auxiliary ground-truth data and datasets that are not freely available in remote locations of sub-Saharan Africa [17].
In summary, as a foundation for policy making and impact assessment practice in the context of large mining projects, methods are needed for tracking population growth at high spatial and temporal resolution [30]. To be widely applicable, such a method should (i) only incorporate freely available data; (ii) rely on imagery with high geographical and temporal coverage; and (iii) perform well in rural settings. Therefore, the overarching objective of this study is to use freely available data from the Landsat archive in conjunction with historic Google Earth imagery to quantify annual settlement growth patterns in rural areas of sub-Saharan Africa. The specific research questions are: (i) Are suitable satellite imagery and training data available for the time period of interest? (ii) Are the classification results for built-up areas comparable between years? (iii) Can migration patterns be detected in industrial mining areas and, if so, at what geographical extent?
3.1. Availability of Landsat Satellite Imagery
Across sensors and study areas, 716 images with cloud cover below 10% were available (Table 1). The total number of downloaded images was 101, 428, and 187 from the Landsat 5, 7, and 8 missions, respectively. Images from the Landsat 5 mission were available until 2013, the year in which the Landsat 8 satellite was launched. The Landsat 7 satellite provided images throughout the study period. However, since the failure of its Scan Line Corrector (SLC) in May 2003, its images have contained stripes of missing data.
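The scene screening described above can be sketched as a simple metadata filter. This is a minimal illustration under stated assumptions, not the study's download workflow: the `Scene` record, its field names and the example scenes are hypothetical; only the 10% cloud-cover threshold comes from the text, and the SLC failure date (31 May 2003) is taken from the Landsat 7 mission record.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Landsat 7 Scan Line Corrector failure; ETM+ scenes from then on contain gap stripes.
SLC_FAILURE = date(2003, 5, 31)


@dataclass
class Scene:
    scene_id: str       # hypothetical identifier
    mission: int        # Landsat mission number: 5, 7 or 8
    acquired: date      # acquisition date
    cloud_cover: float  # scene cloud cover in percent


def select_scenes(scenes, max_cloud=10.0):
    """Keep scenes with cloud cover below max_cloud and flag SLC-off Landsat 7 scenes."""
    selected = []
    for scene in scenes:
        if scene.cloud_cover < max_cloud:
            # Flag Landsat 7 scenes acquired on or after the failure date.
            slc_off = scene.mission == 7 and scene.acquired >= SLC_FAILURE
            selected.append((scene, slc_off))
    return selected


def count_by_mission(selected):
    """Number of retained scenes per Landsat mission."""
    return Counter(scene.mission for scene, _ in selected)


# Hypothetical metadata records for illustration only.
scenes = [
    Scene("a", 5, date(2010, 1, 15), 3.0),
    Scene("b", 7, date(2002, 11, 2), 8.5),
    Scene("c", 7, date(2012, 2, 20), 1.2),
    Scene("d", 8, date(2015, 12, 5), 42.0),
]
selected = select_scenes(scenes)
print([(s.scene_id, slc_off) for s, slc_off in selected])
# [('a', False), ('b', False), ('c', True)]
print(dict(count_by_mission(selected)))  # {5: 1, 7: 2}
```

Flagging rather than discarding SLC-off scenes keeps them usable in a multi-image stack, where gaps in one scene can be filled by observations from others.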
depicts the capture dates of the retained satellite images that yielded high-quality land use maps with our approach. In total, 211 images were included in the post-classification steps (see Table 1). Notably, the vast majority of retained images were acquired at the beginning (i.e., January and February) or the end (i.e., October–December) of the calendar year. These months coincide with the dry season in Burkina Faso.
For most years, enough Landsat images could be retained. However, in a few instances only two usable images were available (e.g., in the Bissa area in 2012). Where the two classifications disagreed, there was no unique modal value, so one of the two initial land use classes was assigned at random. Further, in some instances when only a few images were retained in the image stack, patches of missing data remained due to cloud cover and gaps in SLC-off Landsat 7 scenes.
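Per-pixel compositing of several classified images from the same year by their modal value, with ties broken at random, could look like the following NumPy sketch. It assumes that each image has already been classified into integer class codes and that cloud-masked or SLC-gap pixels carry a hypothetical `NODATA` value of -1; the function name and the jitter-based tie-break are illustrative choices, not the authors' code.

```python
import numpy as np

NODATA = -1  # hypothetical code for cloud-masked or SLC-gap pixels


def modal_composite(stack, n_classes, rng=None):
    """Per-pixel modal class of several classified images from one year.

    stack: integer array (n_images, rows, cols) with class codes
    0..n_classes-1 or NODATA. Ties are broken at random; pixels without
    any valid observation remain NODATA.
    """
    rng = np.random.default_rng() if rng is None else rng
    stack = np.asarray(stack)
    # Votes per class, shape (n_classes, rows, cols); NODATA never matches a class.
    votes = np.stack([(stack == c).sum(axis=0) for c in range(n_classes)])
    # Random jitter in [0, 0.5) breaks ties without overturning a real majority.
    result = np.argmax(votes + rng.random(votes.shape) * 0.5, axis=0)
    result[votes.sum(axis=0) == 0] = NODATA
    return result


# Two classified images of a 1 x 3 strip (hypothetical class codes).
year_stack = np.array([[[0, 1, NODATA]],
                       [[0, 2, NODATA]]])
composite = modal_composite(year_stack, n_classes=3, rng=np.random.default_rng(0))
print(composite)  # first pixel 0, middle pixel 1 or 2, last pixel -1
```

With two disagreeing images, the middle pixel receives either class with equal probability, mirroring the random assignment described for years with only two usable images; pixels with no valid observation stay missing, like the residual cloud and SLC gaps.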
3.2. Availability of Historic Google Earth Imagery
Obtaining Google Earth images to generate a training dataset valid for the entire study period proved more challenging than acquiring satellite imagery. The availability of high-resolution imagery varied strongly with the location of the study area, so the start date of the study had to be shifted accordingly. In general, older images were available over the capital, Ouagadougou. In the more remote and rural areas, historic Google Earth imagery of sufficient resolution for determining land use was only available from around 2006/2007. Even then, finding images covering all land use classes for that period was cumbersome. It was particularly challenging to delimit seasonal water bodies that partly or entirely dry out towards the end of the dry season.
3.3. Settlement Growth in Mining and Non-Mining Areas
The percentage of built-up area over time in the four mining areas and their comparison areas is depicted in Figure 4. Overall, the variability of the growth curves differed between areas. In the areas where training data were obtained from anywhere within the satellite scene (i.e., Bissa and Taparko), variability was higher. Indeed, a number of outliers in the classifications for Bissa and Taparko led to negative settlement growth. For example, the raw classification for the Taparko scene in 2016 contained particularly few urban pixels, which, through the temporal consistency correction, resulted in negative settlement growth in the previous year. Further, in the Bissa scene, only two images were retained for each of 2009 and 2012, and both years showed extreme numbers of classified urban pixels. Visual inspection of the raw classification maps revealed that in these cases the misclassified urban pixels were mainly located on barren and rocky ground. Applying the temporal consistency correction reduced the number of misclassified pixels (see Figure 5).
Generating training data in the proximity of the areas of interest (i.e., Essakane and Youga) led to more stable classification results over the years. Negative growth years were observed in only a few instances. In these areas, a general urbanization trend was observed, albeit at different paces.
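How a temporal consistency correction can damp a single anomalous year may be illustrated with a three-year moving-window majority filter applied per pixel. This is one common form of temporal consistency check and is an assumption here: the exact rule applied in this study is not given in this section, and the boundary handling (first and last years left unchanged) is a hypothetical choice.

```python
import numpy as np


def temporal_majority_filter(stack):
    """Three-year moving-window majority filter on annual built-up maps.

    stack: array (n_years, rows, cols) of 0/1 values (1 = built-up).
    Each interior year takes the majority of years t-1, t and t+1;
    the first and last years are left unchanged (hypothetical boundary rule).
    """
    stack = np.asarray(stack, dtype=np.uint8)
    out = stack.copy()
    # Sum over three consecutive years; 2 or 3 means built-up in the majority.
    window_sum = stack[:-2] + stack[1:-1] + stack[2:]
    out[1:-1] = (window_sum >= 2).astype(np.uint8)
    return out


spike = np.array([0, 0, 1, 0, 0]).reshape(5, 1, 1)  # one-year false built-up
dip = np.array([1, 1, 0, 1, 1]).reshape(5, 1, 1)    # one-year missed built-up
print(temporal_majority_filter(spike).ravel())  # [0 0 0 0 0]
print(temporal_majority_filter(dip).ravel())    # [1 1 1 1 1]
```

An isolated one-year spike, such as rocky ground classified as built-up in a single year, is removed, and a one-year gap inside a stable built-up series is filled. Rules that instead carry a class forwards or backwards in time can turn one outlier year into apparent negative growth in a neighbouring year.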
The growth curves showed different slopes both over the study period and across study areas. However, no clear pattern indicating strong in-migration to the studied mining areas could be observed. Although the settlements in some areas lie farther from the mines, growth patterns were similar across the different geographical extents.
3.4. Accuracy Assessment
shows the result of the accuracy assessment. The overall accuracy (OA) was 86.4%, 58.5%, 80.3%, and 95.1% for the Bissa, Taparko, Essakane, and Youga scenes, respectively. Accuracy differed considerably between the two approaches used for training data generation and between the study areas. The Kappa coefficient of the individual study areas ranged from 0.176 to 0.902. Only in Youga was the classification sufficiently sensitive in detecting built-up pixels. In all scenes, only a few non-built-up pixels were misclassified as built-up. Obtaining training data in the proximity of the study areas (approach 2) improved the accuracy substantially. However, in the Essakane scene, which also used approach 2, only 30.4% of the built-up pixels in the reference dataset were correctly classified. Visual inspection of misclassified pixels revealed that most errors occurred in the less densely populated fringes of villages and at isolated clusters of buildings (see Figure 6).
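The metrics reported above follow from a binary confusion matrix of reference versus classified pixels. The sketch below computes overall accuracy, Cohen's Kappa and the sensitivity (producer's accuracy) of the built-up class; the function name and the counts in the example are hypothetical and are not taken from this study.

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Accuracy measures for a binary built-up / non-built-up classification.

    tp: reference built-up, classified built-up
    fn: reference built-up, classified non-built-up (missed built-up)
    fp: reference non-built-up, classified built-up
    tn: reference non-built-up, classified non-built-up
    """
    n = tp + fn + fp + tn
    overall = (tp + tn) / n
    # Agreement expected by chance from reference and classified class totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "sensitivity": tp / (tp + fn),  # producer's accuracy, built-up
        "specificity": tn / (tn + fp),  # producer's accuracy, non-built-up
    }


# Hypothetical counts: 100 reference built-up pixels among 1,000.
m = accuracy_metrics(tp=30, fn=70, fp=5, tn=895)
print({k: round(v, 3) for k, v in m.items()})
# {'overall_accuracy': 0.925, 'kappa': 0.414, 'sensitivity': 0.3, 'specificity': 0.994}
```

The example shows how a high overall accuracy can coexist with a modest Kappa and a low sensitivity when built-up pixels make up only a small share of the reference data: most of the agreement comes from the abundant non-built-up class.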
In this study, high-resolution Google Earth imagery and 716 Landsat images were used to estimate annual settlement growth in rural mining areas in Burkina Faso. While the number of Landsat images was sufficient, finding adequate training data in historic Google Earth imagery was challenging. Indeed, in our study areas, high-resolution imagery from before 2006 was only available over larger urban areas. Still, using training data in proximity to the areas of interest reduced the inter-annual variability and resulted in higher classification accuracy. The overall accuracy of the four scenes ranged from 58.5% to 95.1%. These results show that, with local training data and in relatively humid environments, the proposed method can yield stable and accurate estimates of settlement growth over time. However, given the limited number of accurately classified study areas, no apparent differences in settlement growth patterns between mining and comparison areas could be observed.
When comparing the growth curves of the predominantly rural areas selected for this paper with those of mainly urban areas reported in other publications, three patterns were observed: (i) the availability of Google Earth imagery influenced the classification accuracy; (ii) negative growth was observed in some study areas; and (iii) there is limited potential for additional post-classification correction approaches in our study setting. Each of these observations is discussed separately in the subsequent paragraphs.
Regarding the varying accuracy, it should be noted that historic high-resolution Google Earth imagery was of limited availability. The Google Earth scenes available at the beginning and end of the study period had to be used as training data in order to meet the sample size required for fitting the SVM model [37]. When training data were located in cloud-masked areas or in extents with residual haze, classification accuracy was low, which led to the exclusion of a substantial number of scenes during the visual quality assessment. Further, because of the limited availability of Google Earth imagery, the accuracy assessment was restricted to one extent in one year for each site. Still, the assessment indicates that for most scenes the number of undetected built-up pixels was substantially higher than in other studies [16], but also that the classification of the Youga scene achieved very high accuracy. This scene differed from the others in two respects: first, the training data were obtained closer to the area of interest; second, it is located further south, in a tropical savanna climate. The lower accuracy in the other scenes may be caused by the similar spectral signatures of urban areas and natural bare surfaces (e.g., both have a low normalized difference vegetation index (NDVI), an indicator of healthy vegetation) [29]. These similarities may be more pronounced in the semi-arid regions of northern Burkina Faso, where vegetation is sparse and corrugated-sheet roofs are often covered by a layer of sand. Indeed, the vast majority of the cloud-free scenes used in this study were acquired in the dry Harmattan season, which is characterized by dusty trade winds. Purposively selecting scenes shortly after the growing season might alleviate this problem.
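Purposive scene selection could be implemented as a ranking that favours acquisitions shortly after the growing season. The preferred months (October and November) are a hypothetical choice for illustration and would need to be matched to the local rainy season; the input format of (acquisition date, cloud cover) pairs and the function name are likewise assumptions.

```python
from datetime import date

# Hypothetical preference: months shortly after the rainy (growing) season.
POST_GROWING_MONTHS = frozenset({10, 11})


def prioritise_scenes(scenes, preferred_months=POST_GROWING_MONTHS):
    """Order (acquisition_date, cloud_cover_percent) pairs so that scenes from
    the preferred months come first, each group sorted by increasing cloud cover."""
    # False sorts before True, so preferred-month scenes lead the list.
    return sorted(scenes, key=lambda s: (s[0].month not in preferred_months, s[1]))


candidates = [
    (date(2014, 2, 10), 1.0),
    (date(2014, 10, 28), 6.5),
    (date(2014, 11, 13), 3.2),
    (date(2014, 4, 2), 0.5),
]
ranked = prioritise_scenes(candidates)
print([d.isoformat() for d, _ in ranked])
# ['2014-11-13', '2014-10-28', '2014-04-02', '2014-02-10']
```

Ranking rather than hard filtering keeps dry-season scenes available as a fallback in years when no usable post-growing-season acquisition exists.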
A few other studies have also reported negative or absent settlement growth within their study periods, although to a lesser degree [17]. However, they do not discuss whether this was due to actual removal of buildings or to misclassification. The higher variability found in the present study can partly be explained by the low percentage of built-up pixels in relatively small geographical areas. Consequently, misclassifying even a small patch of, e.g., rocky ground as built-up leads to a marked spike in the number of urban pixels in that year. Further, the absence of accelerated growth in mining areas may also be due to a densification of housing within existing settlement extents, which is difficult to detect at a 30 m pixel size.
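The sensitivity of the built-up share to a small misclassified patch can be made concrete with a worked example. The function name, area size and pixel counts below are hypothetical: a 3 km × 3 km area holds 10,000 Landsat pixels of 30 m, and a 50-pixel (4.5 ha) patch of rocky ground wrongly classified as built-up in one year is enough to create a spike followed by apparent negative growth.

```python
def growth_series(built_up_counts, total_pixels):
    """Percent built-up per year, year-on-year change and negative-growth years.

    built_up_counts: {year: number of pixels classified as built-up}
    total_pixels: number of pixels in the area of interest
    """
    years = sorted(built_up_counts)
    percent = {y: 100 * built_up_counts[y] / total_pixels for y in years}
    # Change in percentage points relative to the previous year.
    change = {y: percent[y] - percent[prev] for prev, y in zip(years, years[1:])}
    negative_years = [y for y in years[1:] if change[y] < 0]
    return percent, change, negative_years


# Hypothetical area of 100 x 100 pixels (3 km x 3 km at 30 m), about 1% built-up;
# in 2012 a 50-pixel patch of rocky ground is misclassified as built-up.
counts = {2010: 100, 2011: 105, 2012: 155, 2013: 110}
percent, change, negative_years = growth_series(counts, total_pixels=100 * 100)
print(percent)         # {2010: 1.0, 2011: 1.05, 2012: 1.55, 2013: 1.1}
print(negative_years)  # [2013]
```

A misclassification of only 0.5% of the area raises the built-up share by about half in that year, and the return to the true value is then recorded as negative growth.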
Regarding post-classification approaches, other studies observed that more robust results were obtained when spatial consistency checks were incorporated in addition to a temporal consistency correction such as the one applied in this study [33]. This approach involves calculating the probability of a pixel being urban as a function of its surrounding pixels. Although this may reduce the “salt and pepper” effect, it may lead to an underestimation of built-up areas in the scattered, sparsely populated areas of rural sub-Saharan Africa, where building clusters cover only a few pixels.
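A neighbourhood-based consistency check of the kind described above can be approximated by counting built-up pixels in each 3 × 3 window, treating count/9 as a crude probability of being built-up. The sketch keeps a built-up pixel only if at least three pixels in its window are built-up; the window size, the threshold, this removal-only rule and the function names are hypothetical simplifications, not the method of the cited study.

```python
import numpy as np


def neighbourhood_count(built_up):
    """Number of built-up pixels in each pixel's 3 x 3 window (itself included).
    Pixels outside the map are treated as non-built-up."""
    padded = np.pad(np.asarray(built_up, dtype=np.int32), 1)  # zero padding
    rows, cols = padded.shape[0] - 2, padded.shape[1] - 2
    # Sum the nine shifted views of the padded map.
    return sum(padded[i:i + rows, j:j + cols] for i in range(3) for j in range(3))


def spatial_filter(built_up, min_count=3):
    """Keep a built-up pixel only if at least min_count of the 9 pixels in its
    window are built-up (hypothetical rule)."""
    built_up = np.asarray(built_up)
    return (built_up == 1) & (neighbourhood_count(built_up) >= min_count)


isolated = np.zeros((5, 5), dtype=int)
isolated[2, 2] = 1          # single-pixel building cluster
compact = np.zeros((5, 5), dtype=int)
compact[1:4, 1:4] = 1       # 3 x 3 cluster
print(spatial_filter(isolated).sum(), spatial_filter(compact).sum())  # 0 9
```

The isolated single-pixel cluster disappears while the compact cluster survives intact, which illustrates how such a filter can underestimate built-up area where settlements consist of scattered clusters of one or two pixels.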
The strength of the method used in this study is its reliance on globally and freely available data and its relatively straightforward workflow with few image pre-processing steps. This could make it useful for researchers and public institutions with limited technical expertise who need to track settlement changes in areas where reliable and up-to-date population data are scarce. In these cases, the Google Earth training data could be complemented with additional ground-truth points from field observations.
As these archives continue to grow, Landsat and high-resolution Google Earth imagery will become available for increasingly long periods, allowing long-term tracking of population growth in remote areas. Additionally, imagery from more recently launched satellite missions could be incorporated into the workflow. For example, the Sentinel-2 satellites have provided freely available imagery at 10–60 m resolution with near-global coverage since 2015 [39]. While this timeframe was not sufficient for the present study, it could serve as a good baseline for future multi-annual land use classifications [27]. Moreover, the higher spatial resolution could reduce the problem of mixed pixels containing multiple land use classes.
Future studies should also investigate the performance of the approach in remote areas of other climatic zones, potentially incorporating additional spectral indices such as the NDVI or the normalized difference built-up index (NDBI). Additionally, in order to determine the magnitude of mining-related population growth, long-term studies covering a larger number of mining areas are needed.
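Both indices are normalized differences of surface reflectance bands: NDVI = (NIR − Red)/(NIR + Red) and NDBI = (SWIR1 − NIR)/(SWIR1 + NIR). A minimal NumPy sketch follows, assuming surface-reflectance inputs; on Landsat 8 OLI the red, NIR and SWIR1 bands are bands 4, 5 and 6, and on Landsat 5 TM and Landsat 7 ETM+ they are bands 3, 4 and 5. The function names and example reflectance values are hypothetical.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        nd = (a - b) / denom
    return np.where(denom == 0, np.nan, nd)


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return normalized_difference(nir, red)


def ndbi(swir1, nir):
    """Normalized difference built-up index."""
    return normalized_difference(swir1, nir)


# Hypothetical surface reflectances for a vegetated and a bare/built-up pixel.
nir = np.array([0.40, 0.25])
red = np.array([0.05, 0.20])
swir1 = np.array([0.20, 0.30])
print(ndvi(nir, red))    # high for vegetation, low for the bare/built-up pixel
print(ndbi(swir1, nir))  # negative for vegetation, positive for the bare/built-up pixel
```

Bare soil can score similarly to built-up surfaces on both indices, so adding them as classification features may reduce, but not necessarily resolve, the spectral confusion between settlements and natural bare surfaces.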