Characterizing Small-Town Development Using Very High Resolution Imagery within Remote Rural Settings of Mozambique

Abstract: While remotely sensed images of various resolutions have been widely used in identifying changes in urban and peri-urban environments, only very high resolution (VHR) imagery is capable of providing the information needed for understanding the changes taking place in remote rural environments, due to the small footprints and low density of man-made structures in these settings. However, limited by data availability, mapping man-made structures and conducting subsequent change detections in remote areas are typically challenging and thus require a certain level of flexibility in algorithm design that takes into account the specific environmental and image conditions. In this study, we mapped all buildings and corrals for two remote villages in Mozambique based on two single-date VHR images that were taken in 2004 and 2012, respectively. Our algorithm takes advantage of the presence of shadows and, through a fusion of both spectra- and object-based analysis techniques, is able to differentiate buildings with metal and thatch roofs with high accuracy (overall accuracy of 86% and 94% for 2004 and 2012, respectively). The comparison of the mapping results between 2004 and 2012 reveals multiple lines of evidence suggesting that both villages, while differing in many aspects, have experienced substantial increases in economic status. As a case study, our project demonstrates the capability of coupling VHR imagery with locally adjusted classification algorithms to infer the economic development of small, remote rural settlements.


Introduction
Over the past decades, satellite observations have greatly improved our understanding of the global distribution of population. Numerous datasets have been developed to document the presence, extent, and dynamics of population growth and the built environment. Repeated comprehensive global coverage and free data access enabled wide use of moderate and coarse resolution satellite imagery for assessments at various scales. Schneider, et al. [1] proposed one of the first algorithms that mapped urban areas globally based on the Moderate Resolution Imaging Spectroradiometer (MODIS) imagery. Mapping urban areas at a spatial resolution of 1 km, this algorithm was incorporated into the standard MODIS MCD12 global land cover product [2]. Later, Schneider, et al. [3] developed an updated algorithm with a higher spatial resolution (500 m), which was subsequently integrated into Collection 5 of the MCD12 product [4]. While MODIS-based algorithms capture the extent of large urban areas such as cities, Landsat-based mapping efforts offer delineations of smaller urban areas and settlements supported by the finer spatial resolution of Landsat observations. Early studies focused on mapping individual urban areas from Landsat imagery appeared in the 1980s [5], and the mapping developments continued to be limited in spatial scope throughout the 1990s and early 2000s (e.g., Li and Yeh [6] and Ji, et al. [7]), largely due to the high cost of Landsat imagery. Since the opening of the Landsat archive in 2008, when Landsat imagery became freely available, Landsat data have been increasingly used for regional- and global-scale mapping efforts [8][9][10][11][12][13]. In addition to directly mapping the extent of populated places, various satellite-derived metrics have been used to examine socio-economic conditions and assess living standards.
For example, nighttime lights (NTL) datasets, derived first from the Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS) [14][15][16][17][18], starting in 1992, and subsequently from the Visible Infrared Imaging Radiometer Suite (VIIRS) day-night band (DNB) [18][19][20][21][22], starting in 2012, capitalize on observations of electricity consumption after dark. Numerous studies have established correlations between the NTL data and local income [23], gross regional product [24], poverty [25], and a multitude of other socio-economic characteristics [26].
The vast majority of projects have focused on changes in urban and peri-urban environments, primarily due to the inability of coarse and moderate resolution sensors to capture the presence and characteristics of remote rural settlements across the world. Most algorithms are focused on identifying man-made materials (e.g., impervious surfaces that include both pavements and roofing materials) [27][28][29] or substantial electricity-enabled signals [30][31][32]. Additionally, most algorithms rely on a sufficient density of man-made materials that can be detected within a 30-500 m observational footprint. Neither is applicable to characterizing remote rural settlements, where man-made materials are sparse and the mapping objects are far below the 30-500 m pixel size. Recent advances have been made in detecting the presence of remote rural settlements using moderate resolution data [33]. However, these efforts fall short of delivering reliable estimates of settlement extent and do not attempt to assess settlement structure and composition, which would allow direct inferences about population size and some aspects of people's socio-economic status at the village-to-household level.
Access to adequate housing, a major Sustainable Development Goal of the United Nations [34], is frequently cited as a key indicator of improved socio-economic conditions in developing countries. In rural settings across Africa, this is often viewed as replacing buildings constructed from natural materials (mud walls and thatch roofs) with man-made materials, which include concrete and brick walls and corrugated metal roofs [35]. These changes are commonly seen as an expression of an increase in local purchasing power and improved well-being [36]. Other signs of improved economic conditions include the establishment of community-level benefits, including schools, clinics, roads, and other relevant infrastructure (e.g., new places of business). In addition, livestock is viewed as a major economic asset in some rural communities within developing countries [37]. Thus, changes in livestock-related structures, which, unlike cultivated areas, are frequently incorporated into the settlement configuration, are also viewed as a metric for economic growth. Although cumulatively of considerable significance [35], the spatial scale of these improvements is generally beyond the detection capabilities of moderate resolution satellite monitoring and is completely outside the capabilities of coarse resolution-based observations.
Satellite-based very high resolution (VHR) imagery has become increasingly available since 2000. With spatial resolutions approaching or below 1 m, they offer unprecedented opportunities to map surface features through the identification of individual buildings [38]. This makes VHR data a crucial source for mapping remote rural settlements with low economic status. However, the VHR data present a different set of challenges, including limited availability, high cost, lack of between-image consistency, and relatively high geometric inaccuracies [39]. Immense variability of environmental settings and village/building structures among remote rural settlements of the world poses additional challenges to developing global or even regional algorithms for VHR-based rural settlement mapping. For example, many rural buildings are constructed from locally sourced natural materials (e.g., thatch and wood) which are spectrally similar to both senescent vegetation and bare ground, which are also a common background in remote rural settlements. This renders the mapping of buildings in remote villages based on the spectral signature alone extremely challenging and is considered one of the major limitations of VHR data [40]. Fortunately, VHR data are usually rich in spatial information (i.e., geometry, texture, and contextual relationships between the ground features [41][42][43][44]) which allows for the implementation of object-based image classification (OBIC). Compared to the classical spectra-based classification approaches, which generally operate at the pixel level, OBIC takes advantage of the rich spatial information present in the VHR imagery and considers surface features as individual "objects", which mimics humans' perception and object identification [45]. Due to its ability to identify complex surface features, OBIC has been increasingly applied to VHR imagery over the past two decades [46].
Many studies on urban environments based on remote sensing rely on some form of change detection, i.e., classification of an area using images that are acquired at two or more different points in time [7][47][48][49] or using the continuous seasonal trajectory [50]. However, because of the comparatively short time span of commercial VHR data collection (compared with moderate- and coarse-resolution data) and the limited availability of existing VHR images, a meaningful change detection based on VHR data usually involves handling images that are acquired by different sensors with drastically different specifications and image characteristics. Therefore, successful implementations of change detection based on VHR data require a high level of flexibility embedded in the methodology, one that can be adjusted on an image-by-image basis. In this project, we examined the potential for characterization of remote rural settlement structures and their changes in Mozambique using disparate VHR observations from different sensors. Our primary goal is to develop an approach that fuses both pixel- and object-based classification algorithms to map and quantify changes between 2004 and 2012 at the village level, with a particular emphasis on a locally appropriate context for characterizing elements of small-town development and economic change in rural settlements of Mozambique.

Study Area
This study is focused on assessing the changes in village structure as evidence of community-level economic development following the establishment of the Great Limpopo Transfrontier Park (GLTP) in southern Mozambique in 1999. The GLTP was established under the general principles of the transboundary resource management approaches, which aim to promote both conservation and economic development goals [51]. Our study focuses on two villages within the GLTP that are located close to each other (~3 km apart) and are within the designated park border zone (Figure 1). Village 1 was well established in its current location before the GLTP and covers approximately 60 ha. Village 2 was also present within the park before the establishment of the GLTP; however, due to rebuilding following a flooding event, its area nearly doubled in 2003 [50], reaching 24 ha by 2004 (the beginning of this study).
In both villages, as is typical for rural Mozambique, the primary surface objects include structures for housing, crop storage, and livestock holding pens (known as, and hereinafter referred to as, corrals), against the background of compacted bare soil, occasional trees, and tall hedge fences. The housing and storage buildings are usually small (visual estimates within 2012 VHR imagery show 4-10 m² buildings), although communal structures such as schools, clinics, or general stores can be substantially larger (e.g., 20 m² within one of the villages). Traditionally, buildings are constructed out of natural materials but the roofing materials vary, i.e., more traditional housing structures typically have thatch roofs (Figure 2a), whereas more affluent villagers use metal sheeting as their roofing material (Figure 2b). Corrals usually present a circular enclosure with access to water and occasional trees for shade within the enclosed area and fences constructed out of wood (Figure 2c). Based on the visual assessment of the VHR imagery, the corrals range in size roughly between 7 and 40 m in diameter.

Image Classification
In this project, we used two multispectral VHR images that were acquired in 2004 and 2012 (Table 1). The two VHR images were georegistered to each other using manually selected ground control points across the study area. The images were then subset to the visually identified maximum extent of each village across both images, and only the subsets were subsequently used in the classification. The clouds apparent in the images did not affect the villages of interest, and therefore no cloud and cloud-shadow removal was needed during image preprocessing. Our initial evaluation of the two images revealed that the spectral signatures of metal roofs and corrals (likely due to the existence of moist materials and livestock waste on the ground inside corrals) are distinctly different from other objects and, thus, they could be mapped directly based on spectra-based classification methods. However, the buildings with roofs constructed from natural materials (i.e., thatch roofs) are spectrally indistinguishable from the surrounding land surface (partially senescent grass and bare ground) and cannot be successfully separated from the background spectrally. Based on this, we designed a fused classification scheme that incorporated both spectra- and object-based mapping approaches (Figure 3). Metal roofs represent impervious man-made materials with a distinctly different and easily identifiable spectral signature (high reflectance in the visible range and very low reflectance in the near-infrared (NIR) range). Although corrals are also constructed of natural materials, they visually appear to be spectrally different from other objects within the village context. In contrast, buildings with thatch roofs are visually and spectrally indistinguishable from the background soil and senescent vegetation. However, their three-dimensional structure produces a shadow component that can be identified spectrally based on its very low reflectance across all bands of the imagery. Trees are the only other objects within the village that produce a similar shadow component; however, the crowns of the trees are photosynthetically active and thus can be differentiated spectrally from buildings. The association between the shadows and the objects (either trees or structures) can be assumed reliably as the shadows are always spatially adjacent to the objects in the direction opposite to the solar azimuth angle.
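The geometric relationship between an object and its shadow can be illustrated with a minimal sketch. It assumes a north-up image (rows increase southward, columns increase eastward) and a solar azimuth measured clockwise from north; the function names are hypothetical and are not part of the workflow described in this study.

```python
import math


def shadow_offset(solar_azimuth_deg, distance_px=1.0):
    """Return (d_row, d_col) pointing from an object toward its shadow.

    Assumptions (illustrative only): north-up image, rows grow southward,
    columns grow eastward, azimuth measured clockwise from north.
    """
    # The shadow falls in the direction opposite to the sun.
    shadow_azimuth = (solar_azimuth_deg + 180.0) % 360.0
    theta = math.radians(shadow_azimuth)
    d_col = distance_px * math.sin(theta)    # eastward component
    d_row = -distance_px * math.cos(theta)   # northward component, sign flipped for rows
    return d_row, d_col


def shadow_neighbor(row, col, solar_azimuth_deg):
    """Pixel adjacent to (row, col) on the side where a shadow is expected."""
    d_row, d_col = shadow_offset(solar_azimuth_deg)
    return row + round(d_row), col + round(d_col)


# Sun in the north-east (azimuth 45 degrees): the shadow lies to the south-west.
print(shadow_neighbor(10, 10, 45.0))
```

With the sun due north (azimuth 0°), the same function points one row down, i.e., the shadow is expected directly south of the object.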

During the spectra-based classification stage, the image subsets were classified into six surface type classes (i.e., metal roofs, corrals, shadow, grass, tree crowns, and bare ground) based on the pixel values. For each of the two images (one for 2004 and one for 2012), around 1000 random sample points were selected to capture the spectral signatures of the six classes. Table 2 lists the number of sample points that were selected for each class for training. Among the sample points selected for each year, the number of points assigned to each class was determined based on two factors: (1) the estimated proportion of that class in terms of its area over the total area of all six classes in each image and (2) the level of variation in the spectral signature of that class across the image. No training samples were selected for the grass class in the classification of the 2012 image because the richer spatial and spectral details contained in that image allowed the subsequent object-based classification to achieve satisfactory results in mapping metal and thatch roofs and corrals without the need to include the grass class as an intermediate class. The training samples were selected within Village 1 because it contains all six targeted classes and is larger than Village 2.
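The text does not give a formula for combining the two factors, so the following is only one plausible way to turn them into per-class sample counts: each class is weighted by the product of its area share and a relative spectral variability score, and the counts are rounded so that they sum to the target. The weighting scheme, the function name, and all input values are assumptions made for illustration; they are not the values in Table 2.

```python
def allocate_samples(area_share, variability, total=1000):
    """Split `total` training points across classes.

    Hypothetical weighting: weight = area share * spectral variability.
    Largest-remainder rounding keeps the counts summing exactly to `total`.
    """
    weights = {c: area_share[c] * variability[c] for c in area_share}
    weight_sum = sum(weights.values())
    exact = {c: total * w / weight_sum for c, w in weights.items()}
    counts = {c: int(v) for c, v in exact.items()}
    leftover = total - sum(counts.values())
    # Hand the remaining points to the classes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts


# Made-up inputs: area shares sum to 1, variability is a relative score.
area_share = {"metal roofs": 0.05, "corrals": 0.05, "shadow": 0.10,
              "grass": 0.20, "tree crowns": 0.15, "bare ground": 0.45}
variability = {"metal roofs": 2.0, "corrals": 1.5, "shadow": 1.0,
               "grass": 1.0, "tree crowns": 1.0, "bare ground": 1.5}
print(allocate_samples(area_share, variability))
```

The product weighting lets a small but spectrally variable class (such as metal roofs in this made-up input) receive more points than its area alone would justify.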
Considering the high spectral complexity typically associated with the VHR imagery, random forest classification [52] was used due to the nonparametric nature of this method, which allows for the discontinuous spectral distribution of pixels within a single class [53]. It was implemented using the "randomForest" package [54] in R [55]. All spectral bands of each image (i.e., 4 for the 2004 image and 8 for the 2012 image) were used in the random forest classification to maximize the model performance. The surface type classes mapped by the random forest classifier were subsequently vectorized to produce spatially contiguous objects. Then, each object was reclassified based on the size of the object, its surface type, and its contextual relationship with its neighbors. Table 3 summarizes the re-assignment rules used in the post-classification procedure to identify buildings with metal and thatch roofs and corrals. All other surface objects were combined into the "Other" class. This object-based classification procedure was implemented using the ArcPy module [56] within the Python environment.
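The object-based step can be sketched on a toy class grid in plain Python. The sketch groups same-class pixels into contiguous objects and then applies simplified re-assignment rules based on object size, surface type, and the class found on the sunward side of each shadow. These rules, the class names, and the size threshold are hypothetical stand-ins; they are not the rules in Table 3, and the sketch does not use ArcPy.

```python
from collections import deque


def label_objects(grid):
    """Group 4-connected pixels of the same class into (class, pixels) objects."""
    n_rows, n_cols = len(grid), len(grid[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    objects = []
    for r in range(n_rows):
        for c in range(n_cols):
            if seen[r][c]:
                continue
            cls, pixels, queue = grid[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for nr, nc in ((pr + 1, pc), (pr - 1, pc), (pr, pc + 1), (pr, pc - 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and not seen[nr][nc] and grid[nr][nc] == cls):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            objects.append((cls, pixels))
    return objects


def reassign(grid, sunward=(-1, 0), min_pixels=2):
    """Apply illustrative (hypothetical) rules; returns one label per object."""
    n_rows, n_cols = len(grid), len(grid[0])
    labels = []
    for cls, pixels in label_objects(grid):
        label = "other"
        if cls in ("metal", "corral") and len(pixels) >= min_pixels:
            label = cls
        elif cls == "shadow" and len(pixels) >= min_pixels:
            # Collect the classes one step toward the sun from each shadow pixel.
            casters = set()
            for r, c in pixels:
                nr, nc = r + sunward[0], c + sunward[1]
                if 0 <= nr < n_rows and 0 <= nc < n_cols:
                    casters.add(grid[nr][nc])
            # A shadow not explained by a tree crown or metal roof is taken
            # as evidence of a thatch-roof building.
            if not casters & {"tree", "metal"}:
                label = "thatch"
        labels.append(label)
    return labels


# Toy scene with the sun to the north, so shadows fall one row to the south.
grid = [
    ["bare", "tree",   "tree",   "bare",   "metal",  "metal"],
    ["bare", "shadow", "shadow", "bare",   "shadow", "shadow"],
    ["bare", "bare",   "bare",   "bare",   "bare",   "bare"],
    ["bare", "bare",   "shadow", "shadow", "bare",   "bare"],
]
print(reassign(grid))
```

In the toy scene, only the shadow in the bottom row has bare ground on its sunward side, so it is the only object labelled as a thatch-roof building.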

Accuracy Assessment
The accuracy of the classification results of the two images was assessed independently. For each image, we generated 100 random points (different from the sample points that were used to extract training data) evenly distributed across the four classes (25 points each for metal roofs, thatch roofs, corrals, and others) within each village, resulting in a total of 200 points for each of the mapping periods 2004 and 2012. The accuracy assessment followed a double-blind approach under which different analysts drew the accuracy assessment samples and conducted the validation. Following the protocol, an analyst with no prior knowledge of the classification results assigned each point to one of the four classes based on the visual interpretation of the original VHR images. Two separate confusion matrices with associated overall accuracy are provided for the 2004 and 2012 classifications.
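For readers less familiar with these statistics, the sketch below shows how overall, User's, and Producer's accuracies are derived from paired reference and mapped labels. The function name and the six toy points are illustrative only and are unrelated to the assessment data of this study.

```python
def accuracy_report(reference, predicted, classes):
    """Return (overall accuracy, user's accuracies, producer's accuracies)."""
    # matrix[ref][pred] counts points with reference class `ref` mapped as `pred`.
    matrix = {ref: {pred: 0 for pred in classes} for ref in classes}
    for ref, pred in zip(reference, predicted):
        matrix[ref][pred] += 1

    correct = sum(matrix[c][c] for c in classes)
    users, producers = {}, {}
    for c in classes:
        ref_total = sum(matrix[c].values())             # reference points of class c
        map_total = sum(matrix[r][c] for r in classes)  # points mapped as class c
        producers[c] = matrix[c][c] / ref_total if ref_total else None
        users[c] = matrix[c][c] / map_total if map_total else None
    return correct / len(reference), users, producers


# Toy example with made-up labels.
classes = ["metal", "thatch", "other"]
reference = ["metal", "metal", "thatch", "thatch", "other", "other"]
predicted = ["metal", "metal", "thatch", "other", "thatch", "other"]
overall, users, producers = accuracy_report(reference, predicted, classes)
print(round(overall, 2), users["thatch"], producers["thatch"])
```

User's accuracy reflects commission error (how often a mapped class is correct), whereas Producer's accuracy reflects omission error (how often a reference class is found by the map).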

Accuracy Assessment Outcomes
The confusion matrices and associated accuracy statistics generated for the 2004 and 2012 images are shown in Tables 4 and 5, respectively. Although both classifications achieved high overall accuracy, the overall accuracy of the 2012 classification (94%) is higher than that of the 2004 classification (86%). Metal roofs and corrals were identified reliably in both years with both User's and Producer's accuracies for each of the classes exceeding 94%. The thatch roofs were identified more successfully in the 2012 classification (88% and 94% for User's and Producer's Accuracy, respectively). Within the 2004 image, the thatch roofs were mapped less successfully (58% and 88% for User's and Producer's Accuracy, respectively).

Change in Village Composition between 2004 and 2012
Our classification approach is designed to deliver identification of individual objects but has an inherent error in providing the estimate of the footprint size of the individual objects, resulting from the identification of an object based on its shadow component and the substantial difference in spatial resolution between images for 2004 and 2012. Therefore, we focus our analysis on comparing the number of individual building objects rather than evaluating the likely change in their sizes. In addition, since the metal roofs serve as a metric for assessing changes in the wealth of villagers, we analyze the changes in the number of houses with metal roofs separately. In contrast, because corrals usually represent objects much larger than an individual pixel (>8 and >50 pixels for 2004 and 2012 classifications, respectively), we are able to assess change both in the number of corrals and their total area over time.
Our results show that the two villages within this study differ greatly in extent, the number of structures, the composition of roofing materials, and the area of livestock holdings. The larger of the two settlements, Village 1, had a total of 721 buildings in 2004, 18% of which had metal roofs, with traditional thatch roofs clearly dominating (Table 6). By 2012, the number of buildings with metal roofs more than doubled (+117%) while the total number of buildings in the village increased only modestly (+11%) due to a decrease in thatch roof structures (−13%) (Table 6). There are no distinct clustering patterns in the distribution of thatch vs. metal roofing material within this village (metal shown in blue and thatch shown in green in Figure 4) except a small number of larger metal roof structures in the center of the village. Village 1 represents a typical clustered rural settlement with an overall circular shape and a distinct village center (visible in Figure 4 as a patch with low building density roughly in the center of the village). Despite the overall circular shape of the village, the buildings are organized along a grid. Although the number of buildings increased slightly over the eight-year period, the total extent and the spatial configuration of Village 1 remained generally unchanged. As is customary for the region, livestock is contained in corrals located along the periphery of the village. Between 2004 and 2012, both the number of individual structures and the total area of corrals around Village 1 grew substantially (Table 6 and Figure 4). The total area occupied by corrals grew by 52%, indicating a considerable increase in livestock holding capacity.
In contrast, Village 2, which rapidly increased in land area following a flooding event in 2003, had a much lower number of buildings in both 2004 (282) and 2012 (248) (Table 6). The extent of Village 2 expanded by 67% between 2004 and 2012, which was much larger than the expansion rate of Village 1 (14%). Considering that the total number of buildings in Village 2 actually decreased during the eight-year period, this expansion occurred through a de-densifying distribution of buildings rather than rapid growth in the total number of structures. Although Village 2 is considered a cohesive village, the housing units are considerably more dispersed and isolated, making it challenging to identify the boundaries of the settlement precisely. By 2012, Village 2 had a building density of ~6 buildings per ha compared to ~12 buildings per ha in Village 1. In 2004, only 39 buildings with metal roofs and 243 buildings with thatch roofs were mapped. By 2012, the total number of buildings decreased (−12%) primarily due to a sharp decline (−38%) in buildings with thatch roofs. Although there was substantial growth in the number of buildings with metal roofs (+149%), the drop in the number of traditional housing units outweighed the gain in the improved building structures. The overall configuration of Village 2 is linear, indicating general village appearance and growth along a local road network. Although the corrals continue to be located on the periphery of the settlement, because of the village's linear structure they appear to be spread out within the village itself. Interestingly, the total number of corrals decreased between 2004 and 2012 from eight to five, although the total area occupied by corrals grew by 41%. Another noteworthy observation is that all but one of the corrals mapped in 2004 were no longer visible in 2012, and completely new structures were established in different locations (Figure 5). The major growth in the total area of corrals around Village 2 is attributable primarily to the establishment of a single large corral along the southwest boundary of the village. Similar to Village 1, there is no discernible spatial clustering pattern in the distribution of metal and thatch roofing across Village 2. Moreover, the spatial expansion of Village 2 is achieved through building out both traditional (thatch roofs) and modern (metal roofs) structures.
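The percentages reported for Village 2 can be cross-checked with a few lines of arithmetic. The sketch uses only numbers quoted in the text; it additionally assumes that the 24 ha extent given for 2004 in the Study Area section is the baseline for the reported 67% expansion. Because the percentages are rounded, the implied 2012 counts are approximate.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Village 2 counts quoted in the text.
metal_2004, thatch_2004 = 39, 243
total_2004 = metal_2004 + thatch_2004          # 282 buildings
total_2012 = 248

total_change = percent_change(total_2004, total_2012)

# 2012 counts implied by the rounded percentage changes (+149% and -38%).
implied_metal_2012 = metal_2004 * (1 + 1.49)
implied_thatch_2012 = thatch_2004 * (1 - 0.38)

# Assumption: 24 ha in 2004 is the baseline for the 67% expansion.
area_2012_ha = 24.0 * 1.67
density_2012 = total_2012 / area_2012_ha

print(round(total_change), round(implied_metal_2012 + implied_thatch_2012),
      round(density_2012))
```

Under these assumptions the rounded values agree with the reported −12% change in total buildings, the 2012 total of 248 buildings, and the density of ~6 buildings per ha.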
Remote Sens. 2021, 13, x FOR PEER REVIEW 9 of 16

Village Structure as a Proxy for Socio-Economic Standing
The economic status of a region is typically estimated based on income or expenditures [57,58]. However, in developing countries, the conventional income- and expenditure-based accounting measures are commonly hindered by low survey frequency and reliability, and a lack of compatibility between countries [58]. It is against this backdrop that remote sensing has emerged as a powerful tool for inferring the well-being of a region, thanks to its wide spatial coverage and the ability to identify various indicators including green spaces [59], building size, counts, and types [60,61], and impervious surface cover. In this study, we focus on mapping three types of surface objects that are common in remote villages in Mozambique: metal roof buildings, thatch roof buildings, and corrals. Through the comparison of mapping results between 2004 and 2012, we are able to infer how the economic status of the villages under evaluation changed during the eight-year period. Our analysis indicates a general and widespread shift towards metal roofing in the case study villages. Between 2004 and 2012, the percentage of structures with metal roofing doubled. By 2012, the villages used metal roofing for over a third of all structures. Moreover, our findings indicate that renovations to existing structures (i.e., replacing thatch roofing with metal), rather than the construction of new structures, drove the observed changes. While natural resources such as thatching grass can be harvested by village residents, metal materials must be purchased. The increased use of metal as a roofing material therefore suggests that the economic conditions of residents have improved. Our finding that the land area for livestock holdings has increased in both villages provides a further indication of improved economic well-being. The magnitude of investments made to renovate existing structures and expand corral areas also indicates that small-town development is occurring within the buffer zone of a protected area.

Challenges and Limitations of the Mapping Algorithm
In this study, we developed an algorithm to classify two single-date VHR images, acquired eight years apart, for two villages in Mozambique, and used the results to infer the change in the socio-economic status of the two villages over that period. While our algorithm performed satisfactorily for our case study, it is subject to several sources of uncertainty that, in wider applications, especially those in other environmental and temporal settings, may greatly affect its performance. Here we discuss two of the main sources.
First, there are complications associated with using shadows as a major source of information. Shadows have been used in a number of studies to identify buildings [62][63][64] and estimate building height [65] based on VHR imagery. While they provide a unique opportunity to map man-made buildings and structures when the required spectral information is limited, their reliability depends on several factors, including the environmental conditions, the image parameters, and the targets to be identified. For our case study, because our villages of interest are located in dryland ecosystems and their economic status is relatively low, the spectral characteristics of bare ground are comparatively homogeneous, and shadow-casting ground objects are less diverse and are well spaced. This made shadows a reliable source of information for our project. In situations where the ground features are more spectrally diverse (e.g., the surface being a mixture of bare ground and impervious surface), where more complicated shadow-casting features exist (e.g., larger and irregularly shaped tree crowns), or where the density of buildings is high, shadows may become a feature that is either difficult to extract or does not have a one-to-one association with the ground objects. In those situations, shadows may reduce the classification accuracy instead of improving it if they are not properly accounted for by the mapping algorithm. In addition, since the discernibility of shadows is generally positively correlated with the solar zenith angle at the time of image acquisition, using shadows to aid VHR image classification is expected to be less reliable for images acquired around local noon, when the solar zenith angle is smallest and shadows are shortest.
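The link between solar zenith angle and shadow discernibility follows from simple geometry: on flat, level ground, an object of height h casts a shadow of length h tan(θ), where θ is the solar zenith angle. The sketch below illustrates this relationship only. The function names are introduced here for illustration, and any building height or pixel size passed to them is an assumed value rather than a parameter from this study.

```python
import math


def shadow_length_m(height_m: float, solar_zenith_deg: float) -> float:
    """Shadow length (m) cast on flat, level ground by an object of given height.

    Uses length = height * tan(zenith); terrain slope and the sensor viewing
    angle are ignored in this simplified geometry.
    """
    if not 0.0 <= solar_zenith_deg < 90.0:
        raise ValueError("solar zenith angle must be in [0, 90) degrees")
    return height_m * math.tan(math.radians(solar_zenith_deg))


def shadow_length_px(height_m: float, solar_zenith_deg: float,
                     pixel_size_m: float) -> float:
    """Shadow length expressed in image pixels for a given ground pixel size."""
    if pixel_size_m <= 0.0:
        raise ValueError("pixel size must be positive")
    return shadow_length_m(height_m, solar_zenith_deg) / pixel_size_m
```

For a fixed building height, the shadow length in pixels shrinks towards zero as the zenith angle approaches 0°, which is why shadows of low structures can fall below the size that can be resolved in imagery acquired near local noon.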
Second, there are inconsistencies between the two images. Because change detection involves at least two images, its performance is affected not only by the factors influencing the classification accuracy of the individual images but also by the consistency between the images. In this project, the two VHR images we used differ in several parameters, including spatial resolution and number of bands (Table 1). With higher resolution and more spectral bands, more spatial and spectral information is present, which likely leads to higher image classification accuracies. Considering that our VHR image for 2012 had both higher spatial resolution and more bands than the 2004 image, these differences are most likely responsible for the higher accuracy of the 2012-based classification than the 2004-based classification (Tables 4 and 5). In addition to these image-related inconsistencies, another type of inconsistency, which stems from the in-flight ability of many VHR sensors to adjust viewing angles, can also lead to substantial uncertainties in change detection performance. While this feature allows VHR imaging sensors to capture many more images of the same targets, it can lead to large differences in viewing and solar (and, hence, shadow) azimuth angles, lighting conditions, and relief displacement between images captured at two different times, even by the same imaging sensor, ultimately reducing the accuracy of change detection-oriented image classifications [66]. A third type of inconsistency can exist between images acquired at different times of the year. In this case, strong phenological differences (e.g., snow-on/off and leaf-on/off statuses) between the image pair could introduce large uncertainties that propagate into the classification and change detection results.
The existence of these inconsistencies means that the image classification algorithm needs to be adjusted on an image-by-image basis, as we have done in this project.

Technical Innovation and Future Directions
The technical innovations involved in this project are mainly two-fold. First, it is one of the few case studies where shadows are used in VHR-based image classification in a rural setting. While, as mentioned previously, various studies have utilized shadows in their image classifications [62][63][64], most of them were implemented in urban environments. Although the complexity (in terms of the diversity of the ground features) involved in urban image classifications is probably higher than that in rural settings, rural image classifications are not necessarily easier because image availability is commonly very limited. By using shadows, our project was able to take advantage of the two single-date VHR images that were available to us and derive information that may be useful for various decision-makers. Second, this project presents a case study where VHR-based OBIC was implemented without commercial OBIC software. Previous literature reviews [46,67] showed that most studies involving OBIC relied on commercial OBIC software, particularly eCognition. While packages such as eCognition are powerful tools, they are usually costly and thus may not be appropriate for research efforts with limited budgets. Our project demonstrates that, by breaking down the object-based classification processes that are typically handled end-to-end by OBIC software, OBIC can be implemented in different environments and with software that is much cheaper or even free of charge. Today, open-source platforms (such as Python) and free geospatial software (such as QGIS) are quickly gaining popularity among researchers because of their increasing technical capacity in machine learning and image classification [68]. Against this backdrop, we expect to see more OBIC applications that are based exclusively on open-source workflows in the future.
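To illustrate what breaking OBIC into separate steps can look like in an open-source environment, the following toy sketch separates segmentation, per-object feature extraction, and classification into three plain Python functions. It is not the algorithm used in this study: the connected-component segmentation, the brightness feature, the threshold values, and the rule that brighter objects are labeled as metal roofs are all simplifying assumptions made for illustration, and every function name is hypothetical.

```python
from collections import deque


def label_objects(mask):
    """Step 1, segmentation: 4-connected component labeling of a 0/1 mask."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one object
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def object_features(labels, brightness):
    """Step 2, features: pixel area and mean brightness of each object."""
    sums = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab:
                area, total = sums.get(lab, (0, 0.0))
                sums[lab] = (area + 1, total + brightness[r][c])
    return {lab: {"area": a, "mean_brightness": t / a}
            for lab, (a, t) in sums.items()}


def classify_objects(features, min_area=2, bright_threshold=0.6):
    """Step 3, classification: simple rules with assumed, untuned thresholds."""
    classes = {}
    for lab, f in features.items():
        if f["area"] < min_area:
            classes[lab] = "noise"
        elif f["mean_brightness"] >= bright_threshold:
            classes[lab] = "metal roof"
        else:
            classes[lab] = "thatch roof"
    return classes
```

Because each step exchanges only simple data structures, any one of them could be replaced independently, for example by a segmentation routine from an open-source image processing library or by a trained classifier, without changing the other two.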
In parallel with the continuous addition of commercial satellite-based VHR image products, another notable trend within the remote sensing community is the ever-increasing use of Unmanned Aerial Systems (UASs). UASs are capable of providing VHR imagery with ultra-high spatial resolutions (i.e., at cm-levels) [69]. This, coupled with other features such as low cost and high efficiency [69,70], makes UASs strong candidates for mapping buildings in remote settlements. Successful applications of this kind have been demonstrated in villages in China [71], the Dominican Republic [72], and Rwanda [73]. In addition to the aforementioned advantages, another capacity of UAS-based VHR imagery is the ability to reconstruct the 3-D structure of buildings. This is achieved by taking advantage of multiple images acquired from different angles, which can be processed by photogrammetric software into 3-D point clouds that resemble the structure exteriors [74,75]. Such a practice is achievable using satellite-based VHR imagery, as demonstrated by Partovi, et al. [76]. However, it is better suited to UAS-acquired imagery, thanks to the latter's higher resolution and image availability. While there are undoubtedly technical challenges (e.g., georegistration [73]), a combination of rich spectral and contextual information and detailed 3-D structure, as provided by VHR imagery acquired by both satellite-based and UAS-based imaging sensors, would most likely significantly increase the accuracy and usability of VHR imagery in mapping man-made structures in rural communities.

Data Availability Statement: The classification maps and code produced through this project are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.