Spatial and Temporal Human Settlement Growth Differentiation with Symbolic Machine Learning for Verifying Spatial Policy Targets: Assiut Governorate, Egypt as a Case Study

Since 2005, Egypt has had a new land-use development policy to control unplanned human settlement growth and prevent outlying growth. This study assesses the impact of this policy shift on settlement growth in Assiut Governorate, Egypt, between 1999 and 2020. Using symbolic machine learning, we extract built-up areas from Landsat images of 2005, 2010, 2015, and 2020, and we apply a Landscape Expansion Index through a new QGIS plugin tool (Growth Classifier) developed to classify settlement growth types. The built-up area for the base year, 1999, was produced by the national remote sensing agency. After extracting the built-up areas from the Landsat images, eight settlement growth types (infill, expansion, edge-ribbon, linear branch, isolated cluster, proximate cluster, isolated scattered, and proximate scattered) were identified for four periods (1999:2005, 2005:2010, 2010:2015, and 2015:2020). The results show that prior to the policy shift of 2005, the growth rate for 1999–2005 was 11% p.a. In all subsequent periods, the growth rate exceeded the target rate of 1% p.a., though by varying amounts. The observed settlement growth rates were 5% (2005:2010), 7.4% (2010:2015), and 5.3% (2015:2020). Although the settlements in Assiut grew primarily through expansion and infill, with the latter growing in importance during the last two periods, outlying growth is also evident. Based on four class metrics (number of patches, patch density, mean patch area, and largest patch index) computed for the eight growth types, all types fluctuated between periods, except for expansion, which tended to increase throughout. To date, the policy to control human settlement expansion and outlying growth has been unsuccessful.


Introduction
Currently, urban land consumption is a significant challenge for sustainable development. As the global population is expected to reach 9.7 billion by 2050 [1], a substantial amount of further human settlement growth (HSG) is anticipated, especially in developing countries. Accordingly, it is projected that built-up areas in developing countries may increase from 300,000 km² in 2000 to 770,000 km² in 2030, and 1,200,000 km² by 2050 [2]. With almost 60% of the world's population already experiencing a critical food-deficit [3], the further consumption of agricultural land by HSG increases food insecurity in certain (adapted from [6]).
To observe such dynamics and types of HSG, several researchers have utilized remote sensing [6–8]. Recently, machine learning algorithms have started to play a significant role in extracting built-up area estimates from remote sensing data [9]. Pesaresi et al. [9] introduced a supervised classification technique in urban remote sensing called symbolic machine learning (SML). This technique was used earlier in ecological modeling [10]. SML allows users to find systematic relations between sequenced data (image instances) and a spatial reference set containing encoded information [11].
In contrast with neural networks, which mostly produce black-box models, SML allows users to modify and verify modeling dynamics [10]. Unlike other machine learning techniques used in remote sensing (e.g., logistic regression, linear regression, decision trees, support vector machines, clustering, dimensionality reduction, and density estimation), SML can deal with datasets that have variable characteristics (e.g., different sensors and atmospheric characteristics) while requiring less time, fewer training sets, and less complex learning processes [11]. SML was used to extract the global built-up area, producing a dataset called the Global Human Settlement Layer (GHSL) [12]. Although the GHSL provides useful information for monitoring HSG at four temporal points (1975, 1990, 2000, and 2014), the accuracy of this dataset has been tested only for Europe [12].
For the spatial analysis of HSG, Geographic Information Systems (GIS) provide tools such as spatial metrics to measure and visualize spatio-temporal patterns and changes [6,13–20]. Spatial metrics are quantitative measures based on a patch-based representation of the spatial phenomenon under investigation (e.g., HSG) and may be used to identify and characterize different types of HSG (e.g., class metrics). Identifying various types of HSG supports the assessment of land development policies, such as sustainable development and smart growth [13]. For example, a high presence of scattered growth indicates that the land development pattern is sprawling rather than compact [21], the latter often being promoted under sustainable urban development policies.
Remote Sens. 2020, 12, 3799
Egypt is a developing country that is expected to have rapid HSG, especially in the Nile Valley region [22]. Between 1907 and 2018, Egypt's population grew from 14 million to 104 million inhabitants [23], and this growth was accompanied by uncoordinated HSG, especially after 1952 [24]. Since the 1970s, three strategies to manage HSG have been adopted. The first was to expand in the adjacent desert of the Nile Valley and Delta region [25,26]. The second was to implement a land-use development plan at the national level [27]. The third was to criminalize building on agricultural land [28]. In 1986, the first national land-use development plan in rural areas was adopted to preserve agricultural land and manage HSG [29]. Despite these strategies, Egypt lost more than one million acres (404,670 hectares) of its best agricultural land between 1980 and 2004 because of HSG [30], while between 1984 and 2007, the built-up area on agricultural lands doubled [31]. By 2015, Khalifa reported that 85% of unplanned (informal) settlements were built on agricultural land [32]. This transformation contributed to the decline in agricultural jobs, the formation of unplanned settlements with a lack of local services and open spaces, land fragmentation, and pollution [24].
In 2003, the central government realized that the policy toward HSG had to be changed [24]. First, planning was decentralized [24]: the local authorities and the governorate received more power in planning and implementation processes. Second, the existing built-up area was to be densified by allowing vertical expansion and utilizing undeveloped pockets [24]. Accordingly, the (second) national project for land-use development plans in rural areas was introduced in 2005 [27]. This project aimed to reduce the annual HSG rate to around 1% until 2020 by directing HSG to fill in the vacant pockets (infill) and preventing all types of outlying growth [33]. However, in 2011 the Arab Spring disrupted all ongoing projects in Egypt [34]. Even though the third land-use development plan for 2020–2035 is now in preparation, no study has yet evaluated recent policies on HSG in Egypt, and in Assiut Governorate in particular.
From a methodological perspective, we aimed to show how new machine learning techniques can improve the accuracy of human settlement data sets to enable settlement growth analysis, which goes beyond that shown in Figure 1. From a substantive perspective, we use these data sets to answer three questions. Did the 2005 land-use development plan manage to control the HSG rate to the desired level? Was the land-use plan able to promote infill growth and prevent outlying growth? How was the HSG pattern affected by the Arab Spring? The analysis was executed with data from 1999, 2005, 2010, 2015, and 2020.
In our approach, machine learning and remote sensing were used for extracting the built-up area. GIS, Landscape Expansion Index, and class metrics were used to analyze the spatiotemporal characteristics of HSG in the study area and identify growth types that occurred during the study period. Further, we developed a QGIS Growth Classifier, a plugin tool that fosters the analysis of HSG types. We also introduce five new HSG types that highlight the impact of roads and corridors on settlement growth. Free data sources and tools were used.

Materials and Methods
To assess the impact of the policy shift on HSG, five steps were followed: (1) extract the built-up area from satellite imagery using SML; (2) validate the results and the training sets; (3) calculate the Landscape Expansion Index (infill growth, edge-expansion, and outlying growth); (4) identify outlying growth types (linear branch, proximate cluster, isolated cluster, proximate scattered, and isolated scattered); (5) compute and compare class metrics for the various growth types. After the description of the study area and input data, these steps are described in detail.

Study Area
We examined the situation in the Nile Valley region. With approximately 1000 km length and an average of 12 km width (Figure 2a), the Nile Valley region represents 1.2% of the country's total area [23]. In contrast, it provides a living environment for 28% of the total population [23]. Although the agriculture sector contributes to 63% of rural employment and 40% of rural income for the Nile Valley inhabitants [23], 7.5% of the total agricultural area was lost between 1984 and 2007 due to HSG [31].
The Nile Valley consists of eight governorates (Figure 2b). Assiut Governorate, which is located in the middle region of the Nile Valley approximately 400 km south of Cairo, was chosen because it has been experiencing rapid HSG since the 1950s. Moreover, between 2011 and 2014, it reportedly experienced the highest number of encroachments on agricultural land by unplanned HSG of any governorate in the Nile Valley region [35]. Assiut Governorate consists of 11 centers, each of which has a city, and a total of 235 villages. Assiut's population was 2.9 million in 1999 and had increased to 4.5 million inhabitants by 2018 (2.9% p.a. growth) [23], representing approximately 4.2% of Egypt's total population.

Input Data
As shown in Table 1, five inputs were used: Assiut built-up area in 1999, four Landsat images, Assiut roads and water bodies, the Global Human Settlement Layer (GHSL), and the African Land Cover (ALC). The shapefile for Assiut's built-up area in 1999 was obtained from the National Agency for Remote Sensing and Space (NARSS), which gives a total built-up area of 85.5 km² for Assiut, as shown in Figure 3. Landsat images for the years 2005, 2010, 2015, and 2020 were used to extract the built-up area. Landsat data were chosen because the program has provided a free image of the same location every 16 days since the 1980s. Assiut is located within Landsat path 176, row 41. The Landsat 7 images were pre-processed by and obtained from Maryland University's Global Land Cover Facility database (GLCF) [36], whereas the Landsat 8 images were pre-processed by and collected from the United States Geological Survey (USGS) [37].
The Global Human Settlement Layer (GHSL) [38] was used as a training set for the 2005 and 2010 built-up area extraction process, as demonstrated further in the following section. The African Land Cover (ALC), which was developed by the European Space Agency (ESA) [39], was used for extracting the built-up area for 2015 and 2020. ALC is a thematic land cover map for Africa at a 20 m spatial resolution. ALC and GHSL were considered training sets, rather than final estimates of the built-up area, because they have not yet been validated with enough data covering all regions in the world [12].
The roads and water bodies (e.g., irrigation canals) layer (Figure 3), which was obtained from OpenStreetMap (OSM) [40] and updated, was used for identifying two types of HSG: edge-ribbon and linear branch. OSM allows users to digitize different objects, such as buildings, roads, and water bodies. Approximately 50% of Assiut's roads and water bodies were added to the OSM data set by the lead author in order to have full coverage of the study area.
MASADA 1.3 [41] allows the user to use multispectral satellite imagery regardless of its spatial and radiometric resolution. Further, it supports high to very high-resolution imagery (10 m to 0.5 m) with an additional two embedded models for extracting textural and morphological features as inputs to the SML. These embedded models were not used in the current study because we relied on Landsat imagery. The software offers an assessment of the output using a reference set (e.g., a thematic map of the built-up area that has already been validated). We did not rely on this feature either, because no reference sets were available. Besides the satellite bands, a training set for the built-up area and water is needed to execute the analysis.

Extracting the Built-Up Area Using Symbolic Machine Learning
Like many machine learning algorithms, SML is a supervised classification method for extracting information (e.g., built-up area estimations) from big data (e.g., multispectral satellite bands) [9]. SML consists of two steps: (1) data reduction and quantization; (2) evaluating the association between the symbolic data and a reference set [11].
Data reduction and quantization have three main phases [9]. First, a taxonomy for each input dataset (e.g., each multispectral imagery band) is generated. This phase aims to reduce the number of colors in each spectral band. Second, the sequence of values for each pixel across all inputs is constructed. Third, the unique sequences are identified and their occurrences counted in a frequency analysis. After obtaining a symbolic data layer from the data reduction and quantization step, the symbolic data layer was compared with the training set (e.g., GHSL) to determine each pixel's value (class). For this purpose, a confidence measure, the Evidence-based Normalized Differential Index (ENDI), was developed for evaluating the association between the symbolic data and the training data. Each pixel's ENDI value was then used as the decision criterion for the final classification output. MASADA 1.3 uses an automatic threshold method that identifies the threshold minimizing the intra-class variance of the built-up and non-built-up classes. SML was used before for extracting the Global Human Settlement Layer (GHSL) [12]. The MASADA 1.3 radiometric workflow produced nine raster output maps and five documents. All of the maps except "BU_class.tif", which is the final estimated built-up area, are intermediate outputs. The five output documents measure the analysis performance and state the logs and inputs of the analysis.
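The two SML steps described above can be illustrated with a minimal sketch. It is not the MASADA implementation: the function names, the toy pixel values, and the specific form of the index, (f+ − f−)/(f+ + f−) with f+ and f− the counts of a sequence among built-up and non-built-up training pixels, are assumptions made here for illustration.

```python
from collections import Counter

def quantize(value, min_level, max_level, n_levels):
    """Clip a band value to [min_level, max_level] and map it to an integer symbol."""
    clipped = min(max(value, min_level), max_level)
    scaled = (clipped - min_level) / (max_level - min_level)
    return min(int(scaled * n_levels), n_levels - 1)

def endi_scores(pixels, training, min_level=0.01, max_level=1.0, n_levels=4):
    """Return per-pixel symbolic sequences and an ENDI-like score per unique sequence.

    pixels:   list of tuples, one band value per input band (hypothetical toy data)
    training: list of 0/1 labels from the training set (1 = built-up)
    """
    # Step 1: data reduction and quantization, then one sequence per pixel.
    sequences = [
        tuple(quantize(v, min_level, max_level, n_levels) for v in px)
        for px in pixels
    ]
    # Step 2: frequency of each unique sequence among positive and negative samples.
    pos = Counter(s for s, t in zip(sequences, training) if t == 1)
    neg = Counter(s for s, t in zip(sequences, training) if t == 0)
    # Assumed index form: ranges from -1 (never built-up) to +1 (always built-up).
    scores = {s: (pos[s] - neg[s]) / (pos[s] + neg[s]) for s in set(sequences)}
    return sequences, scores

# Toy example with two bands and five pixels.
pixels = [(0.9, 0.8), (0.85, 0.9), (0.1, 0.2), (0.15, 0.1), (0.9, 0.85)]
training = [1, 1, 0, 0, 0]
sequences, scores = endi_scores(pixels, training)
pixel_scores = [scores[s] for s in sequences]
```

A threshold on the per-pixel score would then yield the binary built-up map; the software selects that threshold automatically, whereas a sketch like this would need one supplied by hand.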
To allow the use of any multispectral optical sensor, the MASADA 1.3 software lets the user tune the algorithm's parameters. For the SML supervised classification of medium-resolution sensors (e.g., Landsat), three significant parameters can be tuned to reach the optimal condition for the classification: rad_q_minlev, rad_q_maxlev, and rad_qlev, as presented in Table 2. The rad_qlev parameter [41] determines the number of levels used to reduce the input satellite bands' colors. The optimal number of levels should result in an average support (smlAvgSuppRad) [41] between 100 and 1000. The software calculates smlAvgSuppRad after each run. The rad_q_minlev and rad_q_maxlev parameters are used to exclude outliers from the input data.
The new built-up area during each period was identified by taking the last time-step as the baseline [12,42]. Accordingly, the difference between the classification result for 2020 and that of the preceding temporal point (i.e., 2015) gives the area that was built between 2015 and 2020. This differencing of each temporal point against the preceding one was repeated until reaching 1999, which is the first time-step in this study.
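The differencing step can be sketched as follows, assuming binary built-up masks on a common grid (1 = built-up, 0 = not built-up). The function name and the tiny masks are hypothetical and only illustrate the idea.

```python
def new_built_up(earlier, later):
    """Return a mask of cells that are built-up in `later` but not in `earlier`."""
    return [
        [1 if lat == 1 and ear == 0 else 0 for ear, lat in zip(row_e, row_l)]
        for row_e, row_l in zip(earlier, later)
    ]

# Hypothetical 2 x 3 masks for two consecutive time points.
bu_2015 = [[1, 0, 0],
           [0, 0, 0]]
bu_2020 = [[1, 1, 0],
           [0, 1, 0]]

growth_2015_2020 = new_built_up(bu_2015, bu_2020)
new_cells = sum(sum(row) for row in growth_2015_2020)
```

Multiplying the number of new cells by the cell area (900 m² for a 30 m Landsat pixel) would give the newly built area for the period.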

| Parameter Name | Parameter Definition |
| rad_q_minlev | The minimum cut-off value for rescaling the radiometric bands before quantization |
| rad_q_maxlev | The maximum cut-off value for rescaling the radiometric bands before quantization |
| rad_qlev | Number of levels to reduce the radiometric data |

Accuracy Assessment for the Training Sets and Classification Results
For accuracy assessment, a dataset of 500 random points was used to validate the classification results (2005, 2010, 2015, and 2020) as well as the training sets (ALC and the GHSL built-up dataset for 2014, which consists of layers 3, 4, 5, and 6 (GHSL_2014)) [43]. Meanwhile, the GHSL built-up dataset for 2000, which consists of layers 4, 5, and 6 (GHSL_2000), was validated using the 1999 built-up area. By calculating the sensitivity and specificity [44], the training set with the higher probability of detecting built-up area (sensitivity) and the lower probability of false detection (i.e., higher specificity) could be identified. Moreover, the classification results could be compared to the training sets (inputs) to determine which has the higher probability of detecting urban areas. This procedure makes it possible to determine whether the outputs are more accurate than the training sets.
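As an illustration of the measures used here, the following minimal sketch computes sensitivity, specificity, and overall accuracy from paired labels at validation points. The function names and the eight sample labels are hypothetical and are not the study's validation data.

```python
def confusion_counts(predicted, reference):
    """Count true/false positives and negatives for binary labels (1 = built-up)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    return tp, fn, fp, tn

def sensitivity(tp, fn):
    """Share of reference built-up points that the layer detects."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of reference non-built-up points that the layer leaves unflagged."""
    return tn / (tn + fp)

def accuracy(tp, fn, fp, tn):
    """Share of all points labeled correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

# Hypothetical labels at eight validation points.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 1, 1, 0, 0, 0, 0, 0]
tp, fn, fp, tn = confusion_counts(predicted, reference)
```

The same functions could be applied to a classified layer and to its training set in turn, so that the two can be compared on the same points.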

Identifying Human Settlement Growth Types
After obtaining the built-up area during each period, a QGIS plugin tool (Growth Classifier) [45], which we developed to identify HSG types based on the Landscape Expansion Index (LEI) [7,15,19,46], was applied to the four built-up area epochs (the outputs of step 1). Using buffer analysis, LEI depends on three rules for the identification of urban growth types. Rule 1 is that if the buffer around the new patch mostly intersects with the old patch, the growth is considered infill. Rule 2 is that if the buffer area around the new patch partially intersects with the old patch, the growth is edge-expansion. Rule 3 is that if the buffer area around the new patch does not intersect with any existing patch, the growth is outlying.
The application of these three rules is represented in the following equation:
$$\mathrm{LEI} = 100 \times \frac{\alpha}{\alpha + \beta}$$
where α is the area of the intersection between the existing patch and the buffer area, and β is the area of the intersection between the buffer area and the non-built-up area. According to this equation, LEI values lie between zero and 100. As shown in Figure 5, LEI is >50 for infill growth, whereas for edge-expansion 0 < LEI < 50. Finally, the LEI for outlying growth is zero. It is worth mentioning that Liu et al. (2010) tested many values for the buffer distance and reported that 1 m is the best buffer distance.
Colsaet et al. [47] reviewed more than 200 papers and found that proximity to roads and transport facilities increases land take, which subsequently leads to HSG. Accordingly, we proposed five growth types that examine the effect of corridors on growth. The first is a sub-type of edge-expansion, namely edge-ribbon (the other four are discussed in the next section). Angel et al. [5] defined ribbon development as built-up areas that are less than 100 m wide and have less than 30% built-up neighborhoods. Verbeek et al. [48,49] identified ribbon development as any patch with a minimum length of 200 m, or one for which the ratio between the patch length and the adjoining road is more than 80%. In our study, edge-ribbon is any edge-expansion patch that adjoins a corridor (i.e., roads and irrigation networks) such that the patch has a linear shape.
Accordingly, and based on Rule 2, Rule 4 is that if a new patch is classified as edge-expansion, adjoins a corridor, is less than 100 m wide, and is more than 150 m long, its growth type is edge-ribbon. These values were chosen to identify (relatively) dense patches on the sides of the corridor. In Assiut, a previous study reported that the average area of unplanned units is 250 m² [35]. Accordingly, the minimum patch area for the edge-ribbon growth type would contain an average of 60 units. To identify different patch dimensions for different contexts, the QGIS plugin tool allows tuning the length and the width of the designated edge-ribbon growth type.
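The LEI thresholds and the edge-ribbon rule above can be sketched in a few lines. This is not the Growth Classifier plugin code: the function names and arguments are hypothetical, the areas α and β are assumed to come from a prior buffer-intersection step, and assigning LEI = 50 to edge-expansion is an assumption because the text leaves that boundary value open.

```python
def lei(alpha, beta):
    """Landscape Expansion Index from the two buffer intersection areas."""
    if alpha + beta <= 0:
        raise ValueError("buffer area must be positive")
    return 100.0 * alpha / (alpha + beta)

def growth_type(alpha, beta, adjoins_corridor=False, width_m=None, length_m=None,
                max_width_m=100.0, min_length_m=150.0):
    """Classify a new patch as infill, edge-expansion, edge-ribbon, or outlying."""
    value = lei(alpha, beta)
    if value > 50:
        return "infill"
    if value > 0:
        # Assumption: LEI == 50 falls here, since the text only gives 0 < LEI < 50.
        is_ribbon = (
            adjoins_corridor
            and width_m is not None and length_m is not None
            and width_m < max_width_m and length_m > min_length_m
        )
        return "edge-ribbon" if is_ribbon else "edge-expansion"
    return "outlying"

# Hypothetical patches (areas in square meters, dimensions in meters).
print(growth_type(30.0, 10.0))
print(growth_type(10.0, 30.0, adjoins_corridor=True, width_m=60.0, length_m=200.0))
print(growth_type(0.0, 40.0))
```

The width and length limits are exposed as parameters because the text notes that the tool allows them to be tuned for other contexts.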
To explore the aggregated properties of HSG in Assiut as a whole, the Mean Expansion Index (MEI) and the Area Weighted Mean Expansion Index (AWMEI) were calculated for each period [46]. MEI is the simple average of the LEI values for all patches and is calculated according to the following equation:
$$\mathrm{MEI} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{LEI}_i$$
where $\mathrm{LEI}_i$ is the LEI of patch $i$, and $N$ is the total number of newly grown patches in the period. A low MEI indicates diffused expansion, and a high MEI indicates more compact growth. AWMEI is the sum of the LEI values of all new patches, each multiplied by the patch's proportional abundance (the patch area divided by the total area of all patches). AWMEI is calculated as follows:
$$\mathrm{AWMEI} = \sum_{i=1}^{N} \mathrm{LEI}_i \times \frac{a_i}{A}$$
where $a_i$ is the area of patch $i$, and $A$ is the total area of the newly grown patches. More compact growth results in a higher AWMEI, whereas diffused expansion yields a low AWMEI.
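The two aggregate indices can be computed directly from their definitions. The sketch below uses hypothetical function names and three made-up patches; it shows how a large, low-LEI patch pulls AWMEI below MEI.

```python
def mean_expansion_index(lei_values):
    """MEI: simple average of the LEI values of all new patches."""
    return sum(lei_values) / len(lei_values)

def area_weighted_mean_expansion_index(lei_values, areas):
    """AWMEI: LEI values weighted by each patch's share of the total new area."""
    total_area = sum(areas)
    return sum(v * a / total_area for v, a in zip(lei_values, areas))

# Hypothetical period with three new patches (areas in square meters).
lei_values = [80.0, 20.0, 0.0]
areas = [1000.0, 3000.0, 1000.0]

mei = mean_expansion_index(lei_values)
awmei = area_weighted_mean_expansion_index(lei_values, areas)
```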

Identifying Outlying Growth Types
Using the same QGIS plugin, outlying growth types were identified, as presented in Figure 6. The linear branch was defined in the same way as edge-ribbon in terms of width (less than 100 m) and length (more than 150 m). The only difference is that linear branch patches are outlying growth (they have no intersection with any existing patch), whereas edge-ribbon patches are edge-expansion growth (they intersect with at least one existing patch).

Clustered growth was identified as the new patches of outlying growth with an area of 2700 m² or more. The minimum cluster patch would contain more than ten units. The tool also allows changing this area to fit a different context. Following Wilson et al. (2003) [6], we defined scattered growth as any outlying growth that is neither a linear branch nor clustered.
To investigate the effect of corridors on outlying types other than the linear branch, we proposed four subtypes. The first is the proximate cluster, which is any patch that was classified as cluster growth and is located in the corridor buffer area. In contrast, the second is the isolated cluster, representing any cluster patch that is not located inside the corridor buffer area. The third is proximate scattered, which represents the scattered patches that are located inside the corridor buffer area. The fourth is isolated scattered, which covers any scattered patch that is not located inside the buffer area.
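The outlying-type rules can be summarized as a small decision function. This is a sketch under stated assumptions rather than the plugin's code: the function and argument names are hypothetical, a linear branch is assumed to adjoin a corridor as edge-ribbon does, and whether a patch lies inside the corridor buffer is taken as a precomputed input because the buffer width is not given in this section.

```python
def outlying_type(area_m2, width_m, length_m, adjoins_corridor, in_corridor_buffer,
                  max_width_m=100.0, min_length_m=150.0, min_cluster_area_m2=2700.0):
    """Assign one of the five outlying growth types to an outlying patch (LEI = 0)."""
    # Assumption: a linear branch adjoins a corridor, like edge-ribbon.
    if adjoins_corridor and width_m < max_width_m and length_m > min_length_m:
        return "linear branch"
    # Cluster versus scattered is decided by patch area alone.
    base = "cluster" if area_m2 >= min_cluster_area_m2 else "scattered"
    # Proximate versus isolated is decided by the corridor buffer.
    prefix = "proximate" if in_corridor_buffer else "isolated"
    return f"{prefix} {base}"

# Hypothetical outlying patches.
print(outlying_type(9000.0, 45.0, 200.0, True, True))
print(outlying_type(3000.0, 50.0, 60.0, False, False))
print(outlying_type(500.0, 20.0, 25.0, False, True))
```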

Class Metrics
After identifying the eight growth types, four class metrics (number of patches, patch density, largest patch index, and mean patch area) were calculated to compare these growth types [50,51]. The number of patches is the total number of patches in each class. Patch density is the total number of patches divided by area (patches/100 ha). The largest patch index is the area of the largest patch in the class divided by the total area of all patches. The mean patch area is the area of all patches in the class divided by the number of patches. Fragstats was used to compute these metrics [52].
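The four class metrics follow directly from the definitions above. The sketch below is not a substitute for Fragstats: it uses a hypothetical function name, follows the wording of this section (the largest patch index is taken relative to the total area of the class's patches), and assumes that the area in patch density is a landscape area supplied in hectares.

```python
def class_metrics(patch_areas_m2, landscape_area_ha):
    """Compute the four class metrics for the patches of one growth type."""
    n_patches = len(patch_areas_m2)
    total_area = sum(patch_areas_m2)
    return {
        "number_of_patches": n_patches,
        # Patches per 100 ha; the reference area is an assumption of this sketch.
        "patch_density_per_100ha": n_patches / landscape_area_ha * 100.0,
        # Largest patch as a share of the class's total patch area.
        "largest_patch_index": max(patch_areas_m2) / total_area,
        "mean_patch_area_m2": total_area / n_patches,
    }

# Hypothetical class with three patches in a 500 ha landscape.
metrics = class_metrics([1000.0, 3000.0, 6000.0], landscape_area_ha=500.0)
```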

Results of Extracting the Built-Up Area from Satellite Imagery
Three radiometric parameters control the quantization process: the radiometric quantization minimum level (rad_q_minlev), the radiometric quantization maximum level (rad_q_maxlev), and the number of radiometric quantization levels (rad_qlev). The minimum level was 0.01, and the maximum level was 1. After several tests, the optimal numbers of quantization levels for the 2005, 2010, 2015, and 2020 images were 640, 20, 275, and 130, respectively. The variation between these values may result from atmospheric conditions, which differ from day to day.
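To make the role of the three parameters concrete, the sketch below shows a plain uniform quantizer. This is an assumption for illustration only: it is not MASADA's implementation, and the exact quantization scheme used by the SML workflow may differ. The argument names `min_level`, `max_level`, and `n_levels` are hypothetical stand-ins for rad_q_minlev, rad_q_maxlev, and rad_qlev.

```python
# Illustrative uniform quantizer (assumption, not MASADA's actual code).
# min_level, max_level, n_levels stand in for rad_q_minlev, rad_q_maxlev, rad_qlev.


def quantize(value, min_level=0.01, max_level=1.0, n_levels=640):
    """Map a radiometric value to an integer level in [0, n_levels - 1]."""
    if n_levels < 2 or max_level <= min_level:
        raise ValueError("need n_levels >= 2 and max_level > min_level")
    # Values outside the [min_level, max_level] range are clipped to it.
    clipped = min(max(value, min_level), max_level)
    step = (max_level - min_level) / (n_levels - 1)
    return int(round((clipped - min_level) / step))


if __name__ == "__main__":
    # Fewer levels give a coarser representation of the same value.
    for levels in (640, 20, 275, 130):
        print(levels, quantize(0.35, n_levels=levels))
```

Under this reading, a larger number of levels preserves finer radiometric differences, while a smaller number merges similar values into the same level.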
As shown in Table 3, SmlAvgSuppRad was between 100 and 1000. The built-up area of Assiut was 85.5 km² in 1999. The newly built area (Figure 7) between 1999 and 2005 was 57.5 km² (11% annual growth). In the second period (2005:2010), when the second land-use development plan started, the newly built area was 37.9 km² (5.3% annual growth). In the third period (2010:2015), the newly built area increased to 52.6 km² (7.3% annual growth). Finally, the newly built area in the fourth period (2015:2020) was 36.5 km² (5.1% annual growth).
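The annual growth figures can be approximately reproduced with a simple (non-compounded) rate: newly built area divided by a base area and by the number of years. The sketch below is a reading inferred from the reported numbers, not a formula stated in the text. In particular, it assumes that the 1999 area is the base for the first period and that the 2005 area (85.5 + 57.5 = 143.0 km²) is the base for all three later periods, which would match the plan's target being expressed relative to the 2005 built-up area.

```python
# Simple annual growth rate; the choice of base years is an assumption
# inferred from the reported figures, not stated explicitly in the text.


def annual_growth_pct(new_area_km2, base_area_km2, years):
    """Non-compounded annual growth in percent of the base area."""
    return new_area_km2 / base_area_km2 / years * 100.0


BASE_1999 = 85.5
BASE_2005 = BASE_1999 + 57.5  # 143.0 km², built-up area at the start of the plan

PERIODS = [
    ("1999:2005", 57.5, BASE_1999, 6),
    ("2005:2010", 37.9, BASE_2005, 5),
    ("2010:2015", 52.6, BASE_2005, 5),
    ("2015:2020", 36.5, BASE_2005, 5),
]

rates = [annual_growth_pct(new, base, yrs) for _, new, base, yrs in PERIODS]

if __name__ == "__main__":
    for (label, _, _, _), rate in zip(PERIODS, rates):
        print(f"{label}: {rate:.2f}% p.a.")
```

Under these assumptions the computed rates fall within roughly 0.2 percentage points of the reported values; the small differences are consistent with rounding.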

Validation Results for the Training Sets and Classification Results
The validation with the 500 randomly chosen points showed that the accuracy and the False Positive Rate (False Positive Rate = 1 - Specificity) followed similar patterns for the classification results and the training sets (Tables 4 and 5). In contrast, the sensitivity varied, ranging between 0.471 and 0.861. Among the classified layers, the estimated built-up area for 2020 (BU_2020) had the highest sensitivity (0.861), whereas the classified built-up areas for 2005 and 2010 (BU_2005 and BU_2010) had the lowest (0.800). The training sets, however, had lower sensitivity. The ALC had the highest sensitivity among them (0.590), followed by GHSL_2000 (0.496). The lowest sensitivity among the training sets was that of GHSL_2014 (0.471).
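For reference, the rates discussed here follow from the counts of a binary confusion matrix (built-up versus non-built-up) over the validation points. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from Tables 4 and 5.

```python
# Standard binary validation rates from confusion-matrix counts.
# The counts in the example are invented for illustration.


def validation_rates(tp, fp, tn, fn):
    """tp/fn: built-up points detected/missed; tn/fp: non-built-up points kept/misclassified."""
    sensitivity = tp / (tp + fn)          # share of true built-up points detected
    specificity = tn / (tn + fp)          # share of true non-built-up points kept
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "one_minus_specificity": 1.0 - specificity,  # equals fp / (fp + tn)
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # 500 hypothetical validation points.
    print(validation_rates(tp=172, fp=15, tn=285, fn=28))
```

A layer can therefore combine high accuracy with low sensitivity when non-built-up points dominate the sample, which is one way the similar accuracy but differing sensitivity values could arise.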

Results of Human Settlement Growth Types
Calculating the total area of each HSG type provided a comprehensive picture of the growth types across all periods. As Figure 9 shows, edge-expansion was the dominant growth type in all periods except the fourth, when infill exceeded expansion. In contrast, the linear branch was the smallest type in every period except the first, when edge-ribbon was the smallest.
Throughout the first period (Figure 10), expansion was the primary growth type (31 km²), while cluster growth was second (17.8 km²). The isolated cluster area (10.2 km²) was larger than the proximate cluster area (7.6 km²). Scattered growth had the third-highest share of the newly built-up area (4.3 km²). Unlike cluster growth types, the area of isolated scattered patches (3.7 km²) was more than proximate scattered (0.7 km²). Infill, linear branch, and edge-ribbon together accounted for less than 9% of the newly built area.
In the second period (Figure 11), expansion still had the highest share of the area (44.5%), although its area decreased to 16.9 km². Infill growth rose to second place with 12.4 km² (32.8%), up from 3.2 km² (5.6%) in the first period. Unlike infill, cluster growth decreased to 3.5 km². The isolated cluster area was 1.7 km², while the proximate cluster area was 1.8 km². Meanwhile, scattered growth had approximately the same area in the first and second periods (4.2 km²), while its share of the total newly built area increased from 7.4% in the first period to 11.3% in the second. Proximate scattered (3.2 km²) was also more than isolated scattered (1.0 km²), as in the previous period. Both linear branch and edge-ribbon were less than 1 km².
In the third period (Figure 12), the pattern was similar to the first period. The expansion area increased to 27.7 km² (52.9%). The infill area was approximately 13.4 km², so its share decreased to 25.5% from 32.8% in the second period. The cluster growth area increased to 5.4 km² (10.3%): the proximate cluster increased (3.5 km²), while the isolated cluster decreased slightly (1.5 km²). The scattered area was 4.4 km² (8.4%), most of which was proximate scattered (3.5 km²). Finally, the linear branch and edge-ribbon were 0.7 and 0.9 km², respectively.
In the fourth period (Figure 13), infill and expansion recorded similar areas (15.8 and 15.6 km², respectively). Meanwhile, linear branch (0.1 km²), cluster (1.5 km²), and scattered (3.0 km²) had their smallest areas compared with the previous periods. Proximate cluster (1.0 km²) was more than isolated cluster (0.5 km²). Similarly, proximate scattered (3.7 km²) was more than isolated scattered (0.7 km²).
In terms of landscape properties, the MEI and AWMEI results (Table 6) showed that HSG in Assiut followed an ascending trend from dispersed to compact. MEI was 11.1, 32.6, 30.2, and 43.9 for the first, second, third, and fourth periods, respectively. The highest increase was in the second period, while the only decrease was in the third period. AWMEI for the first, second, third, and fourth periods was 18.0, 35.4, 32.5, and 44.2, respectively. Similarly, the highest increase was in the second period, and the only decrease was in the third period.
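The two landscape-level indices can be illustrated with a short sketch. It assumes the definitions commonly used with the Landscape Expansion Index (LEI): MEI as the unweighted mean of the per-patch LEI values, and AWMEI as the mean weighted by patch area. These definitions are an assumption here, since this section does not restate them, and the function name and sample values are hypothetical.

```python
# Assumed definitions (not restated in this section):
#   MEI   = sum(LEI_i) / N
#   AWMEI = sum(LEI_i * a_i) / sum(a_i)
# where LEI_i and a_i are the expansion index and area of new patch i.


def mean_expansion_indices(patches):
    """patches: list of (lei, area) tuples for the new patches of one period."""
    if not patches:
        raise ValueError("at least one patch is required")
    total_area = sum(area for _, area in patches)
    mei = sum(lei for lei, _ in patches) / len(patches)
    awmei = sum(lei * area for lei, area in patches) / total_area
    return mei, awmei


if __name__ == "__main__":
    # Invented values: two small outlying patches and one larger infill-like patch.
    print(mean_expansion_indices([(0.0, 1.0), (50.0, 1.0), (100.0, 2.0)]))
```

Because AWMEI weights each patch by its area, it responds more strongly than MEI when the large new patches are the compact ones.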

Class Metrics
Comparing the eight growth types over all periods, only expansion experienced a continuous increase in the number of patches from 1999 until 2020, while proximate cluster decreased continuously (Figure 14a). While the number of patches for the six other types fluctuated, infill increased very strongly overall, perhaps reflecting some desire for consolidation and more compact growth. In terms of patch density (Figure 14b), all types increased in each period except for edge-ribbon, linear branch, and isolated cluster, which exhibited little change. Expansion and isolated cluster had the highest largest patch index (Figure 14c), while the mean patch area of all types increased over time, with isolated cluster, linear branch, and expansion showing the highest values (Figure 14d).

Discussion
From a technical perspective, both the GHSL and the ALC were useful for estimating the built-up area in Assiut using SML, though the ALC had a higher probability of detecting built-up areas than the GHSL. This difference could be the result of tuning the parameters of the GHSL to global scale requirements, while ALC was tuned for Africa only. Nevertheless, despite their differences, both were useful as training sets, enabling us to tune the SML classifier and improve the accuracy of Assiut's built-up area maps. Future research could examine to what extent the settings used here could be transferred for classifying other parts of the Nile Delta.
The new QGIS Growth Classifier plugin has some advantages over existing tools. Liu et al. [46], for instance, developed a tool for a commercial GIS platform (ArcGIS) that defines three HSG types (i.e., infill, edge-expansion, and outlying). By contrast, Growth Classifier operates on the open-source QGIS platform and classifies eight HSG types, as described in Section 2, thereby providing more detailed insight into HSG dynamics over time. As previous studies highlighted the need to identify HSG dynamics to support modeling and simulating unplanned (informal) settlements using remote sensing and GIS [53,54], the more detailed information that Growth Classifier generates may prove useful in future simulation work. For example, it could provide further insights into the conditions under which ribbon and clustered development may consolidate or transform into new urban expansion areas over time.
From a substantive perspective, we have observed changes in the mix of growth types and in the growth rates of human settlements in Assiut Governorate since 1999. The period 2015:2020 had the lowest growth area, whereas the period between 2010 and 2015 had the highest growth area after the 2005 policy shift. These changes in area between the four periods can, to some extent, be explained by the land-use development plan in force during each period.
Before 1986, rural human settlement growth in Egypt was autonomous. In 1986, the first land-use development plan for settlements in Assiut created planning schemes for 15 years, following the master planning approach. These plans failed to regulate settlement development beyond the physical boundaries of the plan [31], as they neglected the socioeconomic context and focused exclusively on establishing spatial and physical boundaries [55]. The failure of the first land-use development plan is clearly reflected in the results of the first period (1999-2005), when almost 58 km² was added to the settlement area.
In 2005, the second land-use development plan for 2005-2020 was launched [33]. This plan relied on a strategic planning schema and took the socioeconomic context into account. It was intended to stimulate infill growth and prevent all types of outlying growth. Accordingly, the annual growth rate decreased considerably between 2005 and 2010. Although the plan succeeded in utilizing the undeveloped pockets (infill growth), the growth rate (5.3% p.a.) was still higher than envisaged. Overall, the second land-use development plan was also unsuccessful in managing HSG in Assiut. Although the share of infill growth in the total growth area increased, the plan failed to limit the total growth area. While the plan's target was to limit settlement growth to 15% (1% p.a.) of the 2005 built-up area (approximately 23 km²), the observed growth was more than 88.8% (almost 127 km²). Further, the plan also failed to restrict outlying settlement growth. Although one may conclude that the second plan had a limited effect on controlling growth, it is also important to mention a major external force, the 2011 civil uprising, which disrupted all government operations, including this plan's implementation. The data show a large amount of settlement growth in the period 2010-2015 (52.6 km², which is at least 14 km² more than in the second and fourth periods), which might be partly attributed to the effect of the uprising.
The strong effect of accessibility on outlying growth types in the four periods is in accordance with Colsaet et al. [47], who found that roads (corridors) increase the probability of land development. Although edge-ribbon and linear branch were the smallest types in terms of area, other growth types (e.g., cluster) were also affected by their proximity to corridors. As a road network of more than 7000 km was built in Egypt between 2014 and 2020 [56], further studies should examine the impact of these roads on the HSG types that are affected by corridors (edge-ribbon, linear branch, proximate cluster, and proximate scattered). To improve the identification of linear branch and edge-ribbon in the Nile Valley region, further work is also needed to define the optimal patch dimensions that describe cluster, edge-ribbon, and linear branch in the Nile Valley context.
Unplanned HSG in the Nile Valley is driven by many factors [57]. In terms of policy, there is a lack of efficient participatory planning processes in developing and implementing the land-use plan. In addition, urban policy ignores residents' preferences and lifestyle requirements. The government has also failed to establish a consistent policy toward unplanned HSG. On the one hand, legislation criminalizes unplanned HSG; on the other hand, the government sometimes adopts a reconciliation policy for regularizing unplanned settlements. Such contradictions encourage unplanned HSG, as opportunities for reconciliation through regularization are often on the table. The regularization of informal settlements is often supported by international, pro-poor development agendas [58].
The presence of outlying growth types in all periods reflects insufficient subsidized housing opportunities for poorer households in Assiut. Consequently, more than 70% of unplanned HSG between 1986 and 2005 was for residential use by lower-income groups [27]. Meanwhile, all new settlements that were built in the desert adjacent to the Nile Valley and Delta regions failed to attract their targeted populations [23]. Thus, future land-use development plans might continue to fail in preventing outlying growth unless the current housing policy toward lower-income groups is reconsidered, for example by offering housing subsidies to these groups and providing subsidies for farmers to prevent them from building on agricultural land.
Uncontrolled growth was not only caused by policy failures. Administrative, economic, and cultural dimensions have also contributed. Top-down decision-making processes, administrative fragmentation, and disrupted bureaucratic structures are some of the administrative drivers for unplanned HSG, and the local culture of parents providing housing on their land for their children's families also drives settlement growth [57].
A comparison of the dynamics of HSG types in Assiut with those of other rapidly growing areas, such as Dongguan, China, shows that edge-expansion was the dominant urban growth type in both areas [46]. In Dongguan, edge-expansion growth was the dominant type from 1988 to 2006. By contrast, the outlying growth type had the lowest area in Dongguan, whereas it was the second-largest component of settlement growth in Assiut. Such differences may be attributed to the specific socioeconomic and geographic characteristics of the two locations. The difference also indicates that Dongguan's policy managed to achieve more compact growth than the policy in Assiut.
Dietzel et al. [59] showed that Houston's urban growth passes through two phases: diffusion and coalescence. Starting from an urban seed, the built-up area grows in scattered patches, which ultimately leads to the emergence of several urban cores (diffusion). Thereafter, edge expansion occurs around existing urban cores, and then infilling occurs on the unoccupied areas between the cores (coalescence). As presented in Section 3, Assiut's growth showed high levels of diffusion in the period 1999-2005 and continued to expand significantly between 2005 and 2020 as a symptom of uncontrolled growth. The increasing values of the mean patch area also indicate a trend that runs against the policy of fixing the growth area between 2005 and 2020. These results suggest that the next phase may show increased signs of coalescence, as the spaces between existing settlements are subject to infilling, though this will also likely be accompanied by a degree of diffusion, as both processes co-exist. This is also reflected by the relative change in the number of patches per growth type (Section 3.4). Moreover, from an urban policy perspective, it would be worthwhile to give more attention to the four growth types that adjoin corridors (i.e., edge-ribbon, linear branch, proximate cluster, and proximate scattered), as some of these are increasing in number and have the potential to open up large tracts of farmland to urban development in a manner that is fragmented and may be challenging to service efficiently.

Conclusions
Our study makes several methodological and substantive contributions. We have demonstrated the practical value of using open remote sensing and GIS data (Landsat, GHSL, ALC, OpenStreetMap) and open processing tools (MASADA 1.3) to extract human settlement maps, which are tuned to the context of Assiut Governorate and potentially the wider Nile Delta region. Another contribution is the open-source QGIS plugin tool, Growth Classifier, which may help others to classify major settlement growth types, including five new HSG types that highlight the effect of corridors (e.g., road network) on settlement growth. Moreover, Growth Classifier can also be used for other purposes, such as analyzing deforestation/reforestation dynamics through time.
By tuning the SML parameters within MASADA 1.3, the classification accuracy of built-up areas was significantly improved. This higher-quality assessment of the built-up area provides a better understanding of human settlement growth processes through time. The classification of eight settlement growth types (infill, expansion, edge-ribbon, linear branch, proximate cluster, isolated cluster, proximate scattered, and isolated scattered) provides insights into the form and compactness of settlement growth and allows the effectiveness of urban growth policy in Assiut to be assessed.
From a substantive perspective, we conclude that the land-use development policy in the Nile Valley region of Assiut has been mostly ineffective in managing unplanned HSG between 2005 and 2020. Although the rate of unplanned HSG did decrease after 2005, HSG was clearly still ongoing. Moreover, we observe that the ability to regulate such settlement growth processes is weakened by periods of civil unrest, such as the 2011 crisis in Egypt. There is, therefore, a need to critically evaluate the recent strategies for regulating land-use development in the next plan for 2020-2035. Further, institutionalizing an efficient settlement growth monitoring system that regularly informs policymakers, using tools and methods such as those used here, is worthy of consideration.