Monitoring and Forecasting of Urban Expansion Using Machine Learning-Based Techniques and Remotely Sensed Data: A Case Study of Gharbia Governorate, Egypt

Abstract: Rapid population growth is the main driver of the accelerating urban sprawl into agricultural lands in Egypt. This is particularly evident in governorates that have no desert hinterland (e.g., Gharbia) into which urban areas could expand. This work presents an overview of machine learning-based and state-of-the-art remote sensing products and methodologies to address the issue of unplanned urban expansion, which negatively impacts environmental sustainability. The study aims (1) to investigate the land-use/land-cover (LULC) changes over the past 27 years and to simulate the future LULC dynamics over Gharbia; and (2) to produce an Urbanization Risk Map to inform decision-makers of the districts with priority for sustainable planning. Time-series Landsat images were used to analyze the historical LULC change between 1991 and 2018, and to predict the LULC change by 2033 and 2048 based on a logistic regression–Markov chain model. The results show a rapid urbanization trend corresponding to a decline in agricultural land. Agricultural land represented 91.2% of the total land area in 1991, which was reduced to 83.7% in 2018. The built-up area exhibited a similar (but reversed) pattern. The results further reveal that the observed LULC dynamics will continue in a similar manner, confirming a remarkable urban sprawl over the agricultural land from 2018 to 2048. The cultivated land changes have a strong negative correlation with the built-up cover changes (R² = 0.73 for 1991–2003 and 0.99 for 2003–2018). Based on the Fuzzy TOPSIS technique, Mahalla Kubra and Tanta are the districts most susceptible to the undesirable environmental and socioeconomic impacts of persistent urbanization. Such an unplanned loss of the fertile agricultural lands of the Nile Delta could negatively influence the production of premium agricultural crops for the local market and export.
This study contributes to understanding future trends of LULC change and to proposing alternative policies that reduce urban sprawl on fertile agricultural lands.


Introduction
The world is urbanizing rapidly; urban land cover is expanding at twice the rate of population growth worldwide [1]. The concentration of people and activities in cities helps boost scientific and technological progress and cultural exchange. However, alongside growing inequality in the distribution of wealth, this accelerated development is creating many sustainability challenges for the environment, resource management, and the wellbeing of urban residents [2], e.g., biodiversity loss, increasing greenhouse gas emissions, water scarcity, and environmental pollution [1,3]. Furthermore, many systems, such as transportation, housing, employment, privacy, and public morals, face enormous pressures and challenges, further affecting human life [4,5]. The consequences of urban growth are most critical in developing countries, where urbanization is prominent and frequently unplanned [6]. Therefore, urban studies and the development of research plans have become an urgent priority.
Remote Sens. 2021, 13, 4498 2 of 25
Regular and timely monitoring and mapping of human settlements at multiple spatial scales, from local to global, is critical for understanding the spatial and temporal variability of population distribution and emerging urbanization trends, and for supporting global frameworks such as the 2030 Agenda for Sustainable Development [7,8]. Many previous studies addressed urban sprawl using different methodologies. For example, Ranagalage et al. [9] examined and forecast land-use changes in the rapidly urbanizing hill station of Nuwara Eliya, Sri Lanka, based on a neural network–Markov model; their findings show that changes in cultivated land have a strong negative relationship with changes in built-up cover during urban expansion. Similarly, Rimal et al. [10] studied Nepal's Terai region, a densely populated area that has experienced land-use change owing to urban development. They used Landsat satellite images to examine past land-use change, highlighting urban growth, and predicted its future trends using artificial neural network (ANN) and Markov chain (MC) spatial models based on historical trends. Urban cover quadrupled over a 27-year period, almost entirely at the expense of agricultural land, and urban sprawl is predicted to continue along the same trend in that study area.
By contrast, our study focuses on areas suffering from accelerating urban sprawl on agricultural lands, which creates food security challenges. In Egypt, rapid population growth and the growing demand for urban land are the main drivers of the accelerating urban sprawl [11] into agricultural and arable lands. This is particularly evident in governorates that have no desert hinterland. Gharbia governorate is a vivid example of a densely populated area that has experienced rapid urban growth over recent decades and has no desert hinterland that could accommodate urbanization [12]. The problem of urban expansion in Egypt is amplified mainly because about 96% of the country's land area is unpopulated desert [13], so the population is concentrated almost entirely in the Nile Valley and the Delta. This unbalanced distribution and the massive rise in population have led to critical socio-economic problems [2,14]. At the same time, agriculture is a basic economic resource for Egypt [15], so the loss of agricultural land is one of the most important challenges facing agricultural development in achieving food security and rural stability. Even the reclamation of desert lands (new lands) cannot compensate for the loss of the old lands in the short term, owing to the lower fertility of the new lands and the cost of reclamation. The resulting shortfall in agricultural production is therefore a major problem for society: it consumes large amounts of foreign currency to fill the food gap caused by the loss of these areas, and it also reduces the output of products destined for export and for the local markets for which the study area is famous, e.g., Egyptian cotton. Accordingly, understanding the dynamics of urban expansion and the motives behind encroachment in governorates such as Gharbia is pivotal to promoting sustainable land-use planning.
This paper presents a survey of machine learning-based and state-of-the-art remote sensing-based products and methodologies that address urban expansion over agricultural and arable lands, and the mapping of transitional zones.
Furthermore, to make this study more useful in practice, we employed Fuzzy TOPSIS to determine each district's level of susceptibility to urbanization risk, so that decision-makers can identify the districts with priority for sustainable planning. The previously mentioned studies monitored and forecast urban expansion over different study areas using different methodologies, achieving highly accurate and reliable results. However, they depend on satellite coverage. For data-gap regions, where the area of interest is not covered by satellite images, Fuzzy TOPSIS can be used to evaluate the districts' susceptibility to urbanization risk.
Geodata science and spatial analysis are powerful tools for studying and monitoring land-use/land-cover change (LULCC) and for building models of potential scenarios, making them valuable for management and decision-making aimed at optimal land use [16]. These techniques have advanced considerably thanks to the availability of Remote Sensing (RS) images, sensor and Internet of Things (IoT) data, and multi-source geospatial data, together with emerging methods such as artificial intelligence (AI), particularly in its application to Earth observation (EO) [17]. RS imagery can efficiently capture up-to-date LULC distribution spatially and temporally [18], from which trends in LULC dynamics can be derived, analyzed, and predicted [19]. Machine learning algorithms, a common subset of AI, learn from training data to support predictive analysis, which can improve the accuracy of the results [17–20]. Machine learning can be highly successful in image analysis tasks, including land-use classification, simulation, and predictive analysis [21].
For land-use classification, the Support Vector Machine (SVM) is one of the most common machine learning image classifiers [22–25]. An SVM was used in this study to extract LULC from Landsat images because it can handle overlapping classes [24,25]. It tends to outperform other algorithms when there are many features and relatively few training samples. Moreover, it has been reported to achieve notably higher classification accuracy and speed than other algorithms [26], as confirmed by Yousefi et al. [27].
For simulation, diverse modeling methods can simulate and predict LULC, including SLEUTH [28] and the Conversion of Land Use and its Effects (CLUE) model [29]. The SLEUTH model, for example, simulates future LULC variation based on land-use behavioral change [30], whereas CLUE simulates spatio-temporal LULCC resulting from human and biophysical drivers. The Markov chain (MC) algorithm complements such spatial models of LULCC by stochastically estimating the potential for land-change transitions, without determining their spatial extent [31,32]. Conversely, spatial models such as cellular automata (CA) concentrate specifically on where land-change transitions occur, usually based on past trends; a hybrid model that includes MC can therefore estimate both the location and the quantity of conversions from historical LULC trends [33]. The logistic regression–Markov chain model is one of the strongest hybrid models for estimating future land-change transitions [33–35]. It has a substantial advantage over models such as CA, which requires transition rules to govern change, and SLEUTH, which requires the parameter values to be known in advance. The logistic regression–Markov chain model has been used to simulate LULC changes in Aswan, Egypt [34]; Tehran, Iran [33]; and Giao Thuy District, Vietnam [35]. Integrating MC (with its strong capability for transition estimation) [36] and logistic regression as a spatial model (which focuses on the location of land-change transitions, usually based on historical trends) enables the generation of transition probability maps [37].
Identifying the spatiotemporal patterns of LULCC and their driving forces is necessary to develop sound urban strategies and policies that can ensure economic, social, and environmental sustainability [15]. Therefore, this study has three principal objectives: (1) to investigate the LULC dynamics over the past 27 years; (2) to simulate the future LULCC for 2033 and 2048 in Gharbia governorate; and (3) to produce an Urbanization Risk Map illustrating each district's level of susceptibility to urbanization risk. The study aims to strengthen the capability and understanding of policymakers and the local government so that they can reduce the potential undesirable effects of rapid urban growth in the study area.

Study Area
Gharbia governorate is located in the heart of the Nile Delta of Egypt, at 30.87° N latitude and 31.03° E longitude. It is bordered to the north by Kafr El-Sheikh governorate, to the south by Monufia governorate, and to the east and west by the Damietta and Rosetta branches of the Nile, respectively. The Nile Delta is one of the most ancient agricultural areas in the world and has been continuously cultivated for more than 5000 years [2,38]. The Nile River supplied the Delta soil with natural fertilizers and hence high agricultural productivity, creating a distinctive green triangle within a vast desert. Agricultural land is the dominant LULC category in the governorate, which is famous for traditional crops, e.g., potatoes for the local market and export, as well as rice, grains, and cotton renowned for its superior quality. Moreover, the governorate produces 86% of Egypt's flax crop. Gharbia consists of eight districts, as shown in Figure 1. According to the 2018 population estimate, about 5,066,000 people reside in a total land area of 1999 km², compared with 4,011,320 in 2006 and 3,790,670 in 2001.

Data Collection and Processing
This study used time-series Landsat imagery obtained from the United States Geological Survey (USGS) website, chosen for its rich, free archive that allows long-term study. Landsat 5 Thematic Mapper (TM) and Landsat 8 OLI/TIRS images were used to derive LULC maps [39] for 1991, 2003, and 2018, and to extract the land-cover changes over the years [40]. The images were cloud-free, with no cloud shadows, thanks to the dry summer season. All three images were atmospherically corrected. Before image analysis, the bands of each scene were stacked and subset to the region of interest; because the study area spans two adjacent scenes, the scenes were mosaicked. A further line of research for this study area concerns the impact of LULCC on land surface temperature, so images from the same season were preferred to allow a more realistic comparison across years; if only LULCC is addressed, the capture dates of the images for the different years need not be matched. The road network obtained from OpenStreetMap (OSM) was used as auxiliary data to derive the "distance to the nearest road" layer, which is used later. Moreover, the training samples for the supervised classification and its validation were acquired from Google Earth Pro, as listed in Table 1.
Figure 2 illustrates the conceptual flowchart of the methodology applied to fulfill the aforementioned objectives, which consists of five fundamental steps: (1) extracting LULC maps from multi-temporal Landsat images [41] using a machine learning classification algorithm [42], followed by post-classification processing to limit misclassification errors or salt-and-pepper noise caused by spectral confusion [43], and assessing the classification accuracy so that the change detection can be relied upon [44]; (2) choosing the most appropriate modeling variables, i.e., the driving forces of LULCC (representing the urban sprawl parameters) [45], and obtaining transition potential maps [34] that illustrate the transitions between land-use types; (3) simulating and validating LULCC and projecting it into the future with the Land Change Modeler (LCM) [46,47], using a logistic regression algorithm and the Markov chain model, where validation of the simulation tests the model's predictive capability; (4) using linear regression analysis to investigate the correlation between the changes in different land-use types; and (5) applying a multi-criteria decision-making (MCDM) technique (Fuzzy TOPSIS) to determine each district's level of susceptibility to urbanization risk, in the form of an Urbanization Risk Map [48], especially where satellite images are unavailable.
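Step (4) rests on an ordinary least-squares fit between paired land-use changes. The following is a minimal NumPy sketch of that computation, assuming the paired observations are, for example, per-district changes in built-up and agricultural area; the numbers are hypothetical placeholders, not the study's data.

```python
import numpy as np


def linear_fit_r2(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    y_hat = slope * x + intercept
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical per-district area changes in km^2 (placeholders, not the study's values)
built_up_change = [5.1, 8.3, 3.2, 6.7, 4.4, 9.0, 2.8, 7.5]
agri_change = [-5.0, -8.1, -3.4, -6.5, -4.6, -8.8, -2.9, -7.2]

slope, intercept, r2 = linear_fit_r2(built_up_change, agri_change)
print(f"slope = {slope:.2f}, R^2 = {r2:.2f}")
```

A negative slope combined with an R² close to 1 is the kind of strong negative relationship the study reports between cultivated-land and built-up changes. For a simple linear fit, R² equals the squared Pearson correlation coefficient.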

LULC Classification
Visual interpretation of the images indicates that the study area has only three LULC classes: built-up, water, and agricultural land. We used a pixel-based supervised classification method, which suits the primary data, since Landsat images are medium-resolution data. Two classification techniques were tested: Maximum Likelihood Classification (MLC), a parametric pixel-based method, and SVM classification, a non-parametric pixel-based approach. Even visually, the SVM-classified image represented the actual categories much better than the MLC result. This is because SVM is a kernel-based algorithm: it implicitly maps the data from their original, relatively low-dimensional space into a higher-dimensional feature space, where it finds a support vector classifier (a maximum-margin separating hyperplane) that can effectively classify the observations [49,50]. Consequently, SVM was adopted to extract LULC from the Landsat images.
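The pixel-based SVM workflow described above can be sketched with scikit-learn's `SVC`. This is an illustration under stated assumptions, not the authors' implementation: the RBF kernel, the `C` value, the band standardization, the class codes (0 = built-up, 1 = water, 2 = agricultural land), and the synthetic six-band spectra are all choices made for the example.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_svm_classifier(train_pixels, train_labels, C=10.0, gamma="scale"):
    """Fit an RBF-kernel SVM on (n_samples, n_bands) reflectance vectors."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))
    model.fit(train_pixels, train_labels)
    return model


def classify_image(model, image):
    """Classify a (rows, cols, bands) image pixel by pixel; returns a (rows, cols) label map."""
    rows, cols, bands = image.shape
    labels = model.predict(image.reshape(-1, bands))
    return labels.reshape(rows, cols)


# Synthetic 6-band spectra, one cluster per class (placeholder values, not Landsat data)
rng = np.random.default_rng(0)
centers = np.array([[0.25, 0.28, 0.30, 0.32, 0.35, 0.33],   # built-up
                    [0.08, 0.07, 0.05, 0.03, 0.02, 0.01],   # water
                    [0.05, 0.08, 0.06, 0.40, 0.22, 0.12]])  # agricultural land
X = np.vstack([c + 0.01 * rng.standard_normal((50, 6)) for c in centers])
y = np.repeat([0, 1, 2], 50)

model = train_svm_classifier(X, y)
test_image = centers[[0, 1, 2, 2]].reshape(2, 2, 6)
label_map = classify_image(model, test_image)
print(label_map)
```

The kernel trick is what lets `SVC` find a separating hyperplane in a higher-dimensional space without computing that space explicitly; in practice `C` and `gamma` would be tuned on the training samples.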
The initial classification results contained some misclassification. A majority filter was therefore applied in post-classification to limit misclassification errors, or salt-and-pepper noise, caused by spectral confusion [43,51]. Such post-classification refinement can considerably improve the accuracy of the results.
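A 3 × 3 majority filter of the kind mentioned above can be sketched with SciPy's `generic_filter`. The sketch assumes non-negative integer class codes and breaks ties in favour of the smallest code; other implementations may use different neighbourhood definitions and tie rules.

```python
import numpy as np
from scipy.ndimage import generic_filter


def _majority(window_values):
    """Most frequent class code in the window; ties go to the smallest code."""
    counts = np.bincount(window_values.astype(np.int64))
    return counts.argmax()


def majority_filter(class_map, size=3):
    """Replace every pixel by the majority class of its size x size neighbourhood."""
    return generic_filter(class_map, _majority, size=size, mode="nearest")


# Toy map: agricultural land (code 2) with one isolated salt-and-pepper built-up pixel (code 0)
toy = np.full((5, 5), 2, dtype=np.int32)
toy[2, 2] = 0
cleaned = majority_filter(toy)
print(cleaned[2, 2])
```

The isolated pixel is reassigned to the surrounding class, which is exactly the salt-and-pepper effect the post-classification step targets; larger windows smooth more aggressively but can erase small genuine features such as narrow canals.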

Accuracy Assessment
To verify whether the classification results are accurate enough for change detection, a quantitative accuracy assessment was performed. The most common technique for assessing the accuracy of land cover maps is the confusion matrix [52,53], which compares a classified image with its corresponding reference image on a class-by-class basis. The samples required for training all classes, as well as those for validation, were obtained from Google Earth historical images. We used a random sampling method [54,55] to generate 1000 checkpoints for each image, covering all LULC classes and well distributed, to obtain a more realistic estimate of classification accuracy [54]. Based on the error matrix, the Overall Accuracy (OA%) was used to express the classification accuracy. However, OA does not reveal whether the errors are evenly distributed among the classes, or whether some classes are classified very poorly and others very well. Therefore, three further accuracy indices were computed: User's Accuracy (%), Producer's Accuracy (%), and the Kappa coefficient [56,57].
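All four accuracy indices can be computed from the error matrix. The following minimal NumPy sketch assumes that rows hold the classified (map) classes and columns the reference (checkpoint) classes; the 3 × 3 matrix is a hypothetical example, not one of the study's error matrices.

```python
import numpy as np


def accuracy_metrics(cm):
    """Overall, user's and producer's accuracy and Cohen's kappa from a confusion matrix.

    Assumed layout: rows = classified classes, columns = reference classes.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    row_tot = cm.sum(axis=1)                 # pixels assigned to each class in the map
    col_tot = cm.sum(axis=0)                 # reference pixels of each class
    oa = diag.sum() / n                      # observed agreement
    users = diag / row_tot                   # 1 - commission error
    producers = diag / col_tot               # 1 - omission error
    pe = np.sum(row_tot * col_tot) / n**2    # agreement expected by chance
    kappa = (oa - pe) / (1.0 - pe)
    return oa, users, producers, kappa


# Hypothetical error matrix for 1000 checkpoints (built-up, water, agricultural land)
cm = [[120, 2, 18],
      [1, 25, 9],
      [20, 6, 799]]
oa, ua, pa, kappa = accuracy_metrics(cm)
print(f"OA = {oa:.1%}, kappa = {kappa:.2f}")
print("UA:", np.round(ua, 3), "PA:", np.round(pa, 3))
```

Because Kappa discounts chance agreement, a map dominated by one class (here, agricultural land) can show a high OA alongside a noticeably lower Kappa.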

LULC Change Modeling
The land change modeling process involves three main steps: (1) model calibration, (2) model simulation and validation, and (3) model projection. Before model calibration, the data must be prepared: the land cover images must have the same extent, projection, background areas and values, and legends. These steps were performed in LCM [58,59], which is integrated into the TerrSet Geospatial Monitoring and Modeling System software (Clark Labs 2019 [60]) and is efficient for LULCC analysis, simulation and validation, and future projection [46]. Land change analysis explores the land change dynamics between two points in time, and LCM provides a detailed view of LULCC patterns to support urban planning and sustainable development [46].

Model Calibration (1991-2003)
The model was calibrated using the pixels that changed from each LULC class to the other classes between 1991 and 2003. The "transition potentials" for each modeled transition were determined using a stochastic MC model [61], whereas a sub-model representing the transition potential to the built-up area was specified using logistic regression. Combining the MC approach, which has been widely adopted to predict LULC distribution, with spatial models such as the multilayer perceptron ANN (MLP-ANN), CA, and logistic regression [62,63] has been recommended. A hybrid MC and logistic regression model combines the advantages of both: MC quantifies the conversions among the different LULC types, while logistic regression determines their spatial extent [31].
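The Markov-chain part of the hybrid model can be illustrated in a few lines: cross-tabulating two classified maps gives a row-stochastic transition matrix, and multiplying a vector of class areas by that matrix projects the quantity of each class one period ahead. This NumPy sketch is a simplification under stated assumptions: it uses hypothetical toy maps and class codes, treats one projection step as one calibration-length period, and does not reproduce how LCM handles unequal intervals (the 1991-2003 calibration span is 12 years, whereas 2003-2018 is 15).

```python
import numpy as np


def transition_matrix(map_t0, map_t1, n_classes):
    """P[i, j] = fraction of class-i pixels at t0 that are class j at t1 (rows sum to 1).

    Assumes every class occurs at t0; an absent class would give a 0/0 row.
    """
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (map_t0.ravel(), map_t1.ravel()), 1)   # cross-tabulation
    return counts / counts.sum(axis=1, keepdims=True)


def project_areas(areas, P, steps=1):
    """Project class areas (row vector) `steps` periods ahead: a_(t+k) = a_t P^k."""
    return np.asarray(areas, dtype=float) @ np.linalg.matrix_power(P, steps)


# Hypothetical codes: 0 = built-up, 1 = water, 2 = agricultural land
t0 = np.array([[0, 1, 2, 2], [2, 2, 2, 2]])
t1 = np.array([[0, 1, 0, 2], [2, 2, 2, 2]])
P = transition_matrix(t0, t1, n_classes=3)
print(P)
print(project_areas([1, 1, 6], P))  # areas in pixel units
```

The matrix says how much of each class converts, but not where; in the hybrid model that spatial allocation comes from the logistic-regression transition potential maps.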
The selection of the most appropriate modeling variables for the sub-model is one of the most significant steps in LULC modeling. These variables are the driving forces of the LULC dynamics used in spatiotemporal prediction [64,65]. For instance, to model urban growth, urban sprawl parameters must be taken into account. According to previous studies on modeling land change and urbanization [9,66], as well as the available data, common driving forces of LULCC, particularly when predicting anthropogenic activities, include:
a. The distance from roads: Roads can provide access to previously remote areas, promoting urbanization near roadways.
b. The distance from urban centers: Urban centers tend to grow and expand as the human population increases, so the areas surrounding current urban centers are frequently susceptible to land change.
c. The distance from persistent built-up areas: Areas already disturbed by humans often have infrastructure in place that promotes further urbanization along the edges of current persistent built-up areas.
d. The distance from railway stations: This driver has an effect similar to that of the distance from roads.
e. Digital Elevation Model (DEM): Because characteristics such as temperature and rainfall change along the elevation gradient, elevation is a useful indicator of areas that are suitable for cultivation (and thus prone to transition to agricultural land) and for subsequent development (for instance, lowland areas are more prone to development).
f. Slope: Slope is a principal factor in determining whether land is usable by humans. For example, agriculture and building require fairly gentle slopes, so areas with such slopes may be more likely to experience land cover change.
Previous studies have confirmed that the best way to choose the variables depends on the properties of the study area, the LULCC detected over time, and expert familiarity with the study area [67,68]. Therefore, based on our knowledge of the study area and its development pattern, we adopted four modeling variables (a, b, c, and d above, as shown in Figure 3) as the driving forces of LULCC in the study area. The DEM and slope were excluded because the study area is almost flat. All distance-based variables were generated using the Euclidean distance method. After extraction, the four variables were standardized by rescaling them to the range 0 to 1, a common preprocessing step for logistic regression, so that variables with larger numeric ranges would not dominate the model. The logistic regression algorithm was then applied to produce transition potential maps depicting the probability of each conversion at every location in the study area.
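The driver-preparation and logistic-regression steps described above can be sketched as follows, assuming SciPy's exact Euclidean distance transform for the distance layers, min-max rescaling to [0, 1], and scikit-learn's `LogisticRegression` for the agriculture-to-built-up sub-model. The toy grid, the 30 m pixel size, and the two driver layers are hypothetical, and LCM's own logistic-regression implementation (e.g., how it samples pixels) is not reproduced here.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from sklearn.linear_model import LogisticRegression


def distance_to(mask, pixel_size=30.0):
    """Euclidean distance (map units) from each pixel to the nearest True pixel in `mask`."""
    return distance_transform_edt(~mask, sampling=pixel_size)


def min_max(layer):
    """Rescale a driver layer to the range [0, 1]."""
    return (layer - layer.min()) / (layer.max() - layer.min())


def transition_potential(drivers, candidate, changed):
    """Logistic-regression transition potential for one transition.

    drivers:   list of 2-D driver layers, already rescaled to [0, 1]
    candidate: bool mask of pixels that could make the transition (e.g. agricultural at t0)
    changed:   bool mask of pixels that did make it between t0 and t1
    """
    X = np.stack([d[candidate] for d in drivers], axis=1)
    y = changed[candidate].astype(int)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    X_all = np.stack([d.ravel() for d in drivers], axis=1)
    potential = model.predict_proba(X_all)[:, 1].reshape(candidate.shape)
    return np.where(candidate, potential, 0.0)   # non-candidates get zero potential


# Hypothetical 30 x 30 grid: a road along the left edge and a small 1991 urban core
roads = np.zeros((30, 30), dtype=bool)
roads[:, 0] = True
built_1991 = np.zeros((30, 30), dtype=bool)
built_1991[14:16, 0:2] = True
candidate = ~built_1991                             # treat all other pixels as agricultural
cols = np.arange(30)[None, :].repeat(30, axis=0)
changed = candidate & (cols <= 4)                   # toy "sprawl" within 5 pixels of the road

drivers = [min_max(distance_to(roads)), min_max(distance_to(built_1991))]
tp_map = transition_potential(drivers, candidate, changed)
print(round(tp_map[5, 1], 3), round(tp_map[5, 25], 3))
```

Pixels near the road receive a high potential and distant ones a low potential, mirroring how the fitted coefficients translate the driver layers into a transition potential map.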

LULC Simulation and Model Validation (2003-2018)
After calibrating the model, we simulated the 2018 LULC based on the trends observed during the calibration period (1991-2003). The simulated 2018 LULC map was then compared with the observed 2018 LULC map, produced earlier by SVM supervised classification of the Landsat image, to validate the model. Model validation evaluates how effectively LCM simulates, and then extrapolates, future LULC maps [69,70]. The model outputs Cohen's Kappa coefficient [71] as an indicator of the agreement between the simulated and observed 2018 LULC maps: Kappa values of 0 to 0.20 indicate slight agreement between the two maps, while values of 0.81 to 1 indicate almost perfect agreement [72,73]. Because the Kappa index alone may not be sufficient to assess the simulation performance [74], the Jaccard similarity coefficient was used to cross-check the results [75]. The Jaccard coefficient characterizes how well a model simulates LULCC; it is the ratio of the correctly simulated change area to the entire area of correctly and incorrectly simulated change or persistence [76]. In other words, it is the area of intersection between the two layers (simulated and actual) divided by the area of their union, as shown in Equation (1):

$$J = \frac{|S \cap O|}{|S \cup O|} \times 100\% \quad (1)$$

where S and O are the simulated and observed layers, respectively. J ranges from 0% (no intersection between the two layers) to 100% (perfect intersection), indicating the degree of agreement between the actual and simulated change. Because the most notable observed transition was urban sprawl over agricultural land, the observed and simulated built-up layers for 2018 were extracted, and the Jaccard coefficient was derived from them.
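Both agreement measures used for validation can be computed directly from the simulated and observed maps. This sketch assumes both 2018 maps are integer class arrays on the same grid and that built-up is coded 0 (a hypothetical convention); Cohen's Kappa comes from scikit-learn's `cohen_kappa_score`, and the Jaccard coefficient follows the intersection-over-union definition given above, applied to the extracted built-up masks.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

BUILT_UP = 0  # hypothetical class code for built-up


def jaccard_percent(simulated_mask, observed_mask):
    """Jaccard coefficient in %: area(S intersect O) / area(S union O) * 100."""
    s = np.asarray(simulated_mask, dtype=bool)
    o = np.asarray(observed_mask, dtype=bool)
    union = np.logical_or(s, o).sum()
    if union == 0:
        return 100.0   # both layers empty; treated here as perfect agreement
    return 100.0 * np.logical_and(s, o).sum() / union


# Toy 2 x 3 maps (codes: 0 = built-up, 2 = agricultural land)
simulated = np.array([[0, 0, 2], [2, 2, 2]])
observed = np.array([[0, 2, 2], [0, 2, 2]])

kappa = cohen_kappa_score(observed.ravel(), simulated.ravel())
jac = jaccard_percent(simulated == BUILT_UP, observed == BUILT_UP)
print(f"Kappa = {kappa:.2f}, Jaccard (built-up) = {jac:.1f}%")
```

Kappa is computed over every pixel, so a large, mostly persistent agricultural class can inflate it; the Jaccard coefficient on the built-up layer alone is a stricter check of whether the sprawl was placed correctly.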

Fuzzy TOPSIS Analysis
The Fuzzy TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) is an MCDM technique originally proposed by Hwang and Yoon in 1981, with further developments by Yoon in 1987 and by Hwang, Lai, and Liu in 1993 [77]. TOPSIS is based on the concept that the preferred alternative should have the shortest geometric distance from the positive ideal solution (PIS) and the longest geometric distance from the negative ideal solution (NIS) [48]. A fuzzy extension was employed to handle the uncertainty in the data and the analysis: expressing the ratings as fuzzy numbers makes linguistic judgments on the criteria easier to evaluate within TOPSIS. The fuzzy numbers used were derived from the triangular membership function [78,79]. Gharbia governorate has eight urban centers, and the trend of urban growth through these centers must be monitored to produce an Urbanization Risk Map. We applied Fuzzy TOPSIS to determine the district most susceptible to urbanization risk, which therefore has priority for consideration by decision-makers. The steps of the Fuzzy TOPSIS method [48] are as follows (a spreadsheet of the numerical matrices is provided in the Supplementary Materials):
Step 1: Rating the alternatives by the decision-makers and converting the ratings into fuzzy numbers.
Here, there are eight alternatives (the eight districts of the study area), judged against five criteria that reflect the level of risk: population growth, employment, local development, area, and socio-economic conditions. The criteria data for each alternative were obtained from government reports released on social media. Conversion scales were applied to convert the linguistic terms into fuzzy numbers. Typically, a scale of 1 to 9 is used to rate the criteria and alternatives; the intervals were chosen to give a uniform representation from 1 to 9 for the triangular fuzzy numbers used for the five linguistic evaluations. Table 2 lists the fuzzy rating for each linguistic term. The Combined Fuzzy Decision Matrix of the criteria and alternatives (shown in Table 3) was obtained from two fuzzy decision matrices, as two decision-makers were considered.
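Converting linguistic ratings into triangular fuzzy numbers and merging the two decision-makers' judgments can be sketched in plain Python. The five-term scale below is a hypothetical placeholder on the 1-9 range (the study's actual values are in Table 2), and the aggregation rule (minimum of lower bounds, mean of modal values, maximum of upper bounds) is one commonly used option, assumed here because the text does not state how the two matrices were combined.

```python
# Hypothetical 5-term scale on 1-9 (placeholder; not the study's Table 2 values)
SCALE = {
    "very low": (1, 1, 3),
    "low": (1, 3, 5),
    "medium": (3, 5, 7),
    "high": (5, 7, 9),
    "very high": (7, 9, 9),
}


def aggregate(ratings):
    """Combine several decision-makers' linguistic ratings into one TFN (a, b, c).

    Assumed rule: a = min of lower bounds, b = mean of modes, c = max of upper bounds.
    """
    tfns = [SCALE[r] for r in ratings]
    a = min(t[0] for t in tfns)
    b = sum(t[1] for t in tfns) / len(tfns)
    c = max(t[2] for t in tfns)
    return (a, b, c)


# Two decision-makers rate one district on one criterion
print(aggregate(["high", "very high"]))  # -> (5, 8.0, 9)
```

Applying `aggregate` to every district-criterion cell would yield a combined matrix of the same shape as Table 3; keeping the outer bounds preserves the spread of opinion between the decision-makers.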
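Step 2: Computing the normalized fuzzy decision matrix using Equation (2).
In the fuzzy decision matrix, the rating of the ith alternative on the jth criterion is the triangular fuzzy number x̃_ij = (a_ij, b_ij, c_ij). The normalized fuzzy decision matrix R̃ is

$$\tilde{R} = [\tilde{r}_{ij}]_{m \times n}, \quad \tilde{r}_{ij} = \left(\frac{a_{ij}}{c_j^{*}}, \frac{b_{ij}}{c_j^{*}}, \frac{c_{ij}}{c_j^{*}}\right), \quad c_j^{*} = \max_i c_{ij}, \quad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n \quad (2)$$

Note that each criterion is a benefit criterion. Furthermore, the normalized triangular fuzzy numbers range from 0 to 1.
Step 3: Computing the weighted normalized fuzzy decision matrix using Equation (3):

$$\tilde{V} = [\tilde{v}_{ij}]_{m \times n}, \quad \tilde{v}_{ij} = \tilde{r}_{ij} \otimes W_j \quad (3)$$

where W_j is the weight of criterion j.
Step 4: Determining the fuzzy positive ideal solution (FPIS, Ã⁺) and the fuzzy negative ideal solution (FNIS, Ã⁻) using Equations (4) and (5). For each criterion, select the maximum value as Ã⁺ and the minimum value as Ã⁻:

$$\tilde{A}^{+} = (\tilde{v}_1^{+}, \ldots, \tilde{v}_n^{+}), \quad \tilde{v}_j^{+} = \max_i \tilde{v}_{ij} \quad (4)$$

$$\tilde{A}^{-} = (\tilde{v}_1^{-}, \ldots, \tilde{v}_n^{-}), \quad \tilde{v}_j^{-} = \min_i \tilde{v}_{ij} \quad (5)$$

Step 5: Computing the distance from each alternative to the FPIS and to the FNIS using Equation (6). For two triangular fuzzy numbers x̃ = (x_1, x_2, x_3) and ỹ = (y_1, y_2, y_3), the vertex distance is

$$d(\tilde{x}, \tilde{y}) = \sqrt{\tfrac{1}{3}\left[(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2\right]} \quad (6)$$

Compute d(ṽ_ij, ṽ_j⁺) for every alternative and criterion, do the same for d(ṽ_ij, ṽ_j⁻), and then sum the distances using Equations (7) and (8):

$$d_i^{+} = \sum_{j=1}^{n} d(\tilde{v}_{ij}, \tilde{v}_j^{+}) \quad (7)$$

$$d_i^{-} = \sum_{j=1}^{n} d(\tilde{v}_{ij}, \tilde{v}_j^{-}) \quad (8)$$

Step 6: Computing the Closeness Coefficient (CC_i) for each alternative using Equation (9):

$$CC_i = \frac{d_i^{-}}{d_i^{+} + d_i^{-}} \quad (9)$$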
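The normalization, weighting, ideal-solution, distance, and closeness steps can be chained into one short NumPy function. This is a sketch under stated assumptions rather than the study's spreadsheet: all criteria are benefit criteria (as stated above), weights are triangular fuzzy numbers multiplied component-wise, the positive and negative ideal values are taken as the crisp values max c and min a per criterion (one common convention for the maximum and minimum selection described above), and the ratings and weights are hypothetical.

```python
import numpy as np


def fuzzy_topsis(D, W):
    """Closeness coefficients from a fuzzy decision matrix (all benefit criteria).

    D: (m alternatives, n criteria, 3) triangular fuzzy ratings (a, b, c)
    W: (n criteria, 3) triangular fuzzy weights
    """
    D = np.asarray(D, dtype=float)
    W = np.asarray(W, dtype=float)
    c_star = D[:, :, 2].max(axis=0)                   # max upper bound per criterion
    R = D / c_star[None, :, None]                     # normalized matrix, values in [0, 1]
    V = R * W[None, :, :]                             # weighted normalized matrix
    # Ideal solutions taken as crisp values: max of c and min of a per criterion
    v_pos = np.repeat(V[:, :, 2].max(axis=0)[:, None], 3, axis=1)
    v_neg = np.repeat(V[:, :, 0].min(axis=0)[:, None], 3, axis=1)

    def vertex_distance(x, y):
        return np.sqrt(((x - y) ** 2).mean(axis=-1))  # sqrt of (1/3) * sum of squares

    d_pos = vertex_distance(V, v_pos[None]).sum(axis=1)   # distance to FPIS
    d_neg = vertex_distance(V, v_neg[None]).sum(axis=1)   # distance to FNIS
    return d_neg / (d_pos + d_neg)                         # closeness coefficient CC_i


# Hypothetical ratings: 3 districts x 2 criteria, and 2 fuzzy criterion weights
D = [[(5, 7, 9), (7, 9, 9)],
     [(1, 3, 5), (3, 5, 7)],
     [(3, 5, 7), (5, 7, 9)]]
W = [(0.5, 0.7, 0.9), (0.3, 0.5, 0.7)]
cc = fuzzy_topsis(D, W)
print(np.round(cc, 3), "ranking (highest CC first):", np.argsort(-cc))
```

Under this benefit-criteria convention, a larger CC_i places a district nearer the positive ideal solution, and sorting the districts by CC_i gives the ranking from which a susceptibility map can be drawn.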

Accuracy of the LULC Maps
The classification identified three classes throughout the study area: built-up, water, and agricultural land. The built-up class includes buildings, roads, and all other impervious surfaces. Table 4 summarizes the four accuracy indices described previously for the 1991, 2003, and 2018 images: the Overall Accuracy (%) and the Kappa Coefficient of each classified image, and the User's Accuracy (%) and Producer's Accuracy (%) of each land cover type. The error matrices from which the accuracy indices were computed are provided in the Supplementary Materials (Tables S1-S3). As Table 4 shows, the three classifications attained satisfactory overall accuracies of 93.9%, 94.3%, and 94.3%, with corresponding kappa coefficients of 0.82, 0.79, and 0.73 for 1991, 2003, and 2018, respectively. These results confirm the high classification potential of SVM. However, there is some mutual misclassification between water and irrigated agricultural land, owing to the close spectral similarity of these two classes in a single-date image. Therefore, the seasonal behavior of crops is an essential component of successful image interpretation.
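The four accuracy indices reported in Table 4 can all be derived from an error matrix. The sketch below assumes rows are the classified map and columns are the reference data (the orientation of Tables S1-S3 is not stated here), and the example matrix is hypothetical: it mimics water/agriculture confusion, not the study's actual counts.

```python
from typing import List, Tuple


def accuracy_indices(cm: List[List[int]]) -> Tuple[float, float, List[float], List[float]]:
    """Overall accuracy (%), kappa, user's accuracy (%) and producer's accuracy (%).

    Assumed convention: cm[i][j] counts validation samples mapped as class i
    whose reference class is j (rows = classified map, columns = reference).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_o = sum(cm[i][i] for i in range(k)) / total                      # observed agreement
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    users = [100.0 * cm[i][i] / row_tot[i] for i in range(k)]      # 100 - commission error
    producers = [100.0 * cm[j][j] / col_tot[j] for j in range(k)]  # 100 - omission error
    return 100.0 * p_o, kappa, users, producers


if __name__ == "__main__":
    # Hypothetical matrix, classes ordered (built-up, water, agricultural).
    cm = [
        [50, 0, 0],
        [0, 40, 10],
        [0, 10, 40],
    ]
    oa, kappa, ua, pa = accuracy_indices(cm)
    print(f"OA = {oa:.1f}%, kappa = {kappa:.2f}, UA = {ua}, PA = {pa}")
```

Kappa discounts the agreement expected by chance, which is why it can be noticeably lower than the overall accuracy when one class (here, agriculture) dominates the scene.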

Spatiotemporal Analysis of LULCC
The classification and quantification of the images of the study area (illustrated in Figure 4 and summarized in Table 5) are vital for detecting LULCC within the study area over the study period. The LULC maps for 1991, 2003, and 2018 are illustrated in Figure 4a-c, respectively. The relative deviation of each class (Table 6) was computed using Equation (10):

$$\%RD = \frac{A - B}{B} \times 100 \tag{10}$$

where A is the area under a specified land-use class in the second (new) year, B is the area under the same class in the first (old) year, and %RD is the relative deviation of the class over that period: a positive value indicates class area expansion, and a negative value indicates class area shrinkage. The %RD of the built-up land over the whole study period (27 years), from 1991 to 2018, is +93.8%, meaning that the built-up land almost doubled. The agricultural land exhibits a similar (but reversed) pattern from 1991 to 2018, because the urban expansion took place at the expense of agricultural land.
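Equation (10) is simple arithmetic, but a worked example helps read Table 6. The sketch uses the agricultural shares quoted in the abstract (91.2% in 1991 and 83.7% in 2018) as stand-ins for areas, which assumes the total area of the governorate is constant; the function name is hypothetical.

```python
def relative_deviation(new_area: float, old_area: float) -> float:
    """Equation (10): %RD = (A - B) / B * 100.

    A = class area in the new (second) year, B = class area in the old (first) year.
    Positive values mean the class expanded; negative values mean it shrank.
    """
    if old_area == 0:
        raise ValueError("old_area must be non-zero")
    return (new_area - old_area) / old_area * 100.0


if __name__ == "__main__":
    # Agricultural share of Gharbia: 91.2% (1991) -> 83.7% (2018), assuming a constant total area.
    print(round(relative_deviation(83.7, 91.2), 1))  # -8.2
```

By the same formula, the built-up %RD of +93.8% means the 2018 built-up area is about 1.94 times its 1991 size, which is why it is described as almost doubling.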

LULC Change Modeling, Simulation, and Projection
Before predicting the future LULC, the transition probability and transition area matrices were derived from the MC stochastic model using the 1991 and 2003 LULC maps, in order to simulate the 2018 LULC map (Figure 6a) for model validation. These matrices and the transition probability maps are provided in the Supplementary Materials (Tables S4-S6 and Figures S1-S3, respectively). The model directly outputs the Kappa coefficient, the details of which are provided in the Supplementary Materials (Figure S4). The Kappa coefficient and the Jaccard similarity coefficient were 88% and 52%, respectively. The Jaccard coefficient better estimates the fit between the observed and simulated LULC maps. The results were considered acceptable, and the model was adopted for future simulation. The model was then employed to extrapolate the future LULC maps for 2033 and 2048, as shown in Figure 6b,c, respectively.
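The quantity side of the Markov chain step can be sketched as follows: a transition probability matrix is obtained by row-normalizing the cross-tabulation of two classified maps, and class areas are projected by repeated multiplication with that matrix. The counts, areas, and function names below are hypothetical, and the sketch assumes each step equals one calibration interval; how the software used in the study rescales between intervals of different length is not described in this section.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(counts: Matrix) -> Matrix:
    """Row-normalize a cross-tabulation of pixel counts (class at t1 -> class at t2)."""
    return [[c / sum(row) for c in row] for row in counts]


def markov_project(state: List[float], p: Matrix, steps: int) -> List[float]:
    """Apply the chain `steps` times: state_next[j] = sum_i state[i] * p[i][j]."""
    n = len(state)
    for _ in range(steps):
        state = [sum(state[i] * p[i][j] for i in range(n)) for j in range(n)]
    return state


if __name__ == "__main__":
    # Hypothetical cross-tabulation, classes ordered (built-up, agricultural, water).
    counts = [[100, 0, 0], [90, 810, 0], [0, 0, 10]]
    p = transition_matrix(counts)
    areas = [100.0, 900.0, 10.0]           # hypothetical areas at the base date
    print(markov_project(areas, p, 1))     # one interval ahead
    print(markov_project(areas, p, 2))     # two intervals ahead
```

Because each row of the matrix sums to 1, the projection redistributes area between classes without changing the total; the logistic regression component then decides where those quantities are allocated spatially.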
For the future prediction step, the spatial modeling variables were updated from the first time period to the second before running the projection. The model then predicted the potential changes over the following 30 years based on the 2018 LULC, assuming no disruptive events such as policy changes. The logistic regression algorithm adequately captured the spatial relationships between the observed LULCC and the predictor variables, generating the transition probability maps. According to our LULC predictions, the water bodies remain almost unchanged, so this class is disregarded, and the analysis of the 2018-2048 period considers only the change in built-up and agricultural land. Built-up land is predicted to cover 29% and 33% of the total land area by 2033 and 2048, respectively.
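The transition potential produced by logistic regression is the logistic function of a weighted sum of driver variables. The sketch below fits that model with plain batch gradient descent on a tiny hypothetical dataset; it is a stand-in for the modeling software used in the study, not its implementation. The two normalized drivers (imagined here as distance to roads and distance to urban centers) are hypothetical, since the four spatial modeling variables are not named in this section.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the log-loss; returns [intercept, w_1, ..., w_k]."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wk - lr * g / len(X) for wk, g in zip(w, grad)]
    return w


def transition_potential(w: List[float], x: List[float]) -> float:
    """Probability that a pixel with drivers x converts (e.g., agriculture -> built-up)."""
    return sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))


if __name__ == "__main__":
    # Hypothetical pixels: [normalized distance to roads, normalized distance to urban centers].
    X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
    y = [1, 1, 1, 0, 0, 0]  # 1 = converted to built-up between the two dates
    w = fit_logistic(X, y)
    print(round(transition_potential(w, [0.1, 0.1]), 3), round(transition_potential(w, [0.9, 0.9]), 3))
```

Evaluating `transition_potential` for every pixel yields a transition potential map, which the Markov quantities are then allocated against.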
As shown in Table 7, the results indicate that urban growth on agricultural land is persistent. For instance, the built-up area will increase by about 14% (285 km²) and 18% (359 km²) by 2033 and 2048, respectively. In other words, the size of the urban cover will more than double through the 2018-2033 period. Conversely, the agricultural land will decrease by the same percentage over the same periods. The probable urban growth over agricultural land (LULCC) within the 2018-2033 and 2018-2048 periods is shown in Figure 7a,b, respectively. Moreover, as Figure 7 shows, the built-up area will continue to concentrate around the centers of the governorate and along the infrastructure, e.g., roads and railway stations. Based on the prediction results, Figure 8 shows the size of the urban cover in the eight districts in 2018, 2033, and 2048. It reveals that Mahalla Kubra and Tanta are the districts most affected by urban growth dynamics, while Samanod is the least affected.

Linear Regression Analysis
A regression analysis was used to quantitatively estimate the relationship between the agricultural land change and the built-up area change [64] for the eight districts, as shown in Figure 9. The scatterplots show a strong negative relationship, with coefficients of determination (R²) of 0.73 and 0.99 for the 1991-2003 and 2003-2018 periods, respectively (Figure 9a,b). Furthermore, the predicted LULC shows the same trend, which the linear regression confirmed: a perfect negative linear relationship (R² = 1) for the future periods 2018-2033 and 2018-2048 (Figure 9c,d).
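The R² values above come from an ordinary least-squares line fitted to eight (built-up change, agricultural change) pairs, one per district. A plausible reading of the perfect fit for the predicted periods is that, with water held constant, every km² gained by built-up land in a district is lost by agricultural land, so all points fall exactly on a line of slope −1. The sketch below computes slope, intercept, and R² for hypothetical district changes constructed that way; the data and function name are illustrative only.

```python
from typing import List, Tuple


def linear_fit(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical built-up gains (km^2) for eight districts; agriculture loses exactly the same.
    built_up_change = [10, 20, 35, 40, 55, 60, 75, 90]
    agri_change = [-v for v in built_up_change]
    slope, intercept, r2 = linear_fit(built_up_change, agri_change)
    print(round(slope, 3), round(r2, 3))
```

For the observed periods, conversions involving water and classification noise break this one-to-one balance, which is consistent with R² falling below 1 (0.73 and 0.99).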


Analysis of Fuzzy TOPSIS
Based on the Fuzzy TOPSIS analysis and the criteria selected to evaluate the susceptibility of the districts to urbanization risk, the closeness coefficient CCi computed for each alternative is shown in Table 8. The results show that the Mahalla Kubra and Tanta districts are the most susceptible to urbanization risk, while Samanod is the least susceptible. These results agree with those of the logistic regression-Markov prediction model: the districts predicted to have the largest urban expansion are also those that the Fuzzy TOPSIS analysis identifies as most susceptible to urbanization risk.
This agreement indicates that the results of the Fuzzy TOPSIS analysis can be adopted. Moreover, where an area is not covered by satellite images, Fuzzy TOPSIS can be employed on its own to evaluate the districts' susceptibility to urbanization risk from the criteria analysis, so that decision-makers are informed of the districts with priority for sustainable planning. Thus, producing the Urbanization Risk Map without satellite data is possible. Figure 10 shows the districts with their corresponding levels of susceptibility to urbanization risk. This supports the use of Fuzzy TOPSIS analysis for producing an Urbanization Risk Map.
Figure 10. The Urbanization Risk Map.

Discussion
Based on the quantification of the LULC changes throughout the study area over the 27-year study period from 1991 to 2018, agricultural land remained dominant; however, rapid and ongoing urbanization has occurred. The growing population necessitates the construction of facilities and the development of transportation systems to meet increasing social and economic requirements. This is why policymakers and planners are particularly concerned with urbanization patterns, as they are crucial for effective decision-making.
The LULC maps for 1991, 2003, and 2018 show a persistent trend of urban growth in the study area. The built-up land almost doubled, while the agricultural land exhibited an inverse pattern over the study period, as urban expansion corresponded to the loss of agricultural land. The decrease in agricultural land was more pronounced during the second period (2003-2018) than during the first (1991-2003). This indicates increased demand for, and pressure on, land from the growing population, driven by increased rural-urban migration, especially as Gharbia governorate has no desert backyard that would permit urban expansion. Many previous studies have also reported encroachment on arable and already-cultivated lands [13,34]. Therefore, the government is seeking to deal with the crisis.
According to this study, the change in water bodies would be negligible. Therefore, the analysis focused on the transition potential from agricultural land to built-up land, which expresses the phenomenon of urban sprawl. We selected four spatial modeling variables as possible driving forces of LULCC, specifically defined as urban sprawl parameters. In the model validation, the Kappa and Jaccard coefficients reached 88% and 52%, respectively. The Jaccard coefficient was preferred to cross-check the results [71], as it considers the correspondence between the two layers both spatially and quantitatively; in other words, it is a statistic used to gauge the similarity and diversity of sample sets. The Jaccard coefficient is therefore more realistic and reliable, as it considers the location of the LULCC, not only its quantity. The Jaccard value is affected by the LULC classes, the annual rate of change, and the transition steps [69]. Although values above 60% are usually considered acceptable, the model can still be adopted for simulation and prediction here. The urban sprawl on agricultural land during the first period (1991-2003) was much lower than the urban growth during the second period (2003-2018), owing to the lawlessness that occurred in the country during the 2011 revolution. Consequently, the actual urban area in 2018 was larger than the simulated one. The Jaccard coefficient of 52% in this study can therefore be accepted, and the model can be used for further simulation and future projection.
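For a single class such as built-up land, the Jaccard coefficient is the size of the intersection of the observed and simulated class pixels divided by the size of their union, which is why it penalizes correct quantities placed in the wrong location. The sketch below assumes both maps are flattened into equal-length lists of class labels; the labels and example maps are hypothetical, and the software used in the study may aggregate the statistic differently (for example, over several classes).

```python
from typing import Hashable, Sequence


def jaccard(observed: Sequence[Hashable], simulated: Sequence[Hashable], cls: Hashable) -> float:
    """|O ∩ S| / |O ∪ S| for the pixels labelled `cls` in two aligned, flattened maps."""
    if len(observed) != len(simulated):
        raise ValueError("maps must have the same number of pixels")
    obs = {i for i, v in enumerate(observed) if v == cls}
    sim = {i for i, v in enumerate(simulated) if v == cls}
    union = obs | sim
    return len(obs & sim) / len(union) if union else 1.0


if __name__ == "__main__":
    # Hypothetical 5-pixel maps: B = built-up, A = agricultural.
    observed = ["B", "B", "A", "A", "B"]
    simulated = ["B", "A", "A", "B", "B"]
    print(jaccard(observed, simulated, "B"))  # 0.5
```

In the example, both maps contain three built-up pixels (a perfect quantity match), yet only two coincide, so the coefficient is 0.5; a quantity-only comparison would hide this locational disagreement.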
It is worth mentioning that if the satellite images had been acquired at close and equal intervals, and there were no sudden boom in a certain period (as in our case), the logistic regression-Markov model would give more satisfactory results, and the validated outputs would match the actual land-use map both spatially and quantitatively. We adopted logistic regression owing to its demonstrated accuracy in modeling LULC transitions and its advantage, compared with other models, of not requiring prior knowledge. Previous studies showed that the logistic regression model was capable of producing realistic results in their respective study areas [33][34][35]. For instance, Hamdy et al. [34] applied a hybrid approach of the Markov chain and logistic regression models to define the future urban spread in 2037 in Abouelreesh, Aswan, Egypt. Their results highlight the growing risks of urban expansion and the obstacles facing the sustainable urban growth strategies recommended for this area. Furthermore, Arsanjani et al. [33] adopted a hybrid model combining logistic regression, MC, and CA, designed to improve the performance of the basic logistic regression model. The simulated map matched the actual one by 89%, which was satisfactory for adopting the calibration process. The simulated maps show a new wave of suburban development in the vicinity of Tehran over the coming decades. Nguyen et al. [35] also used the same hybrid model of logistic regression, MC, and CA to predict future land-use change in Nam Dinh Province, Vietnam. The hybrid model performed adequately, and the outcomes of the analysis provide valuable information for local planners and policymakers, supporting their efforts to design sustainable urban planning schemes and environmental management policies.
Therefore, this approach could be adopted for the identification of patterns of LULC changes in the Gharbia governorate.
The predicted maps show that the same trend of agricultural land loss due to urban growth can be expected for 2033 and 2048. A similar pattern of agricultural land diminution has been reported in previous studies. However, our study did not only monitor and forecast urban expansion, as previous studies did; it also adopted Fuzzy TOPSIS analysis, based on criteria selected to evaluate the susceptibility of the districts to urbanization risk, which is effective for data-gap regions. In other words, if satellite data are scarce for the area of interest, Fuzzy TOPSIS can evaluate the districts' susceptibility to urbanization risk, so that decision-makers are informed of the districts with priority for sustainable planning. Thus, producing the Urbanization Risk Map without satellite data is possible. Otherwise, if the area of interest is covered by satellite images, our methodology offers an advantage over other methodologies through the additional validation of the logistic regression-Markov predictions by Fuzzy TOPSIS: the predicted size of the urban extent in a district corresponds to the level of urbanization risk for that district.
According to our prediction results, if urban sprawl continues at the 2018 pace, about 285 km² of agricultural land will change into built-up land by 2033, and a further 73 km² by 2048, which could considerably change the social and economic conditions in the area. The built-up lands will cover 29% and 33% of the total land area by 2033 and 2048, respectively. This loss of agricultural land is a negative sign for food security, especially as the population is growing in an unplanned way that is not proportional to the land area. If the government does not address urban encroachment on agricultural land, built-up land will represent more than a quarter of the total area by 2033, and a third by 2048. This rate of urban sprawl is a cause for concern.
The continuous and rapid urbanization of cultivated land has serious consequences for agricultural productivity and for the sustainability of the country's economic development [14]. The Egyptian government has recently initiated a number of projects to reclaim new desert lands to meet the rising demand for food caused by the country's rapid population increase. These attempts can be regarded as a possible solution to the existing problem. However, it is not the best solution since, as mentioned before, the newly reclaimed lands do not compensate for the corresponding loss of the old ones, because their soil is less fertile and has much lower nutrient levels than the old soil [2]. Moreover, reclaiming desert land is costly: it requires resources such as power, water, and chemical fertilizers to increase soil fertility, as well as labor and safe transportation to remote markets [2,38]. This creates a need for considerable funding to upgrade soil fertility. Therefore, most newly reclaimed desert lands are planted with export crops to achieve a profit that compensates for the cost of reclamation, and thus this policy does not contribute to solving the country's problem of food self-sufficiency [2]. Accordingly, the government has to adopt other strategies to address massive population growth and rapid urbanization. We suggest, for instance, that (1) urbanization could expand vertically within the defined urban area, combined with the conservation of existing fertile soils and the maximization of agricultural productivity in existing cultivated areas; and (2) sustainable cities could be built outside the borders of the delta to accommodate the population increase, together with the infrastructure required for a convenient life. Developing the desert is the best proposal to relieve the pressure on the delta region and to stop further urban expansion inside this green zone of the Nile Delta.
Accordingly, the expected changes in LULC may support proper planning for the practical and sustainable use of resources.
There are three prime sources of uncertainty in machine learning: noise in the data, incomplete coverage of the domain, and imperfect models [80]. All measurements carry some uncertainty, regardless of their precision and accuracy. Uncertainty is a common phenomenon in many application areas, such as medicine, image processing, and linguistics [81]. In medicine, for example, a diagnosis is frequently not clear-cut, because there are different degrees of disease. Consequently, uncertainty arises almost automatically in any machine learning application [81]. We therefore recommend that further research consider uncertainty in machine learning methods, particularly in critical domains such as clinical diagnosis and in safety-critical areas, where output decisions need to be accompanied by a measure that allows the certainty of the output to be judged.
In addition, another limitation of this study, and the most evident limitation in machine learning generally, is the data [82]. Several machine learning algorithms need considerable quantities of data before they start giving useful results [83,84]. Neural networks are a good illustration: they are data-hungry and demand extensive volumes of training data. Algorithms trained on insufficient datasets perform unpredictably on unseen data [85], because the model may overfit. An overfitted model works very well on the data used to train it but poorly on new (test) data [86,87], so it is the user's responsibility to regulate the learning process. There are several methods to avoid overfitting [88], e.g., (1) training with more data [89], so that the model cannot overfit all of the samples and has to generalize; (2) data augmentation [90], which makes the sample data look slightly different each time they are processed by the model, preventing the model from memorizing the specific properties of the training samples; and (3) data simplification [86], which reduces overfitting by lowering the complexity of the model until it is simple enough not to overfit. The performance of an algorithm relies not only on the quantity but also on the quality of the data; hence, algorithms are only as good as the data they are trained on [85]. Training algorithms the way we learn ourselves comes with limitations: unrepresentative or insufficient data harm performance. Just as a lack of good features can lead to poor algorithm performance, a lack of good ground truth data can also reduce the potential of the model [91].
Accordingly, we recommend relying on algorithms that need less training data, such as SVM, or providing the required data while taking uncertainty into account.

Conclusions
This study focused on persistent urban growth at the expense of agricultural land in the eight districts of Gharbia governorate. This growth will cause serious problems if the current land-use policy continues. Therefore, the government has to find an alternative land-use direction, e.g., desert development, to stop further urban expansion inside the green zone of the Nile Delta and thereby preserve its fertile cultivated lands.
In this study, the LULC dynamics of Gharbia governorate were monitored and assessed over a 27-year period from 1991 to 2018, and the future trends to 2033 and 2048 were investigated using a hybrid model of MC and logistic regression. The hybrid Markov chain-logistic regression model was adequate for modeling the LULC transitions: the MC model is well suited to quantifying the transitions and detecting the conversion rates between LULC types, and logistic regression provides transition potential maps. There was a trend of rapid urbanization over agricultural land, and the prediction of future LULC dynamics shows the same trend continuing, with negative implications such as for food security. The agricultural land is declining at a worrying rate: it occupied 91.2% of the total land area in 1991 and is predicted to decline to 65.8% by 2048. This persistent urbanization will affect the climate of the region (increasing temperatures and affecting residents' lives) and disturb the balance of ecosystem services. Furthermore, the loss of agricultural land may put the agricultural activity of the governorate at risk. Therefore, the government has to change its policy and address such problems, especially in areas without a desert backyard.
Defining the pattern of LULC changes (in terms of quantity and direction) is essential for decision-makers and planners, both for sustainable urban planning and for confronting the challenges created by rapid unplanned urban growth. The innovation of this work is twofold: (1) it presents machine learning-based and state-of-the-art remote sensing products and methodologies for the continuous monitoring of LULC in support of sustainable land-use policies; and (2) it highlights the vital role of the Fuzzy TOPSIS method, which handles uncertainties in the data and analysis, in producing the Urbanization Risk Map that determines the susceptibility of the districts to urbanization risk, especially when satellite images are unavailable, so that decision-makers can identify the districts with priority for sustainable land-use planning.
The medium spatial resolution of Landsat images is considered a shortcoming. Freely available data usually have a lower spatial resolution, which does not allow a perfect quantification of LULC dynamics; this matters because LULCC affects the climate. Therefore, an archive of higher-resolution images is needed. Nevertheless, SVM performed well in the classification process. Furthermore, uncertainty is a common phenomenon in machine learning: all measurements carry some uncertainty, regardless of their precision and accuracy. We therefore recommend that further research consider uncertainty in machine learning methods, particularly in critical domains such as clinical diagnosis. We suggest that future studies use new data and models to develop simulations that address the uncertainties in future LULC.