Prediction of Urban Area Expansion with Implementation of MLC, SAM and SVMs’ Classifiers Incorporating Artificial Neural Network Using Landsat Data

Abstract: A reliable land cover (LC) map is essential for planners, because the lack of a proper land cover map can derail a project. This study focuses on land cover classification and prediction using three well-known classifiers and remote sensing data. The Maximum Likelihood Classifier (MLC), Spectral Angle Mapper (SAM), and Support Vector Machines (SVMs) algorithms are used as representatives of parametric, non-parametric, and subpixel-capable methods for change detection and change prediction in Urmia City (Iran) and its suburbs. Landsat images from 2000, 2010, and 2020 were used to provide land cover information. The MLC and SVMs algorithms reached overall accuracies of 0.93–0.94, whereas the SAM algorithm reached only about 0.79. MLC performed slightly better than the SVMs classifier. A Cellular Automata Artificial Neural Network method was used to predict land cover changes. The overall accuracy of MLC was the highest, at about 0.94, although SVMs were slightly more accurate for large area segments. Land cover maps were predicted for 2030 and show the city expanding from 5500 ha in 2000 to more than 9000 ha in 2030.


Introduction
A century ago, only 20% of the world's population resided in urban areas; this share is predicted to increase to 70% by 2050 [1,2]. The largest urban growth is also predicted to occur in developing countries in the coming years [3,4]. The USA's urban population was around 79% in 2000 and increased to 81% in 2012 [5,6]. Iran's population increased significantly during the past 60 years, from 16.2 million people in 1951 to 75 million people in 2012 [7,8].
Land use change is an important part of global environmental change [9][10][11][12][13]. Decision-makers usually wish to manage and predict land use and land cover changes [14]. Land Use (LU) and Land Cover (LC) changes result from natural and social factors and considerably affect ecosystems [15]. A spatial understanding of urban expansion is very important for urban planning and landscape development [16]. Accurate and up-to-date land cover information is necessary for understanding and evaluating these changes, which matter to development planners [17]. Population structure and the dynamics of motivations affect land use; in particular, the migration of the rural population to urban suburbs converts rural areas to urban land uses [18,19]. The study of land use/land cover changes is important in various respects, such as floods [20], land surface temperatures [21], climate change [22], reduced plant and animal diversity [23,24], and other issues.
Predicting and modeling urban expansion have been the subject of different studies since the 1990s [25]. Remote sensing data have been increasingly used as a source for detecting land use and land cover on local, regional, national, and global scales [26]. One of the most useful methods for information extraction is Land Use/Land Cover (LULC) classification of multispectral data, which is based on pattern recognition techniques [27]. Remote sensing can effectively monitor short-term and long-term processes, patterns, and the consequences of land usage [28]. Estoque and Murayama [29] showed that knowledge derived from remote sensing and Geographical Information Systems (GIS) techniques about the spatial pattern and severity of urban land changes (for example, land use/land cover change from unbuilt lands to built-up areas) may support urban planners and decision-makers [30]. During the past decades, monitoring land cover changes has been among the most active subjects in the field of remote sensing and GIS [31].
GIS provides new solutions, especially through implementation of specific models that open new horizons to the researchers for urban expansion analysis [32]. Many land use/land cover change models have been developed, aiming at precise prediction of changes [14]. Many researchers consider Urban Growth Models (UGM) as a regional planning instrument for the quantitative analysis of complicated urban systems [33]. Such methods have been developed in recent years and have been used widely in natural resources management and urban planning [34].
Classification of satellite imagery is carried out with different classifiers. Determining which algorithm performs better, or identifies land cover more reliably, is an important research question. Different satellite data, such as Landsat, MODIS [35], and IKONOS [36], can be used to produce land cover maps. Landsat satellite images are available free of charge at a reasonable resolution (compared to most other free data), with a global record spanning decades and continuous data acquisition. Therefore, in this study, Landsat images were used to evaluate the algorithms of land cover classification and prediction.
This study addresses the problem of diversity in land cover classification techniques and the unknown reliability of land cover prediction methods. This is achieved by comparing three classifiers and by exploiting the long-term, consistent data record of the Landsat satellites, which facilitates the classification of land cover and the prediction of changes from 2000 to 2030. A comparison between the predicted and classified land cover maps of 2020 is used to evaluate the reliability of the prediction method.

Case Study
Urmia is a city in West Azerbaijan province, located in the north-west of Iran (57°43′ longitude, 36°12′ latitude), with an area of around 100 km². According to the latest census, the population of this city was around 800,000 in 2018 [37]. Historical documents and archaeological evidence indicate that habitation in this city dates back to the 1st millennium BC, when a castle called Armate (Oramiyat) was first built at the current location of Urmia [38]. About 45% of Urmia's employed population works in the agricultural industry; in addition, the tourism industry is one of the main economic sources of the city, and Urmia has grown the most among Iranian cities during recent decades [39,40]. Immigration caused irregular growth of the city and has led to vertical transformations (apartment life) with an irregular pattern [41]. According to the studies, areas 1, 3, and 4 of the city can be expanded [42]. Figure 1 shows the case study area.

Materials and Methods
The Landsat satellite series provides the longest record of satellite observations. Therefore, Landsat is a valuable instrument for monitoring global changes and is the main source of land observations at medium spatial resolution for decision-making. The images were preprocessed to convert the satellite data to surface reflectance, which facilitates the extraction of ground surface information with existing algorithms. This study investigates different classification methods (pixel-based, subpixel, and segmentation) for land cover mapping and finally predicts land cover changes for 2030. The Maximum Likelihood (MLC) algorithm, Spectral Angle Mapper (SAM), and Support Vector Machines (SVMs) were used for the pixel-based method, the subpixel method, and the segmentation method, respectively [43][44][45].
The state of the land cover was mapped using the three classifiers, and the results were used to predict the future state of land cover with a Cellular Automata Artificial Neural Network.

Maximum Likelihood (MLC) Algorithm
Assuming that the reflectance values of each class follow a normal (Gaussian) distribution in each band, the MLC algorithm assigns each pixel to the class with the highest likelihood [46]. The pixel likelihood is evaluated with the multivariate normal probability density function, using the mean, variance, and covariance of the training samples [46,47]. The MLC equation is given in Formula (1):
$$p(X \mid i) = \frac{1}{(2\pi)^{n/2}\,|V_i|^{1/2}} \exp\left[-\frac{1}{2}(X - M_i)^{T} V_i^{-1} (X - M_i)\right] \qquad (1)$$
where n = the number of multispectral bands, X = the unknown measurement vector, V_i = the covariance matrix of training class i, and M_i = the mean vector of training class i.
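
The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function names, the two-band class statistics, and the pixel values are hypothetical, equal prior probabilities are assumed, and the logarithm of the Gaussian density is compared instead of the density itself to avoid numerical underflow.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def log_likelihood(x, mean, cov):
    """Log of the multivariate normal density of pixel x for one class."""
    n = len(x)
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution: solve low * z = diff, so z.z is the Mahalanobis term.
    z = []
    for i in range(n):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((diff[i] - s) / low[i][i])
    mahalanobis = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + mahalanobis)

def classify_mlc(x, class_stats):
    """Return the class whose Gaussian model gives x the highest likelihood."""
    return max(class_stats, key=lambda c: log_likelihood(x, *class_stats[c]))

# Hypothetical (mean vector, covariance matrix) pairs from training samples.
class_stats = {
    "vegetation": ([0.05, 0.40], [[0.001, 0.0], [0.0, 0.004]]),
    "built_up": ([0.20, 0.25], [[0.002, 0.0005], [0.0005, 0.002]]),
}
label = classify_mlc([0.06, 0.38], class_stats)
```

In this toy setup the pixel lies close to the hypothetical vegetation mean, so it is assigned to that class. With real imagery the statistics would come from the training areas of each land cover class.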

Spectral Angle Mapper (SAM) Algorithm
"The Spectral Angle Mapper (SAM) algorithm is based on an ideal assumption that a single pixel of remote sensing image represents one or more certain ground cover materials, and can be uniquely assigned to ground cover classes. The SAM algorithm is based on the measurement of the spectral similarity between two spectra" [48]. SAM uses the spectral angle between the reference spectrum and each image pixel spectrum as the similarity value. The SAM algorithm can be represented as a mathematical formula that describes a pure impulse function between an image subpixel and a reference reflectance spectrum [47,49]. Equation (2) shows the SAM algorithm:
$$D = \cos^{-1}\left(\frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^{2}}\,\sqrt{\sum_{i=1}^{n} Y_i^{2}}}\right) \qquad (2)$$
where D = the angle between the reference spectrum (sample points spectrum) and the image spectrum, X = the image spectrum, Y = the reference spectrum (sample points spectrum), and n = the number of bands.

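The spectral angle used by SAM can be sketched as follows. This is only an illustration under stated assumptions: the function names, the reference spectra, and the maximum-angle threshold of 0.1 rad are hypothetical and are not taken from the study, and all spectra are assumed to be non-zero.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_pixel = math.sqrt(sum(p * p for p in pixel))
    norm_reference = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_pixel * norm_reference)))
    return math.acos(cosine)

def classify_sam(pixel, references, max_angle=0.1):
    """Assign the class with the smallest angle, or None above max_angle."""
    best = min(references, key=lambda name: spectral_angle(pixel, references[name]))
    if spectral_angle(pixel, references[best]) <= max_angle:
        return best
    return None

# Hypothetical three-band reference spectra for two classes.
references = {
    "vegetation": [0.04, 0.05, 0.45],
    "bare_soil": [0.20, 0.25, 0.30],
}
label = classify_sam([0.08, 0.10, 0.90], references)
```

Because the angle depends only on the direction of the spectrum and not on its length, the example pixel (twice the vegetation reference) is still assigned to vegetation. This insensitivity to overall brightness is the main design property of SAM.
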
Support Vector Machines' (SVMs) Algorithm
SVMs are machine learning algorithms proposed by Cortes and Vapnik in 1995 [50]. "An SVM was initially used for the classification of hyperspectral images but was later used to classify multispectral images. SVMs separate classes using a decision surface, called hyper planes that maximize the differences between the classes" [51]. Following the principle of Structural Risk Minimization (SRM), a kernel function K is introduced to map a nonlinear problem in a low-dimensional space to a linear problem in a high-dimensional space through a specific transformation, so that a separating hyperplane can be obtained [52][53][54]. The training data are described with the following notation: T is the training set, (O_i, I_i) is the ith training point, I_i is the input value and O_i the output value, I^n and O^n are the input and output sets, respectively, m is the number of training points, and n is the number of dimensions.
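
The role of the kernel function can be illustrated with a small sketch. The text does not state which kernel was used in the study, so the degree-2 polynomial kernel below is an assumption chosen only because its explicit feature map is easy to write down; all function names are hypothetical.

```python
import math

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel K(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def poly2_feature_map(x):
    """Explicit 3-D feature map whose dot product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly2_kernel(x, y)  # computed in the 2-D input space
mapped_value = dot(poly2_feature_map(x), poly2_feature_map(y))  # computed in 3-D
```

Both values agree up to rounding, which shows that the kernel evaluates a dot product in the higher-dimensional space without constructing that space explicitly. In the mapped space, XOR-like points such as (1, 1) and (1, -1) differ in the sign of the second feature, so a linear hyperplane can separate them even though no straight line separates them in the original two dimensions.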
Landsat 7 and 8 images of the city of Urmia, captured on 03.06.2000, 30.05.2010, and 02.06.2020, were used. The images were downloaded free of charge from the Earth Explorer website (https://earthexplorer.usgs.gov/). OLI images were preferred for 2020 because of the striping problem in ETM+ data. First, primary preprocessing consisting of geometric and radiometric corrections (including de-striping and reflectance calculation) was conducted. The FLAASH (Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes) algorithm was applied for atmospheric correction. Then, existing maps, high-resolution images (such as Google Earth and Esri's World Imagery data), NDVI (Normalized Difference Vegetation Index), and NDBI (Normalized Difference Built-up Index) were visually analyzed to select sample areas. In the next step, the three algorithms (MLC, SAM, and SVMs) were used to classify the images, and the accuracy assessment parameters were obtained. Based on the classifications of the three acquisition dates, changes were mapped and investigated for the 2000-2010 and 2000-2020 periods. A 2020 land cover prediction map was produced with the Artificial Neural Network for each of the three classification algorithms in order to evaluate the prediction method. The accuracies of the results were evaluated against the actual change maps (classification). Finally, a predicted land cover map was produced for 2030.
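
The two spectral indices mentioned above can be computed per pixel from surface reflectance as in the following sketch. The formulas are the commonly used ones, NDVI = (NIR - Red) / (NIR + Red) and NDBI in its built-up-index form (SWIR1 - NIR) / (SWIR1 + NIR). The function names, the sample reflectance values, and the choice to return 0 for an empty pixel are assumptions made for illustration.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized difference (a - b) / (a + b) of two reflectances."""
    denominator = band_a + band_b
    if denominator == 0:
        return 0.0  # assumption: treat empty or no-data pixels as 0
    return (band_a - band_b) / denominator

def ndvi(nir, red):
    """Vegetation index: high for dense green vegetation."""
    return normalized_difference(nir, red)

def ndbi(swir1, nir):
    """Built-up index: tends to be positive for built-up surfaces."""
    return normalized_difference(swir1, nir)

# Hypothetical reflectance values for a vegetated and a built-up pixel.
vegetated_ndvi = ndvi(nir=0.5, red=0.1)
built_up_ndbi = ndbi(swir1=0.3, nir=0.2)
```

For Landsat 8 OLI the red, NIR, and SWIR1 inputs correspond to bands 4, 5, and 6, and for Landsat 7 ETM+ to bands 3, 4, and 5.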

Prediction of Land Cover
Prediction of land cover was carried out with the Cellular Automata (CA)-ANN model implemented in the "Modules for Land Use Change Simulations" (MOLUSCE) plugin of QGIS. MOLUSCE is a free plugin for QGIS 2 that predicts land cover change according to the method suggested in [55,56].
The simulations of land cover change and prediction were conducted using the CA-ANN model. ANN was used to determine the transition probability of land cover using multiple output neurons for simulating multiple land cover changes, within the structure of CA-ANN. CA was used to model the land use changes by applying the transition probabilities from the ANN learning process.
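
How transition probabilities might drive a CA update can be sketched as below. This is a toy example and not the MOLUSCE implementation: the function name, the class labels, the probabilities, and the 0.6 threshold are hypothetical, and neighbourhood effects are assumed to be already encoded in the probabilities supplied by the network.

```python
def ca_step(grid, probabilities, threshold=0.6):
    """One CA iteration: a cell switches to its most probable class
    only if that probability reaches the threshold, else it keeps its state."""
    new_grid = []
    for row, prob_row in zip(grid, probabilities):
        new_row = []
        for state, probs in zip(row, prob_row):
            best = max(probs, key=probs.get)
            new_row.append(best if probs[best] >= threshold else state)
        new_grid.append(new_row)
    return new_grid

# Hypothetical 2 x 2 land cover grid and per-cell transition probabilities.
grid = [["soil", "urban"], ["vegetation", "soil"]]
probabilities = [
    [{"urban": 0.8, "soil": 0.2}, {"urban": 0.9, "soil": 0.1}],
    [{"vegetation": 0.7, "urban": 0.3}, {"urban": 0.55, "soil": 0.45}],
]
next_grid = ca_step(grid, probabilities)
```

In this example the upper-left soil cell becomes urban, while the lower-right soil cell keeps its state because its highest probability (0.55) stays below the assumed threshold.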
First step: definition of the inputs to the neural network for the simulation. The simulation is cell (pixel) based, and each cell provides a set of n attributes (spatial variables) as inputs to the neural network. The spatial variables can be defined as:
$$X = [x_1, x_2, \ldots, x_n]^{T}$$
where x_i is the ith attribute and T denotes transposition. In this study, the classification maps of 2000, 2010, and 2020 were the set of spatial variables, and the predicted maps of 2020 and 2030 were produced for the transition periods of 2010-2020 and 2020-2030. Second step: calculation of the correlation between spatial variables by raster comparison, such that each raster of the first variable is evaluated against a raster of the second variable.
Third step: simulation of the transition probability by the ANN. The neural network consists of three layers: input, hidden, and output. In the hidden layer, the signal received by the jth neuron, net_j(k,t), is calculated from the input layer for the kth cell at time t as:
$$net_j(k,t) = \sum_{i} W_{i,j}\, x'_i(k,t)$$
where W_{i,j} is the weight between neuron i of the input layer and neuron j of the hidden layer, and x'_i(k,t) is the ith scaled attribute, associated with the ith neuron of the input layer, for the kth cell at time t.
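
The hidden-layer signal described above is a weighted sum of the scaled inputs of one cell. The sketch below computes it for every hidden neuron, with weights indexed by input neuron i and hidden neuron j. The sigmoid activation applied afterwards is an assumption, since the text does not name the activation function, and the input and weight values are hypothetical.

```python
import math

def hidden_layer_signals(inputs, weights):
    """net_j = sum over i of weights[i][j] * inputs[i], for each hidden neuron j."""
    n_hidden = len(weights[0])
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        for j in range(n_hidden)
    ]

def sigmoid(value):
    """Logistic activation (assumed here, not stated in the text)."""
    return 1.0 / (1.0 + math.exp(-value))

def hidden_layer_outputs(inputs, weights):
    """Activated hidden-layer responses for one cell at one time step."""
    return [sigmoid(net) for net in hidden_layer_signals(inputs, weights)]

# Hypothetical scaled attributes of one cell and input-to-hidden weights.
scaled_inputs = [0.5, 1.0]
weights = [[0.2, -0.4], [0.6, 0.1]]  # weights[i][j]
nets = hidden_layer_signals(scaled_inputs, weights)
outputs = hidden_layer_outputs(scaled_inputs, weights)
```

With these values the two hidden neurons receive signals of about 0.7 and -0.1. In a full network the activated outputs would then be combined by the output layer into one transition probability per land cover class.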
Fourth step: use of the transition probability to model the land cover change by CA simulation. Figure 2 shows the flowchart of the research method.
Figure 3 shows the classification results of the three classification methods. The first row shows the MLC results, the second row the SAM results, and the third row the SVMs results for 2000, 2010, and 2020, respectively, from left to right.

Results
Maps provided by the MLC and SAM methods are more similar to each other for 2000 and 2010. In contrast, the 2020 land cover maps provided by MLC and SVMs are more similar to each other. Despite differences among the results, all three methods agree that Urmia city expanded during the investigated period (2000 to 2020). Table 1 shows summary statistics on the accuracy of the classifications. According to Table 1, classification accuracy is higher than 90% for the MLC and SVMs methods, and the overall accuracy and kappa coefficient are close to 1 for the 2020 classifications. SAM's classification results are less accurate. Although the overall accuracy of the 2020 SAM results is less than 80%, the producer's accuracies are higher than 90% for the built-up, rocky, and vegetation classes, whereas the bare soil classification is around 81% accurate. Among the methods used, MLC performed best, with 95.81% overall accuracy and a kappa coefficient of 0.9430. However, SVMs performed very close to MLC, with an overall accuracy and kappa coefficient of 94.9% and 0.9369, respectively.
Figure 6 is the diagram of land cover classes resulting from the SAM classification. According to this diagram, urban and rocky areas increased, vegetation decreased, and bare soil decreased in 2010 and increased in 2020. Figure 7 is the diagram of land cover classes provided by the SVMs method. According to this diagram, the urban area increased during the 2000-2010 period and rocky lands did not change significantly, while vegetation and bare soil decreased. During 2010-2020, all land cover classes increased except bare soil, which decreased sharply in extent.
Finally, Artificial Neural Networks (ANN) were used to predict land cover changes. A 2020 prediction was produced from the 2000 and 2010 classification results in order to evaluate how well the artificial neural network performs for such a prediction. It was then compared to the land cover map obtained from the 2020 classification. Figure 8 shows the 2020 land cover maps (upper images) and the 2020 land cover prediction maps (lower images) provided by the Artificial Neural Network based on the MLC, SAM, and SVMs methods, respectively, from left to right.
Figure 9 shows the producer's accuracy diagram of the 2020 prediction maps. According to this diagram, the accuracy of all land cover predictions is higher than 80%, and most classes were predicted with accuracies higher than 85%. The producer's accuracies of MLC were around 95%, with the exception of bare soil (84%). SAM has the lowest producer's accuracy for all classes. MLC performed best in all land cover classes except bare soil, which was predicted more accurately by SVMs. Comparing the overall accuracies shows that the MLC results were more accurate than those of SVMs, and SAM was the least accurate among the investigated classifiers.
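
The accuracy measures reported above (overall accuracy, kappa coefficient, and producer's accuracy) are all derived from a confusion matrix. The sketch below shows the standard formulas; the function names and the 2 x 2 matrix are hypothetical and do not reproduce the values of Table 1. Rows are assumed to hold the classified labels and columns the reference labels.

```python
def overall_accuracy(matrix):
    """Share of correctly classified samples (diagonal sum / total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def kappa_coefficient(matrix):
    """Cohen's kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)

def producers_accuracy(matrix):
    """Per class: correctly classified samples / reference samples of that class."""
    col_totals = [sum(col) for col in zip(*matrix)]
    return [matrix[i][i] / col_totals[i] for i in range(len(matrix))]

# Hypothetical confusion matrix for two classes (rows: classified, cols: reference).
confusion = [[50, 10], [5, 35]]
oa = overall_accuracy(confusion)
kappa = kappa_coefficient(confusion)
pa = producers_accuracy(confusion)
```

For this toy matrix the overall accuracy is 0.85 while kappa is about 0.69, which illustrates why both values are usually reported: kappa discounts the agreement that would be expected by chance.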

Discussion and Conclusion
Accurate land cover maps of land surface features are essential for decision-making in many cases, as 80% of human decisions have a spatial aspect [57,58]. Land cover is part of this spatial information. Different satellite images can lead to different results. In addition, different classification methods can produce different maps with different accuracy levels. Hence, the reliability of data and methods should be evaluated when producing and predicting land cover maps.
This study compared the MLC, SAM, and SVMs algorithms for land cover classification and prediction. Classification with MLC and SVMs achieved higher accuracies than the SAM classifier.
MLC and SVMs reached accuracies of around 95% or more for most classes, whereas the SAM method produced maps with accuracies between 70% and 80%, which does not indicate high reliability for a land cover map. The overall accuracies and kappa coefficients also indicate the superiority of MLC and SVMs over the SAM algorithm. The means of the kappa coefficients and overall accuracies show a slight advantage of MLC over SVMs. The highest accuracy was obtained for vegetation and the lowest for bare soil. The reliability of built-up area detection is very high, at around 98% accuracy for MLC and SVMs. It can also be concluded that built-up and vegetation are classified more accurately by SVMs, whereas rocky and bare soil are better identified by the MLC classifier. SVMs classified classes with large extents more accurately than MLC, but MLC performed better for land cover classes with small areas. This may be explained by the dependence of MLC on the data distribution in feature space. Classes with large extents reflect the high diversity of the real world, which degrades MLC performance, while SVMs, as non-parametric classifiers, are less dependent on the data distribution.
This study showed that urban expansion could be predicted with over 90% accuracy for 2020. Again, the highest accuracy was obtained for the vegetation class. Bare soil change could be predicted by the SVMs algorithm with 93% accuracy, while the other two classifiers achieved less than 85% producer's accuracy, making it the least accurately predicted class in this study. The overall accuracy of the MLC prediction was the best, closely followed by SVMs. According to the results of this study, MLC can be suggested for predicting the built-up, rocky, and vegetation classes, and SVMs for predicting the soil class. A hybrid approach could combine the benefits of parametric and non-parametric classifiers. Another option is to split the soil class into homogeneous subclasses, which are expected to be classified better by MLC.
This study showed that the Urmia urban area is growing rapidly and will increase from around 5500 ha in 2000 to over 9000 ha in 2030. This indicates that many natural and agricultural resources will be occupied by the city and many plant and animal species may be endangered. For such cases, some researchers have suggested vertical city expansion [59][60][61][62].
A comparison to similar works indicates similar results in terms of the accuracy of classification and prediction. Saputra
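
The scale of this expansion can be put into perspective with some simple arithmetic. The sketch below assumes 30 m Landsat pixels (0.09 ha each) and treats the rounded figures of 5500 ha and 9000 ha as exact, so the results are only indicative. The function names are hypothetical.

```python
PIXEL_SIZE_M = 30.0  # assumed Landsat multispectral pixel size in metres

def pixels_to_hectares(pixel_count):
    """Convert a count of classified pixels to an area in hectares."""
    return pixel_count * PIXEL_SIZE_M ** 2 / 10000.0

def relative_growth(area_start, area_end):
    """Total growth relative to the starting area."""
    return (area_end - area_start) / area_start

def mean_annual_growth_rate(area_start, area_end, years):
    """Compound annual growth rate over the given number of years."""
    return (area_end / area_start) ** (1.0 / years) - 1.0

total_growth = relative_growth(5500.0, 9000.0)
annual_rate = mean_annual_growth_rate(5500.0, 9000.0, 30)
urban_area_2030 = pixels_to_hectares(100000)
```

Under these assumptions the urban area grows by roughly 64% over the 30 years, which corresponds to a compound rate of about 1.7% per year, and an area of 9000 ha corresponds to 100,000 urban pixels.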