Spatially-Explicit Simulation of Urban Growth through Self-Adaptive Genetic Algorithm and Cellular Automata Modelling

This paper presents a method to optimise the calibration of parameters and land use transition rules of a cellular automata (CA) urban growth model using a self-adaptive genetic algorithm (SAGA). Optimal calibration is achieved through an algorithm that minimises the difference between the simulated and observed urban growth. The model was applied to simulate land use change from non-urban to urban in South East Queensland’s Logan City, Australia, from 1991 to 2001. The performance of the calibrated model was evaluated by comparing the empirical land use change maps from the Landsat imagery to the simulated land use change produced by the calibrated model. The simulation accuracies of the model show that the calibrated model generated 86.3% correctness, mostly due to observed persistence being simulated as persistence and some due to observed change being simulated as change. The 13.7% simulation error was due to nearly equal amounts of observed persistence being simulated as change (7.5%) and observed change being simulated as persistence (6.2%). Both the SAGA-CA model and a logistic-based CA model without SAGA optimisation have simulated more change than the amount of observed change over the simulation period; however, the overestimation is slightly more severe for the logistic-CA model. The SAGA-CA model also outperforms the logistic-CA model with fewer quantity and allocation errors and slightly more hits. For Logan City, the most

A diverse range of statistical methods has been developed to select such variables and parameters; these include logistic regression [5], spatial logistic regression [27], multi-criteria evaluation (MCE) [28] and hybrid models [29]. While the integration of spatial statistics and cellular automata has laid the foundation for modelling land use dynamics, it can be challenging to define suitable transition rules and the relevant parameter values, and to construct the architecture of the models [11]. For instance, the MCE and logistic regression methods do not remove the effects of multi-collinearity amongst the spatial variables [11,15]. Principal components analysis (PCA) can extract uncorrelated components from the set of independent variables, but it does not guarantee that the principal components are relevant to the dependent variable [30]. Consequently, results generated by CA models incorporating spatial statistics may not mimic the actual land use patterns, making them ineffective in capturing the spatial dynamics of urban land use change [6,15].
Hybrid CA models combine the spatial explicitness provided by the key drivers of land use change with the powerful computational capacity of artificial intelligence techniques [31]. The development of genetic algorithms (GA) has provided researchers with new ways to search for suitable transition rules and their defining parameters in urban modelling [31][32][33][34][35][36]. A GA searches for an optimal solution to a problem based on the mechanics of natural genetics and natural selection [37]. Compared with other population-based optimisation methods, such as particle swarm optimisation [15], the distinctive feature of the GA lies in its operators: selection, crossover and mutation. A GA is a guided randomised search rather than a purely random one, because information from previous generations is used to generate new candidate solutions [37].
In practice, the GA technique has been applied to address various optimisation problems in geographical systems [38][39][40][41][42][43]. In CA-based urban research, the general method of integrating cellular automata with GA was initially proposed in [44] and subsequently adopted to parameterise a Markov cellular automata model for land use change simulation [45]. The physical meanings of the "genes" in the CA model were defined in [33] and then applied to retrieve geographical CA transition rules. A modified GA was developed to search for optimal parameters and neighbourhood rules for a Markov chain model to simulate the spatio-temporal urban landscape change processes in China's Daqing City [45]. The results showed that the combination of the GA and the Markov model is capable of capturing the spatio-temporal trend in the landscape pattern associated with urbanisation in that region. More recently, a GA was applied to calibrate the SLEUTH model of urban growth, which not only reduced the computation time of model calibration but also improved the simulation accuracy of the model [34]. In another application, a pattern-calibrated and GA-optimised CA model was developed that incorporates the percentage of landscape, patch metric and landscape division into the fitness function of the GA model [46]. The GA optimisation technique has also been combined with statistical techniques to calibrate an urban CA for a small urban settlement in northwest Spain; that work showed that the model can be adapted to urban areas with various characteristics and dynamics [47].
This paper presents a method for optimising the land use transition rules and parameters of a cellular automata urban growth model using a self-adaptive genetic algorithm (SAGA). The model was applied to simulate the spatio-temporal pattern of non-urban to urban land conversion in South East Queensland's Logan City, Australia. Section 2 presents the study area and the data collected and processed for the modelling. Section 3 introduces the modelling approach, comprising a CA urban model followed by the SAGA used to optimise the spatial parameters of the CA transition rules. The optimisation process is guided by a fitness function that minimises the differences between the simulated and the observed urban growth. Section 4 presents the model outcomes, followed by the discussion and conclusions in the last two sections.

Study Area and Data
Logan City in South East Queensland, Australia, was selected as the case study site for applying the SAGA-CA model to simulate its land use evolution from non-urban to urban during the 1991-2001 period. Logan City is situated between the state capital, Brisbane, to the north and Gold Coast City to the south. It also borders Redland City to the northeast, the City of Ipswich to the northwest and the Scenic Rim to the southwest (Figure 1). Originally established in 1978 as a local government area, Logan City tripled in land area in 2008 due to major changes to local government in Queensland [48]. Its current land area is 913 km², with a total population of 278,000 in 2011 [49].
Two land cover maps, one representing land cover in 1991 and one in 2001, were obtained from the Department of Natural Resources and Mines of the State of Queensland and used in this study. These land cover maps were produced by the Statewide Landcover and Trees Study (SLATS) program funded by the Queensland Government [50]. Landsat TM imagery acquired between June and September 1991 was used for the baseline land cover data, with a partial update of the baseline land cover dataset for 2001. The raw imagery has a nominal spatial resolution of 30 m. It was pre-processed through several stages of automated and semi-automated image classification, together with visual interpretation, field calibration and validation [51,52]. The final product was resampled to a spatial resolution of 25 m for public release [50,51]. Three land categories were identified from the SLATS land cover data in 1991 and 2001: urban (settlement areas with a population of more than 200, delineated through an analysis of digital cadastral boundaries based on lot size and population statistics), non-urban (mostly pasture land, woody vegetation and bare land), and excluded (crop land and water) (Figure 2). As the focus of this research is on land use change from non-urban to urban states, water bodies and crop land were treated as a hard constraint on urban growth, and these areas were excluded from being urbanised. However, spatial proximities to water and crop land were taken as factors impacting urban growth.
Table 1 lists the topographical and spatial factors, as well as the neighbourhood and stochastic factors, used in the CA urban model. Other data collected include a 1-second DEM from the national Shuttle Radar Topography Mission (SRTM) data, and data on urban centres, main roads, railway lines and stations from the relevant local government authorities. A land slope layer was generated from the DEM. The spatial proximity factors to urban centres (d_centres), to main roads (d_road), to railway lines (d_rail), to railway stations (d_railstn), to agricultural land (d_agri) and to water bodies (d_rivers) were generated from the relevant data sources using the Spatial Analyst tools in ArcGIS. In addition, an external impact factor (I_external), measuring the impact of other urban centres in the neighbouring regions (including Brisbane, Ipswich, Gold Coast and Redland cities) on Logan City's urban growth, was also considered in the model. This factor was quantified using Journey to Work data from Logan to the surrounding regions; as the earliest available data are from the 2006 census [53], these data were used as a proxy for the external impact on the city's urban growth. A large external impact factor value indicates that more people travelled to work outside the region rather than within it. All data were processed into raster grids at 25-m spatial resolution to match the SLATS land cover data, and the values of the spatial variables were normalised to the range between 0 and 1 inclusive (Figure 3).
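The normalisation step can be sketched as below. The text states only that the spatial variables were rescaled to the range [0, 1]; the min-max formula (x − min)/(max − min) is an assumption, and the function name `normalise` and the optional `nodata` marker are hypothetical. Grids are plain lists of rows so the example needs no third-party packages.

```python
def normalise(grid, nodata=None):
    """Min-max rescale a 2-D grid (list of rows) to [0, 1].

    ASSUMPTION: linear min-max scaling; cells equal to `nodata` are left as-is.
    A constant grid maps to all zeros to avoid dividing by zero.
    """
    values = [v for row in grid for v in row if v != nodata]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        [v if v == nodata else (0.0 if span == 0 else (v - lo) / span) for v in row]
        for row in grid
    ]

# Hypothetical distance-to-road grid in metres (25-m cells would give similar steps)
d_road = [[0, 250, 500], [750, 1000, 1250]]
print(normalise(d_road))  # [[0.0, 0.2, 0.4], [0.6, 0.8, 1.0]]
```

Whether a distance is rescaled directly or inverted (so that 1 means "close") changes the sign of the fitted parameter, so the chosen convention matters when interpreting the calibrated values.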

Method
The modelling framework consists of a generic CA urban growth model linked to an optimisation module that uses SAGA to search for an optimal set of transition rules and parameters. The optimal set of rules and parameters is then used by the CA model to simulate the dynamic process of urban growth (Figure 4). Each iteration of the model represents one year.

Cellular Automata Model
The generic CA urban growth model was initially configured using the logistic regression approach [54], in which the land use conversion probability of a cell at location i from time t to the next time step (t + 1) is represented as:

$$P_i^t = F_i^t \times N_i^t \times \mathrm{Con} \times R \qquad (1)$$

where P_i^t represents the overall land use conversion probability of cell i from time t to the next time step, and F_i^t is the land use conversion probability at location i from time t to the next time step determined by the topographical features of the area and its spatial proximity to facilities and services. The topographical features impacting urban land conversion include slope and elevation, while the spatial proximity factors include distances to town centres, roads, railways, rivers and prime agricultural land. Therefore, F_i^t is obtained by a logistic transformation of a linear combination of these factors:

$$z_i^t = a_0 + \sum_{j=1}^{k} a_j x_{ij}^t \qquad (2)$$

where a = (a_0, a_1, …, a_k); a_0 is a constant, and a_j (j = 1, …, k) are parameters representing the impact of the j-th factor x_ij^t (including both topographical and spatial factors) on the land use conversion probability at location i at time t. The values of a_j were initially estimated using the logistic regression method [54] and subsequently optimised using the SAGA. N_i^t is the land use conversion probability at location i from time t to the next time step due to neighbourhood support. In this research, a square neighbourhood of m × m cells was adopted, and N_i^t was defined as the proportion of urban cells within the m × m neighbourhood. For the case study in this research, the neighbourhood size is 5 × 5 cells.

$$N_i^t = \frac{1}{m \times m - 1} \sum_{j \in \Omega_i,\ j \neq i} I\left(S_j^t = \text{urban}\right) \qquad (3)$$

where Ω_i is the m × m neighbourhood centred on cell i, and I(S_j^t = urban) equals 1 if neighbouring cell j is urban at time t and 0 otherwise; the processing cell i itself is not counted as a neighbouring cell, i.e., j ≠ i. Con represents a condition under which simulated urban growth cannot occur at certain locations, such as large water bodies and land used primarily for farmland, which land use planning regulations prevent from undergoing urban development. Con takes Boolean values of either 0 or 1, with 0 indicating that the cell is excluded from being urbanised and 1 indicating that the cell can change from non-urban to urban during a time step.
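The two spatially varying components, F_i^t and N_i^t, can be sketched in plain Python as follows. Two assumptions are labelled in the code: the conventional logistic link F = 1/(1 + exp(−z)) (its sign convention should be checked against Equation (2) before reusing the calibrated values), and cells beyond the grid edge being treated as non-urban. The function names are hypothetical.

```python
import math

def logistic_suitability(x, a):
    """F_i from factor values x = [x_1..x_k] and parameters a = [a_0, a_1..a_k].

    ASSUMPTION: conventional logistic link F = 1 / (1 + exp(-z)).
    """
    if len(a) != len(x) + 1:
        raise ValueError("a must hold a_0 plus one parameter per factor")
    z = a[0] + sum(aj * xj for aj, xj in zip(a[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def neighbourhood_support(urban, row, col, m=5):
    """N_i: share of urban cells (value 1) in the m x m window, centre excluded."""
    half = m // 2
    urban_neighbours = 0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if (r, c) == (row, col):
                continue  # the processing cell is not its own neighbour
            inside = 0 <= r < len(urban) and 0 <= c < len(urban[0])
            if inside and urban[r][c] == 1:  # ASSUMPTION: off-grid cells are non-urban
                urban_neighbours += 1
    return urban_neighbours / (m * m - 1)

grid = [[1, 1, 1, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1]]
print(neighbourhood_support(grid, 2, 2))  # 5 urban neighbours out of 24
print(logistic_suitability([0.0, 0.0], [0.0, 1.0, -1.0]))  # z = 0, so F = 0.5
```

Note that the centre cell of `grid` is urban but does not contribute to its own N_i^t, matching the j ≠ i condition.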
R is a stochastic disturbance factor on urban development, which is defined as [1,25,26]:

$$R = 1 + \left(-\ln r\right)^{\alpha} \qquad (4)$$

where r is a random real number ranging from 0 to 1, and α is a real number ranging from 0 to 10 that controls the effect of the stochastic factor on urban development.
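The stochastic factor and the composite probability can be sketched as below. The multiplicative combination of F, N, Con and R is an assumption for this sketch, and the function names and the seed are hypothetical. Drawing r as `1.0 - random()` keeps it in (0, 1], so the logarithm is always defined.

```python
import math
import random

def stochastic_factor(alpha, rng=random):
    """R = 1 + (-ln r)^alpha, with r uniform on (0, 1]."""
    r = 1.0 - rng.random()  # random() returns [0, 1); flipping it avoids log(0)
    return 1.0 + (-math.log(r)) ** alpha

def conversion_probability(f, n, con, r):
    """P_i = F_i * N_i * Con * R (ASSUMED multiplicative combination).

    con is 0 for excluded cells (water, farmland) and 1 otherwise, so excluded
    cells always get P = 0. The product is not clipped, so with R > 1 it can exceed 1.
    """
    return f * n * con * r

rng = random.Random(2001)
r_factor = stochastic_factor(alpha=2, rng=rng)
p = conversion_probability(f=0.6, n=0.25, con=1, r=r_factor)
print(r_factor >= 1.0, p)
```

Because −ln r is non-negative, R never falls below 1; a larger α spreads its values more widely, which is how the exponent controls the strength of the disturbance.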

Encoding the Chromosomes
Chromosomes are abstract representations of candidate solutions. A chromosome is a set of parameters that defines a proposed solution to the problem the GA is trying to solve. In the CA urban modelling practice, each candidate set of CA transition rule parameters impacting on urban growth is encoded as a chromosome. Each chromosome is coded as a simple string:

$$C = \{\beta_0, \beta_1, \ldots, \beta_k\} \qquad (5)$$

where C represents a candidate solution; k is the total number of topographic and spatial variables (as in Equation (2)); and each β is a "gene" of the candidate solution, with β_0 the constant term and β_1 to β_k the evaluation scores of the individual variables. The optimal values of β_0 to β_k are the parameter values a_j (j = 0, 1, …, k) used in Equation (2). Initially, a number of chromosomes are randomly generated to form the possible solutions from which the SAGA begins its search. After many generations of selection, crossover and mutation, only the chromosomes that acquire better fitness values remain, resulting in the emergence of an optimal chromosome structure.
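A real-valued encoding of this kind can be sketched as a list of k + 1 floats. The population size, the `spread` width and the option of centring genes on a given vector are illustrative assumptions; centring on zeros gives purely random chromosomes, while centring on regression estimates seeds the search around them. The function name is hypothetical.

```python
import random

def initial_population(size, centre, spread=0.5, seed=None):
    """Create `size` chromosomes C = [beta_0, ..., beta_k].

    Each gene j is drawn uniformly from [centre[j] - spread, centre[j] + spread].
    ASSUMPTION: `spread` and the centre vector are illustrative choices.
    """
    rng = random.Random(seed)
    return [[c + rng.uniform(-spread, spread) for c in centre] for _ in range(size)]

# Nine factors in this study: I_external, d_centres, d_agri, d_road, d_rail,
# d_railstn, d_rivers, slope and DEM, plus the constant beta_0.
k = 9
population = initial_population(size=50, centre=[0.0] * (k + 1), seed=7)
print(len(population), len(population[0]))  # 50 chromosomes of 10 genes each
```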

The Fitness Function
A fitness function is used to evaluate and quantify the optimality of a solution during the optimisation process [55]. The fitness function is an objective function that quantifies the difference between simulated and observed urban growth patterns; an optimal solution is achieved when its value, that is, the difference between the simulated and observed urban growth patterns, is minimised. It is evaluated over sample cells selected within the cellular urban space, comparing the simulation results produced by the model with the observed urban growth patterns identified from remotely-sensed imagery. The optimisation objective is to minimise D(C), where D(C) is the fitness function for the candidate solution C; M is the number of samples selected from the cellular urban space and used to retrieve the CA transition rules; P_i^t is the global conversion probability of cell i from time t to t + 1, as defined in Equation (1), for candidate solution C; and U_i is the reference conversion of cell i, which takes one of two values: U_i = 0 means that cell i persists as non-urban, and U_i = 1 means that it changes from non-urban to urban.
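The exact form of D(C) is not given in this section; the sketch below assumes a sum of squared differences between P_i and U_i over the M sample cells, which is one plausible choice consistent with the description. The function name and the three sample values are hypothetical.

```python
def fitness(probabilities, observed):
    """D(C) as the sum of squared differences (P_i - U_i)^2 over M sample cells.

    ASSUMPTION: squared-error form; lower values mean a better candidate C.
    """
    if len(probabilities) != len(observed):
        raise ValueError("need one simulated probability per sample cell")
    return sum((p - u) ** 2 for p, u in zip(probabilities, observed))

# Three hypothetical samples: two observed conversions (U = 1), one persistence (U = 0)
d = fitness([0.9, 0.2, 0.6], [1, 0, 1])  # 0.01 + 0.04 + 0.16, i.e. about 0.21
print(round(d, 2))
```

In a full calibration, the probabilities would come from Equation (1) evaluated with the genes of candidate C, so each chromosome maps to one D(C) score for the SAGA to minimise.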
The simulation process of urban growth can be calibrated by dynamically updating the various parameters of the transition rules to minimize the value of the fitness function, so that the simulated urban patterns match the observed patterns of urban growth.The calibration process of the model is completed once the fitness function reaches a stable value over generations.At that point, the model's transition rules and their defining parameters can be considered optimised for operation in the CA model.
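A simple way to decide that the fitness "reaches a stable value over generations" is to watch the best fitness over a trailing window. The window length and tolerance below are illustrative assumptions, not values from the study (which reports stabilisation after 200 generations), and the function name is hypothetical.

```python
def has_stabilised(best_history, window=20, tol=1e-3):
    """True if the best fitness varied by less than `tol` over the last `window` generations.

    ASSUMPTION: `window` and `tol` are illustrative stopping parameters.
    """
    if len(best_history) < window + 1:
        return False  # not enough generations yet to judge
    recent = best_history[-(window + 1):]
    return max(recent) - min(recent) < tol

history = [50.0, 48.3, 47.1] + [47.1] * 20
print(has_stabilised(history))  # True: no change over the last 20 generations
```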

Selection, Crossover and Mutation
GA uses natural selection rules, namely selection, crossover and mutation, to search for and optimise a solution from a set of chromosomes. Selection is the key operation in which individual genomes are chosen from a population of candidate solutions for later breeding through recombination (crossover). Individual solutions are selected through a fitness-based process in each successive generation, where solutions with better fitness values are more likely to be selected; here, the smaller the difference between the simulated and observed results, the better the fitness value. Crossover exchanges genetic material between pairs of parent chromosomes to produce recombined offspring. Mutation is a genetic operator used to maintain genetic diversity from one generation of a population of chromosomes to the next. However, a standard GA commonly uses fixed, non-interactive selection, crossover and mutation rates, which can be problematic because these operators cannot be modified during the search and optimisation process [37,56,57]. A SAGA can overcome this problem because it maintains population diversity across the solution domain, which helps the search identify an optimal solution and mitigates local and premature convergence [58,59]. In addition, the SAGA improves the search speed and precision of the standard GA and thereby accelerates the search and optimisation process. The three operations used in the SAGA are illustrated in Figure 5. Specifically, the selection operator used in this research was a ranking method that carries the best individuals unchanged into the next generation [60]. The crossover and mutation operators are defined through probabilities that change in accordance with the fitness values [59]: the crossover probability Pc and the mutation probability Pm vary between preset bounds according to the fitness of the individuals involved, where Pc1 and Pc2 are the maximum and minimum crossover probabilities, set to 0.95 and 0.45, respectively; Pm1 and Pm2 are the maximum and minimum mutation probabilities, set to 0.1 and 0.001, respectively; fmin and fmax are the minimum and maximum fitness values, respectively; and favg is the average fitness value.
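Elitist rank selection and fitness-dependent rates can be sketched as follows. Only the four bounds come from the text; the linear adaptive rule is an assumption in the spirit of Srinivas and Patnaik's adaptive GA, rewritten for minimisation (lower D(C) is better), and it uses fmin and favg only, whereas the paper's definition also involves fmax. Function names and the fitness values are hypothetical.

```python
import random

# Bounds reported in the text
PC_MAX, PC_MIN = 0.95, 0.45   # Pc1, Pc2
PM_MAX, PM_MIN = 0.1, 0.001   # Pm1, Pm2

def adaptive_rate(f, f_min, f_avg, high, low):
    """Rate for an individual with fitness f (lower is better).

    ASSUMPTION: worse-than-average individuals get the upper bound; better-than-
    average ones are scaled linearly down to the lower bound at f_min, so good
    solutions are disrupted less.
    """
    if f >= f_avg or f_avg == f_min:
        return high
    return high - (high - low) * (f_avg - f) / (f_avg - f_min)

def rank_select(population, scores, n_elite, rng):
    """Elitist linear-ranking selection for a minimisation problem."""
    order = sorted(range(len(population)), key=lambda i: scores[i])  # best first
    elites = [population[i][:] for i in order[:n_elite]]  # copied unchanged
    ranked = [population[i] for i in order]
    n = len(ranked)
    weights = [n - rank for rank in range(n)]  # best gets weight n, worst gets 1
    parents = rng.choices(ranked, weights=weights, k=n - n_elite)
    return elites, [p[:] for p in parents]

scores = [47.9, 47.1, 48.6, 47.4]
f_min, f_avg = min(scores), sum(scores) / len(scores)
pair = (47.1, 47.4)  # fitness of two parents chosen for crossover
pc = adaptive_rate(min(pair), f_min, f_avg, PC_MAX, PC_MIN)  # best parent sets Pc
pm = adaptive_rate(47.9, f_min, f_avg, PM_MAX, PM_MIN)       # below-average individual
print(round(pc, 3), pm)
```

Using the better parent's fitness for Pc means that pairs containing a strong solution are recombined less often, which is the usual rationale for this kind of scheme.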

Model Implementation
A total of 25,000 sample pixels were selected randomly from the candidate region, that is, the entire study area minus the existing urban areas, agricultural land and water bodies in 1991. The candidate region therefore comprises the areas that could potentially be developed into urban land use from 1991 onwards. Each sample includes a land use change variable for 1991-2001, as well as the external impact factor, distance factors, slope and DEM, as listed in Table 1 and shown in Figure 3. The land use change variable is binary, taking the value 1 if the state of a cell changed from non-urban to urban during 1991-2001 and 0 otherwise. These sample data were used to build the logistic regression model (termed logistic-CA hereafter), which generated the initial values of a_j (j = 0, 1, …, k). This initial parameter set was then used to form the candidate solutions from which the SAGA, evaluating the fitness function on the same samples, began its search and optimisation process.
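The sampling step can be sketched as below: build the candidate region from the 1991 urban and excluded masks, draw distinct cells without replacement, and attach the binary change label. The function names, the toy 3 × 3 grids and the seed are hypothetical; in the study, `n_samples` would be 25,000.

```python
import random

def sample_candidate_cells(urban_1991, excluded, n_samples, seed=None):
    """Randomly draw distinct (row, col) cells from the candidate region.

    The candidate region is every cell that was neither urban nor excluded
    (agricultural land, water) in the base year.
    """
    candidates = [
        (r, c)
        for r, row in enumerate(urban_1991)
        for c, is_urban in enumerate(row)
        if not is_urban and not excluded[r][c]
    ]
    if n_samples > len(candidates):
        raise ValueError("candidate region is smaller than the requested sample")
    return random.Random(seed).sample(candidates, n_samples)

def change_label(urban_1991, urban_2001, cell):
    """Binary land use change variable: 1 if non-urban in 1991 and urban in 2001."""
    r, c = cell
    return int(not urban_1991[r][c] and bool(urban_2001[r][c]))

urban_1991 = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
excluded   = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
urban_2001 = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
cells = sample_candidate_cells(urban_1991, excluded, 4, seed=1991)
labels = [change_label(urban_1991, urban_2001, cell) for cell in cells]
print(list(zip(cells, labels)))
```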

The Optimal Chromosome/CA Transition Rules
Figure 6 shows the fitness track in the evolutionary computation of the SAGA module, which initially converges rapidly towards the best fitness line. After 200 generations, the fitness value stabilised, with a best fitness value of 47.1 and a mean fitness value of 47.2 (Figure 6). The optimal chromosome (i.e., the optimal set of CA parameters) reveals the different impacts of the various factors on urban land use conversion in Logan City. According to Equation (2), a_j (j = 0, 1, …, 9) is a string of parameters representing the constant, I_external, d_centres, d_agri, d_road, d_rail, d_railstn, d_rivers, slope and DEM, respectively. A negative a_j leads to a larger F_i^t value, that is, a higher probability for a cell to convert from a non-urban to an urban state. Conversely, a positive a_j value results in a smaller F_i^t and hence a lower probability for the cell to convert into an urban state at the next time step. The parameter values optimised by the SAGA show that the external impact factor has the largest positive value, a_1 (I_external) = 1.19, which is larger than the value of 1.02 generated by the logistic regression model. This positive value indicates that a place from which a large number of people commute to work elsewhere is associated with a smaller probability of urban growth; in effect, this is an opportunity cost for a place in Logan to be urbanised, as more people commute to work outside the region. On the other hand, the distance from existing urban centres correlates negatively with the probability of urban growth. This is reflected by the most negative value, a_2 (d_centres) = −1.13; this effect is stronger in the SAGA-CA model than in the logistic regression-based CA model, which generated a value of −0.98. Likewise, elevation, land slope and distances to roads and railway stations are also negatively associated with the probability of simulated urban growth, with a_9 (DEM) = −1.12, a_8 (SLOPE) = −0.34, a_4 (d_road) = −1.05 and a_6 (d_railstn) = −1.11, respectively. However, close proximity to agricultural land, rivers or railway lines tends to decrease the probability of simulated urban growth, with a_3 (d_agri) = 0.98, a_7 (d_rivers) = 0.68 and a_5 (d_rail) = 0.58 after optimisation by the SAGA. Hence, the closer a cell is to agricultural land, rivers or railway lines, the less likely it is to be developed into an urban state.

Simulation Accuracies of the SAGA-CA Model
Using the 1991 land use map as the initial input, the SAGA-CA model was run with the set of land use transition rules optimised by the SAGA to generate a series of land use patterns. Each iteration of the model represented one year; hence, the urban land use pattern of Logan City in 2001 was generated after 10 iterations of the model.
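The annual iteration can be sketched as a loop that updates all cells from the previous year's map. How P_i^t is turned into a conversion decision is not stated in this section, so the fixed threshold, the synchronous update and the toy probability rule below are assumptions; all names are hypothetical.

```python
def simulate(urban, probability, years=10, threshold=0.5):
    """Run `years` annual iterations of a CA on a 0/1 urban grid.

    ASSUMPTION: a non-urban cell converts when probability(grid, r, c) exceeds a
    fixed threshold; all cells are evaluated against the previous year's map.
    """
    grid = [row[:] for row in urban]  # leave the input map untouched
    for _ in range(years):
        nxt = [row[:] for row in grid]
        for r, row in enumerate(grid):
            for c, is_urban in enumerate(row):
                if not is_urban and probability(grid, r, c) > threshold:
                    nxt[r][c] = 1
        grid = nxt
    return grid

def grows_from_neighbours(grid, r, c):
    """Toy rule for illustration: 1.0 if any 4-neighbour is urban, else 0.0."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]) and grid[rr][cc]:
            return 1.0
    return 0.0

strip = [[1] + [0] * 11]
print(simulate(strip, grows_from_neighbours)[0])  # growth advances one cell per year
```

In the SAGA-CA model, `probability` would evaluate Equation (1) with the optimised parameters; the toy rule only makes the ten-step, 1991-to-2001 loop easy to follow.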
The simulated urban growth map for 1991-2001 produced by the SAGA-CA model was compared with the reference map of urban growth for 1991-2001 generated from the SLATS land cover data [50], using the three-map comparison approach introduced in [61]. This produces a set of measures for evaluating the correctness and error of the simulation output (Figure 7a). Here, hits (H) indicate that growth areas from 1991 to 2001 shown on the reference maps were simulated as growth; misses (M) indicate that growth areas shown on the reference maps were simulated as persistence (i.e., no change in simulated state between the two time points); false alarms (FA) indicate that persistence shown on the reference maps was simulated as growth; correct rejections (CR) indicate that persistence shown on the reference maps was simulated as persistence; and excluded indicates the union of water and agriculture in the base year, 1991. Figure 7b illustrates the simulation correctness and error of the logistic-CA model without SAGA optimisation. To assess the simulation accuracies of the models, existing urban areas, agricultural land and water bodies in 1991 were excluded from the calculations. In the SAGA-CA simulation, many of the false alarms lie near the misses, whereas in the logistic-CA simulation many false alarms form rings around the excluded region. Figure 8 shows the sizes of the misses, hits, false alarms and correct rejections in the candidate region for the SAGA-CA model in comparison with the logistic-CA model without SAGA optimisation.
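The three-map tally described above can be sketched as follows: cells that were urban or excluded in the base year are skipped, and every remaining cell falls into exactly one of the four categories. The function name and the one-row example maps are hypothetical.

```python
from collections import Counter

def three_map_comparison(base, observed, simulated, excluded):
    """Tally hits, misses, false alarms and correct rejections.

    base, observed and simulated are 0/1 urban grids (1991, reference 2001,
    simulated 2001); excluded flags water and agriculture in 1991. Cells that
    were already urban or excluded in the base year are skipped.
    """
    tally = Counter()
    for r, row in enumerate(base):
        for c, was_urban in enumerate(row):
            if was_urban or excluded[r][c]:
                continue
            obs, sim = bool(observed[r][c]), bool(simulated[r][c])
            if obs and sim:
                tally["hits"] += 1
            elif obs:
                tally["misses"] += 1
            elif sim:
                tally["false_alarms"] += 1
            else:
                tally["correct_rejections"] += 1
    return tally

base      = [[1, 0, 0, 0, 0, 0]]
excluded  = [[0, 0, 0, 0, 0, 1]]
observed  = [[1, 1, 1, 0, 0, 0]]
simulated = [[1, 1, 0, 1, 0, 1]]
print(dict(three_map_comparison(base, observed, simulated, excluded)))  # one of each category
```

Dividing each count by the total number of candidate cells gives the percentage shares reported for the candidate region.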
Overall, the SAGA-CA model generated 86.3% correctness, mostly due to observed persistence being simulated as persistence (79.2%) and some due to observed change being simulated as change (7.1%). The 13.7% simulation error was due to either observed persistence being simulated as change (7.5%) or observed change being simulated as persistence (6.2%). The observed change (OC) accounts for 13.3% of the candidate region, whereas the simulated change (SC) by the SAGA-CA model accounts for 14.6% of the region. Compared with the logistic-CA model, the SAGA-CA model produced slightly less overall error, 13.7% versus 15.1%. Figure 8 shows that both the logistic-CA and SAGA-CA models simulated more change than the amount of observed change, while the overestimation is slightly more severe for the logistic-CA model.
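The reported percentages form a consistent budget: correctness is H + CR, error is M + FA, observed change is H + M and simulated change is H + FA. The short check below reproduces these figures from the four shares given in the text; the variable names are illustrative.

```python
# Shares of the candidate region (%) reported above for the SAGA-CA model
shares = {"hits": 7.1, "misses": 6.2, "false_alarms": 7.5, "correct_rejections": 79.2}

derived = {
    "correctness": shares["hits"] + shares["correct_rejections"],
    "error": shares["misses"] + shares["false_alarms"],
    "observed_change": shares["hits"] + shares["misses"],
    "simulated_change": shares["hits"] + shares["false_alarms"],
}
print({name: round(value, 1) for name, value in derived.items()})
# {'correctness': 86.3, 'error': 13.7, 'observed_change': 13.3, 'simulated_change': 14.6}
```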
The simulation errors generated by the model can also be expressed as error due to allocation (A) and error due to quantity (Q) [61][62][63][64][65]. The error due to quantity measures how far the observed and simulated quantities of change fail to match. The error due to allocation measures how far the spatial allocation of change falls short of the best possible match, given the quantities of change in the observed and simulated change maps [65]. Of the 13.7% total error between the SAGA-CA simulated and the actual land use change for the 1991-2001 interval, 1.3% was due to quantity disagreement and the other 12.4% was due to allocation disagreement. The quantity disagreement occurred because the SAGA-CA model simulated slightly more growth than the reference growth. The results also show that the SAGA-CA model outperforms the logistic-CA model with fewer quantity and allocation errors and slightly more hits (Figure 9).
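For a binary change map, the two components follow directly from the miss and false-alarm shares: quantity disagreement is |FA − M| (equivalently |SC − OC|), and allocation disagreement is 2 × min(FA, M), counting pairs of a miss and a false alarm that could be swapped to correct location without altering the amount of change. The sketch below applies this two-category decomposition to the SAGA-CA figures; the function name is hypothetical.

```python
def quantity_allocation(misses, false_alarms):
    """Split total error (misses + false alarms) into quantity and allocation parts.

    Quantity = |false alarms - misses|; allocation = 2 * min(false alarms, misses).
    """
    return abs(false_alarms - misses), 2 * min(false_alarms, misses)

q, a = quantity_allocation(misses=6.2, false_alarms=7.5)
print(round(q, 1), round(a, 1))  # 1.3 12.4
```

The two parts sum to the 13.7% total error, and the positive FA − M difference is why the quantity disagreement reflects over-simulated growth.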

Discussion
Cellular automata techniques have been developed to explore urban land use evolution over the last two decades, and the reliability of the models has been studied [16,35,62].The simulation accuracies of an urban CA model can be affected by the methodologies used in retrieving the transition rules, the spatial and thematic resolution of the model, as well as the physical, socio-environmental and institutional situations of the areas under study.
Firstly, the simulation accuracy of the model is subject to the characteristics of the region under study.In this research, three thematic categories (i.e., urban, non-urban and excluded) were extracted from a secondary data source (i.e., the SLATS land cover data) to simulate the process of land use change from non-urban to urban.These three land use categories are highly generalised from the more complex land use types on the ground.
The simulation accuracy of the CA-based urban models is also sensitive to the resolution of cells [15,66].Regardless of other factors impacting on the simulation accuracies of the CA model, a model configured at 250-m resolution can generate lower simulation accuracies compared to models configured at 30-m cell resolution [15].The low simulation accuracy of the model at a coarse scale is usually due to the isolation of urban cells at such a scale, where only a small number of isolated urban cells can be identified in the initial input data [15].
In addition, the methodology adopted in retrieving the CA's transition rules is crucial for plausible simulation results.A number of new methods have been developed to capture land use dynamics and to improve simulation accuracy.By integrating the SAGA method into a typical logistic regression-based CA model, the integrated model is capable of taking into account feedback from individual "genes" during the modelling process, leading to the identification of a set of optimised transition rules.

Conclusions
Spatially-explicit simulation of urban land use change has attracted widespread interest in recent years with the focus on the spatio-temporal dynamics of the urban system and its land use evolution.Many CA-based urban models have been developed and applied in various situations to simulate the dynamic change of urban land use over time.However, it remains challenging for urban modellers to identify suitable transition rules reflecting the driving factors on land use change in the modelling practice.
This paper contributes to the field by developing an urban CA model whose transition rules are optimised by a SAGA. Building on evolutionary computation techniques [58,59], it incorporates a set of essential driving forces of urban land use change and uses the SAGA to optimise the model configuration through a process analogous to biological evolution. Consequently, a set of optimised transition rules and their defining parameters was identified and used to simulate the spatio-temporal process of land use evolution. In the application of the model to Logan City in Queensland, Australia, most of the conversion from non-urban to urban occurred around the existing urban centres, which attract growth within their proximity. Likewise, spatial proximity to roads and railway stations is an important factor attracting urban growth. On the other hand, the probability of a place being urbanised is lower when more people are attracted to work in other regions, mostly Brisbane, Ipswich and the Gold Coast. In addition, spatial proximity to agricultural land, rivers and railway lines discourages an area from developing into urban land use. The application of the SAGA-CA model to Logan City demonstrates that the SAGA technique can increase simulation accuracy compared with a conventional logistic method, because it optimises the transition rules of an urban CA model for simulating urban growth.

Author Contributions
All authors contributed to the writing of the paper.Data processing, analysis and modelling were performed by Yan Liu and Yongjiu Feng.Reviewers' comments were responded to by all authors.

Figure 1. Logan City in South East Queensland, Australia.

Figure 2. Logan City land use change from 1991 to 2001.

Figure 3. Spatial variables used in the SAGA-CA model in Logan City, Australia.

Figure 4. The SAGA process searching for the optimal CA transition rules.

Figure 5. The selection, crossover and mutation operations. Here, A, B, C and D are candidate solutions; Pc and Pm are the crossover and mutation probabilities, respectively.

Figure 6. Fitness track of the SAGA model.

Figure 7. Simulation correctness and error based on the reference growth versus the simulated growth during 1991-2001. (a) The simulation correctness and error by the SAGA-CA model; (b) the simulation correctness and error by the logistic-CA model without SAGA optimisation.

Figure 8. Misses, hits and false alarms in the non-urban area of 1991, where SC denotes simulated change and OC denotes observed change from 1991 to 2001.

Figure 9. Quantity error, allocation error and hits in the non-urban area of 1991.

Table 1. Variables used to compute land conversion probabilities.

Table 2. Optimised CA parameters by the SAGA in comparison with parameters generated by the logistic regression model.