Improving an Urban Cellular Automata Model Based on Auto-Calibrated and Trend-Adjusted Neighborhood

Abstract: Accurately simulating urban expansion is of great significance for promoting sustainable urban development. The calculation of neighborhood effects is an important factor that affects the accuracy of urban expansion models. The purpose of this study is to improve the calculation of neighborhood effects in an urban expansion model, i.e., the land-use scenario dynamics-urban (LUSD-urban) model, by integrating the trend-adjusted neighborhood algorithm and the automatic rule detection procedure. Taking eight sample cities in China as examples, we evaluated the accuracies of the original model and the improved model. We found that the improved model can increase the accuracy of simulated urban expansion in terms of both the degree of spatial matching and the similarity of urban form. The increase in accuracy can be attributed to the fact that this integration comprehensively considers the effects of historical urban expansion trends and the influences of neighborhoods at different scales. Therefore, the improved model in this study can be widely used to simulate the process of urban expansion in different regions.


Introduction
With the growth of the economy and population, the world is experiencing rapid urban expansion (UE) [1][2][3]. Although the urban land area only accounts for a small part of the Earth's total surface area, UE has greatly changed the natural landscape, simultaneously introducing a series of ecological and environmental problems [1,4,5]. Accordingly, the UE model is an important tool for understanding urbanization, for evaluating the ecological and environmental effects of urbanization, and for urban planning [6][7][8]. Therefore, developing models that can accurately simulate UE is essential for future sustainable urban development [9][10][11].
The cellular automata (CA) model is widely used to simulate UE due to its simplicity, transparency, and flexibility [12,13]. In the CA model, the future state of a pixel is determined by the current state of the pixel and those of its neighboring pixels [14]. The basic idea of the CA model is to first calculate the probability of non-urban pixels being converted into urban pixels based on the conversion rules and neighborhoods [15][16][17]; then, constrained by the number of future urban pixels, the non-urban pixels that will be converted into urban pixels in the future are allocated according to the probability of conversion [11,18,19]. The CA model can effectively characterize and simulate the spatial decision-making processes related to the interaction between neighborhoods and can effectively simulate the spatiotemporal complexity of UE [20][21][22]. At present, many CA models have been applied to simulate UE; examples include the slope, land use, exclusion, urban extent, transportation, and hillshade (SLEUTH) model [23], logistic-CA model [9], conversion of land use and its effects (CLUE) model [24], future land use simulation (FLUS) model [25][26][27], and land-use scenario dynamics-urban (LUSD-urban) model [19].
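The two-step idea described above (score every non-urban pixel, then convert the most probable ones until the demand is met) can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of any of the cited models; the function name `allocate_urban`, the 0/1 grid encoding, and the tie-breaking rule are assumptions made for the example.

```python
def allocate_urban(urban, prob, demand):
    """Return a new grid in which the `demand` most probable non-urban
    cells have been converted to urban (1).

    urban  : list of rows, 1 = urban, 0 = non-urban (assumed encoding)
    prob   : list of rows with the conversion probability of each cell
    demand : number of new urban cells to allocate
    """
    # Collect (probability, row, col) for every non-urban candidate cell.
    candidates = [
        (prob[r][c], r, c)
        for r, row in enumerate(urban)
        for c, state in enumerate(row)
        if state == 0
    ]
    # Highest probability first; ties are broken by grid position
    # (an arbitrary choice made for this sketch).
    candidates.sort(key=lambda item: (-item[0], item[1], item[2]))

    result = [list(row) for row in urban]
    for _, r, c in candidates[:max(0, demand)]:
        result[r][c] = 1
    return result


urban_now = [[1, 0], [0, 0]]
probability = [[0.9, 0.2], [0.8, 0.1]]
urban_next = allocate_urban(urban_now, probability, demand=1)
```

Existing urban cells are never candidates, so the probability stored for them is ignored, and a demand larger than the number of non-urban cells simply converts all of them.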
Computing the neighborhood is an important step in using the CA model to simulate UE [28][29][30]. At present, the most common neighborhood calculation method uses the number of urban pixels in a certain range of a chosen non-urban pixel divided by the number of pixels in the range (except the pixel itself) to reflect the influence intensity of neighboring urban pixels on the non-urban pixel. However, this calculation method considers only the number of urban pixels around non-urban pixels; this approach is highly simplistic and leads to low simulation accuracy. For this reason, researchers have improved this calculation method in several ways. For example, Pan et al. [31] used neighborhoods of different sizes and shapes to improve the simulation accuracy. Liao et al. [32] proposed a neighborhood attenuation CA model based on particle swarm optimization to improve the simulation accuracy. Nevertheless, while the above two methods consider the impacts of urban land on non-urban land at different distances, the existing neighborhood calculation methods still do not account for differences in the impacts of new urban land in different periods on the neighborhood or the effects of different neighborhood attenuation calculation methods on the results. Thus, the simulation accuracy needs to be further improved.
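As a point of reference, the common neighborhood measure described above (urban pixels in a window divided by the number of pixels in the window, excluding the pixel itself) might look like the sketch below. The square window, the `radius` parameter, and the choice to ignore cells outside the grid are assumptions of this example.

```python
def urban_fraction(urban, row, col, radius=1):
    """Share of urban cells in the square window around (row, col),
    excluding the centre cell itself.

    urban is a list of rows with 1 = urban and 0 = non-urban.
    Cells that fall outside the grid are ignored (an assumption).
    """
    n_rows, n_cols = len(urban), len(urban[0])
    urban_count = 0
    cell_count = 0
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if (r, c) == (row, col):
                continue  # the centre cell is not its own neighbour
            if 0 <= r < n_rows and 0 <= c < n_cols:
                cell_count += 1
                urban_count += urban[r][c]
    return urban_count / cell_count if cell_count else 0.0
```

Every urban neighbor counts equally here, regardless of its distance or of when it became urban, which is the simplification the rest of this section addresses.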
Recently, two new approaches (i.e., the trend-adjusted neighborhood algorithm and the automatic rule detection (ARD) procedure) were developed to improve the calculation of neighborhood effects for CA-based UE models. The trend-adjusted neighborhood algorithm can distinguish the impacts of new urban land in different periods on the neighborhood. In general, the pixels surrounding recent UE areas are more likely to be converted into urban pixels than the pixels surrounding older UE areas [25]. Long-term satellite observations at national and global scales have confirmed this trend [33][34][35]. Recently, Li et al. [36] weighted neighborhoods based on historical UE trends to develop a trend-adjusted neighborhood algorithm and integrated this algorithm into the commonly used Logistic-CA model, thereby developing the Logistic-Trend-CA model. This model considers the influence of UE trends and more accurately simulates UE by adjusting the weights of neighborhoods.
In addition, the ARD procedure comprehensively considers the influences of different neighborhood attenuation calculation methods on the results, which improves the UE simulation accuracy. According to Tobler's [37] first law of geography, "everything is related to everything else, but near things are more related than distant things." Hence, a closer unit has a higher impact on the development of the central unit than does a more distant unit. Similarly, there exists a distance attenuation effect in the neighborhood area of the urban CA model [30,38], but choosing the optimal attenuation parameter is always difficult for the CA model. Accordingly, Roodposhti et al. [17] developed an ARD neighborhood calculation algorithm based on the Simulation for Land Use Change Using R (SIMLANDER) modeling framework to consider the neighborhood attenuation effect and automatically calibrate the relevant parameters of the attenuation law, thereby improving the simulation accuracy.
The purpose of this research is to integrate the trend-adjusted neighborhood algorithm and the ARD procedure to improve the simulation accuracy of CA-based UE models. To achieve this goal, we first improved the method of calculating the neighborhood effects based on the LUSD-urban model, a constrained CA-based UE model, by integrating the trend-adjusted neighborhood algorithm and ARD procedure. Then, we used the original and improved models to simulate UE for selected sample cities undergoing rapid urbanization in China. Finally, we evaluated the accuracy of these methods and discussed their advantages and disadvantages.

Data Preprocessing
The data used in this study included urban built-up area data, land cover data, digital elevation model (DEM) data, and auxiliary geographic data (Table 1). The urban built-up area data originated from the long-term urban built-up area data set released by Gong et al. [34] (http://data.ess.tsinghua.edu.cn/, accessed on 10 November 2020), with a spatial resolution of 30 m and an overall accuracy (OA) of over 90%. We used data from six years: 1985, 1990, 1995, 2000, 2010, and 2015. The land cover data originated from the GlobalLand30 product (http://www.globallandcover.com/, accessed on 10 November 2020). We then reclassified the land cover data into eight types: cropland, forest, grassland, wetland, rural construction land, bare land, water, and urban. The DEM data were derived from the Geospatial Data Cloud Platform of the Computer Network Information Center of the Chinese Academy of Sciences (http://www.gscloud.cn, accessed on 10 November 2020), with a spatial resolution of 30 m. The auxiliary geographic data included administrative boundary data, urban center points, roads, and rivers from the National Basic Geographic Information Center (http://ngcc.sbsm.gov.cn/, accessed on 10 November 2020). To ensure data consistency, all data were resampled to a resolution of 100 m with a unified Albers projection (Figure 1).


Original LUSD-Urban Model
The LUSD-urban model calculates the probability of non-urban pixels being converted into urban pixels by considering the suitability factor, inheritance factor, neighborhood, and random interference factor of UE and then simulates the spatial distribution of UE [19]. Specifically, the probability $^{t}P_{K,x,y}$ that a non-urban pixel $(x,y)$ of land use type $K$ is converted into an urban pixel at time $t$ can be expressed as: where $^{t}S_{i,x,y}$ represents the suitability factor $i$ ($i = 1, \ldots, m-2$) and $W_i$ is the weight of $i$. $^{t}N_{x,y}$ represents the neighborhood effect, and $W_{m-1}$ is its weight. The neighborhood effect can be expressed as: where $^{t}W_c$ represents the influence weight that an urban pixel at distance $C$ in the neighborhood exerts on the conversion of non-urban pixel $(x,y)$ into an urban pixel at time $t$. The closer the distance, the greater the $^{t}W_c$; thus, $^{t}W_c$ can be expressed as the reciprocal of the $K$ power function, $K = 1, 2, 3, \ldots$. $^{t}G_c$ is a binary variable. If the type of pixel at distance $C$ is urban land, then $^{t}G_c = 1$; otherwise, $^{t}G_c = 0$. $A$ is a scalar used to normalize $^{t}N_{x,y}$ between 0 and 100. $^{t}I_{K,x,y}$ represents the inheritance of pixel $(x,y)$ of land use type $K$ at time $t$, and $W_m$ represents its weight. $^{t}EC_{r,x,y}$ represents natural constraints such as ecological reserves, permanent basic farmland, and reservoirs. $^{t}PC_{l,x,y}$ signifies policy restrictions such as places of interest and prohibited boundaries of development. $^{t}V_{x,y}$ is a random interference factor. To compare the performance of different models, we did not consider the influence of the random interference factor in simulating UE.
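To make the distance-weighted neighborhood concrete, the sketch below assumes one plausible reading of the definitions above: each urban neighbor contributes a weight equal to the reciprocal of its distance raised to a power, and the scalar $A$ is chosen so that a fully urban neighborhood scores 100. The function name, the square window, the Euclidean distance in cell units, and this particular normalization are all assumptions of the example, not the published formulation.

```python
import math


def decay_neighborhood(urban, row, col, radius=2, power=1):
    """Distance-weighted neighborhood effect in the range 0..100.

    Assumed form: N = A * sum(W_c * G_c), with W_c = 1 / C**power,
    G_c = 1 for urban neighbours and 0 otherwise, and A = 100 / sum(W_c).
    """
    n_rows, n_cols = len(urban), len(urban[0])
    weighted_urban = 0.0
    weight_total = 0.0
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            if (r, c) == (row, col) or not inside:
                continue
            distance = math.hypot(r - row, c - col)  # C, in cell units
            weight = 1.0 / distance ** power         # closer cells weigh more
            weight_total += weight
            weighted_urban += weight * urban[r][c]   # G_c is 0 or 1
    if weight_total == 0.0:
        return 0.0
    return 100.0 * weighted_urban / weight_total     # A rescales to 0..100
```

With `power=1`, an urban cell one step away contributes twice as much as one two steps away; raising `power` makes the decay steeper.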

Improved LUSD-Urban Model with Trend-Adjusted Neighborhood
We referred to Li et al. [36] and assumed that the probability of non-urban pixels in the neighborhood of a pixel with recent UE being converted into urban pixels is greater than that of non-urban pixels in the neighborhood of a pixel with older UE being converted. The conversion probability of a pixel is calculated by using the historical path of UE as a weighting factor to calculate the trend-adjusted neighborhood. Specifically, the improved neighborhood effect ($^{t}N_{x,y}$) can be expressed as (Figures 2 and 3): where $T^{u}_{c}$ represents the cumulative existence time of the urban pixel at distance $C$ in the interval $T$.
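The cumulative existence time can be derived from a stack of historical urban maps. The sketch below shows one way to do so and pairs it with a purely illustrative recency weight; both `existence_time` and `trend_weight` are hypothetical helpers, and the weight formula in particular is an assumption chosen only to show that newer urban pixels receive larger weights. It is not the weighting used by Li et al. [36].

```python
def existence_time(snapshots, years, row, col, reference_year):
    """Years between the first snapshot in which (row, col) is urban and
    `reference_year`; None if the cell was never urban.

    snapshots : list of grids (1 = urban, 0 = non-urban), oldest first
    years     : observation year of each snapshot, in ascending order
    """
    for year, grid in zip(years, snapshots):
        if grid[row][col] == 1:
            return reference_year - year
    return None


def trend_weight(time_urban, interval):
    """Hypothetical recency weight: 1.0 for cells that have just become
    urban, falling to 0.5 for cells urban during the whole interval."""
    return 1.0 / (1.0 + time_urban / interval)


years = [1990, 2000, 2010]
snapshots = [
    [[1, 0], [0, 0]],  # 1990
    [[1, 1], [0, 0]],  # 2000
    [[1, 1], [1, 0]],  # 2010
]
age_old = existence_time(snapshots, years, 0, 0, reference_year=2010)
age_new = existence_time(snapshots, years, 1, 0, reference_year=2010)
```

In a trend-adjusted neighborhood, a weight of this kind would multiply the contribution of each urban neighbor, so that non-urban pixels next to recent expansion receive a higher neighborhood score.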

Improved LUSD-Urban Model with Automatic Rule Detection Procedure
We referred to Roodposhti et al. [17] to revise the neighborhood calculation in the LUSD-urban model with the ARD procedure. The revised neighborhood effect ($^{t}N_{x,y}$) can be expressed as (Figures 2 and 3): where $\beta$ is the attenuation rate, with a value range of (1, 30), and $d_{x,y}$ refers to the distance from non-urban pixel $(x,y)$ in the neighborhood to the central urban pixel. $d$ is a constant indicating the pixel size, which depends on the resolution of the data (100 m in this study). $CV$ represents the value of the center pixel, and the calculation method is: where the values of $\beta$ and $i$ are randomly generated for each simulation. The method used to calculate the neighborhood size $R$ is: In each simulation, a random $k$ value is generated.

Improved LUSD-Urban Model by Integrating the Trend-Adjusted Neighborhood and the Automatic Rule Detection Procedure
We integrated the trend-adjusted neighborhood algorithm and the ARD procedure described above to improve the neighborhood calculation in the LUSD-urban model (Figures 2-4). The method used to calculate the improved neighborhood effect is: where $T^{u}_{cup}$ represents the cumulative existence time of the central urban pixel in the interval $T$. The improved neighborhood effect can not only represent the historical path of UE but also be automatically calibrated.

Simulation and Accuracy Assessment
To verify the effectiveness of the improved model under different socioeconomic and natural conditions, we selected a city in each of China's eight major economic zones for simulation [39]. These cities include Beijing in the Northern Coastal Comprehensive Economic Zone, Shenyang in the Northeast Comprehensive Economic Zone, Huai'an in the Eastern Coastal Comprehensive Economic Zone, Dongguan in the Southern Coastal Economic Zone, Zhengzhou in the Middle Yellow River Comprehensive Economic Zone, Wuhan in the Middle Yangtze River Comprehensive Economic Zone, Chengdu in the Southwest Comprehensive Economic Zone, and Xining in the Great Northwest Comprehensive Economic Zone (Figure 5).

To ensure that the original and improved LUSD-urban models can effectively simulate UE, these models need to be calibrated. We employed the adaptive Monte Carlo method for model calibration; that is, we repeatedly simulated the regional historical UE 500 times, used the recall coefficient and landscape indices as the evaluation criteria, and finally chose the optimal weight parameters from the results that were closest to the actual situation [40]. We selected the recall coefficient and four landscape indices, namely edge density (ED), landscape shape index (LSI), fractal dimension index (FDI), and clumpiness index (CLUMPY), to construct an evaluation system [41,42]. The equation used to calculate the recall coefficient is:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where $TP$ is the number of pixels converted from non-urban to urban pixels under both simulated and actual conditions, and $FN$ is the number of pixels that were actually converted from non-urban to urban pixels but were not simulated as being converted. The equations used to calculate ED, LSI, FDI, and CLUMPY are: Given

$$G_i = \frac{g_{ii}}{\sum_{k=1}^{m} g_{ik}} \tag{13}$$

where $E$ represents the total length of the urban patch boundary in the landscape, and $A$ represents the total landscape area. $p_{ij}$ represents the perimeter (m) of patch $ij$, and $a_{ij}$ represents the area of the same patch. $g_{ii}$ represents the number of like adjacencies (joins) between pixels of patch type (class) $i$ based on the double-count method. $g_{ik}$ represents the number of adjacencies (joins) between pixels of patch types (classes) $i$ and $k$ based on the double-count method. $p_i$ represents the proportion of the landscape occupied by patch type (class) $i$. The absolute deviation $\Delta_{SI}$ of the landscape indices between the 500 simulated urban expansion results and the actual urban expansion results was calculated as:

$$\Delta_{SI} = \left| SI_n - SI \right|$$

where $SI$ represents the value of a landscape index for the actual urban expansion result, and $SI_n$ represents the value of the same landscape index for the $n$-th simulation result.
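The two evaluation quantities that can be stated without the landscape-index formulas, the recall coefficient and the absolute deviation of an index, can be sketched as below. The sketch assumes the standard definition of recall, in which FN counts pixels that converted in reality but not in the simulation, and it restricts the comparison to pixels that were non-urban at the start year. The function names and the 0/1 grid encoding are assumptions of the example.

```python
def recall(start, actual, simulated):
    """Recall of newly urbanized cells: TP / (TP + FN).

    start, actual, simulated are grids (lists of rows, 1 = urban) for the
    start year, the observed end year, and the simulated end year.
    """
    tp = fn = 0
    for s_row, a_row, m_row in zip(start, actual, simulated):
        for s, a, m in zip(s_row, a_row, m_row):
            if s == 0 and a == 1:      # cell really converted to urban
                if m == 1:
                    tp += 1            # conversion was reproduced
                else:
                    fn += 1            # conversion was missed
    return tp / (tp + fn) if (tp + fn) else 0.0


def index_deviation(actual_value, simulated_value):
    """Absolute deviation of a landscape index from its observed value."""
    return abs(simulated_value - actual_value)


start = [[1, 0], [0, 0]]
actual = [[1, 1], [1, 0]]
simulated = [[1, 1], [0, 1]]
score = recall(start, actual, simulated)
```

Because the number of converted pixels is fixed by the urban land demand, every missed conversion is matched by one misplaced conversion, so recall here moves together with the share of simulated conversions that are correct.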
We sorted the recall coefficient in descending order and arranged $\Delta_{ED}$, $\Delta_{LSI}$, $\Delta_{FDI}$, and $\Delta_{CLUMPY}$ in ascending order to obtain the rankings $R_{Recall}$, $R_{ED}$, $R_{LSI}$, $R_{FDI}$, and $R_{CLUMPY}$ of each indicator for each simulation result. We then calculated the weighted average of all rankings and arranged the weighted averages in ascending order to obtain the final rank $R_n$ of the simulation results.
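The ranking step can be illustrated with a short sketch. It ranks recall in descending order and each index deviation in ascending order, averages the ranks with a set of weights, and ranks the averages. The equal default weights and the tie rule (equal values share the best rank) are assumptions of this example, since the text does not specify them.

```python
def rank(values, descending=False):
    """1-based rank of each value; equal values share the best rank."""
    ordered = sorted(values, reverse=descending)
    return [ordered.index(v) + 1 for v in values]


def final_rank(recalls, deviations, weights=None):
    """Combine per-indicator ranks into a final rank per simulation.

    recalls    : recall coefficient of each simulation
    deviations : dict mapping an index name to its deviations per simulation
    weights    : one weight per indicator (recall first, then the index
                 names in sorted order); equal weights if omitted
    """
    columns = [rank(recalls, descending=True)]          # higher recall is better
    columns += [rank(deviations[name]) for name in sorted(deviations)]
    if weights is None:
        weights = [1.0] * len(columns)                  # assumption: equal weights
    total = sum(weights)
    scores = [
        sum(w * col[i] for w, col in zip(weights, columns)) / total
        for i in range(len(recalls))
    ]
    return rank(scores), scores


recalls = [0.8, 0.6, 0.7]
deviations = {"ED": [0.1, 0.3, 0.2], "LSI": [1.0, 3.0, 2.0]}
ranks, scores = final_rank(recalls, deviations)
```

The simulation with final rank 1 is the one whose weight set would be retained as the best set.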
Specifically, using the 2000 land-use data as the base data, we calculated seven suitability factors in total: distance from the city center, distance from railways, distance from national and provincial roads, distance from county and township roads, distance from rivers, elevation, and slope (Appendix A Figure A1). We entered these suitability factors into the LUSD-urban model. The Monte Carlo method was then used to generate 500 sets of weights to simulate UE from 2000 to 2010, and the results were compared with the actual land use data in 2000 and 2010. Finally, the set of weights with the highest weighted average rank of the recall coefficient and landscape indices was selected as the best set (Appendix A Tables A1 and A2). Based on the best set of weights, we used the original LUSD-urban model, the improved model based on the trend-adjusted neighborhood algorithm, the improved model based on the ARD procedure, and the improved model integrating the trend-adjusted neighborhood algorithm and ARD procedure to simulate the UE of the eight selected cities from 2010 to 2015. In the processes of calibration and simulation, the urban land demand for each sample city was obtained from the long-term urban built-up area data set released by Gong et al. [34]. Then, we used the above indicators to compare the accuracy of the results from the original LUSD-urban model and the different improved neighborhood methods.

Results
Among these four models, the improved LUSD-urban model integrating the trend-adjusted neighborhood algorithm and ARD procedure achieved the highest average rank: the average recall coefficient was 0.78, and the average scores of ED, LSI, FDI, and CLUMPY were 2.53, 29.95, 0.0031, and 0.1124, respectively (Figures 6 and 7a, Table 2). Such high accuracy indicates that the improved model can effectively simulate urban expansion with a high degree of spatial matching and a similar urban form. The improved LUSD-urban model with the trend-adjusted neighborhood had the second-highest average rank, while the original LUSD-urban model and the improved LUSD-urban model with the automatic rule detection procedure tied for the lowest average rank. Specifically, the original LUSD-urban model had the worst performance on urban form (i.e., landscape indices), and the improved LUSD-urban model with the automatic rule detection procedure had the worst performance on spatial matching (i.e., recall coefficient). Compared with the original LUSD-urban model, the improved LUSD-urban model integrating the trend-adjusted neighborhood algorithm and ARD procedure had higher average ranks in seven of the eight sample cities (Figure 7b, Table 2). These cities were Huai'an, Dongguan, Chengdu, Xining, Wuhan, Shenyang, and Beijing (Figure 7b, Table 2). In Zhengzhou, the original LUSD-urban model and the improved LUSD-urban model had the same average rank (Figure 7b, Table 2). In terms of the degree of spatial matching (i.e., recall coefficient), the improved LUSD-urban model integrating the trend-adjusted neighborhood algorithm and ARD procedure had higher ranks in two cities (i.e., Zhengzhou and Chengdu), in comparison with the original LUSD-urban model (Figure 7b, Table 2).
For five sample cities (i.e., Huai'an, Dongguan, Xining, Wuhan, and Beijing), the original LUSD-urban model and the improved LUSD-urban model had the same ranks of recall coefficient. In terms of urban form (i.e., landscape indices), the improved LUSD-urban model integrating the trend-adjusted neighborhood algorithm and ARD procedure had higher ranks in seven cities, excluding Zhengzhou, in comparison with the original LUSD-urban model (Figure 7b, Table 2).
The improved LUSD-urban model with trend-adjusted neighborhoods had higher average ranks in seven cities, excluding Huai'an, in comparison with the original LUSD-urban model (Figure 7b, Table 2). In terms of the degree of spatial matching, the improved LUSD-urban model with trend-adjusted neighborhoods had higher ranks in only two cities (i.e., Huai'an and Zhengzhou). In terms of urban form, the improved LUSD-urban model with trend-adjusted neighborhoods had higher ranks in six cities, excluding Huai'an and Zhengzhou (Figure 7b, Table 2).
Land 2021, 10, x FOR PEER REVIEW 12 of 17
The improved LUSD-urban model with the ARD procedure had higher average ranks in three cities (i.e., Chengdu, Xining, and Beijing), in comparison with the original LUSD-urban model (Figure 7b, Table 2). In terms of the degree of spatial matching, the improved LUSD-urban model with the ARD procedure had a higher rank in only one city (i.e., Zhengzhou). In terms of urban form, the improved LUSD-urban model with the ARD procedure had higher ranks in seven cities, excluding Zhengzhou (Figure 7b, Table 2).
The calculation of neighborhood effects in the original LUSD-urban model is empirical and does not consider the historical UE trend (Figure 3). The advantage of using the trend-adjusted neighborhood algorithm instead of the original LUSD-urban model to calculate the neighborhood effects is that historical UE information can be incorporated into the model. This increases the probability that non-urban pixels surrounding areas exhibiting recent UE will be converted into urban pixels (Figure 3). Furthermore, the advantage of using the ARD procedure instead of the original LUSD-urban model to calculate the neighborhood effects is that the parameters of the neighborhood attenuation law are calibrated. Adjusting these parameters modifies the range of the neighborhood and the probability that non-urban pixels will be converted into urban pixels, making the result more reasonable (Figure 3).
Finally, the advantage of integrating the trend-adjusted neighborhood algorithm and ARD procedure is that it both considers the UE trend and calibrates the parameters related to the neighborhood attenuation law. As a consequence, the simulated results based on the improved LUSD-urban model had higher accuracy, especially for representing urban form (Figure 7, Table 2).

Future Perspectives
Our results show that integrating the trend-adjusted neighborhood algorithm and ARD procedure can simulate UE more accurately than the original LUSD-urban model. The advantage of the proposed approach is that, by considering the UE trend and setting more appropriate parameters for the attenuation law, the neighborhood effects can be calculated more effectively.
In this study, we improved the calculation of neighborhood effects for a pixel-based urban growth model. Since the patch-based urban growth model can simulate the real process of urban expansion more effectively [43], we will further improve the calculation of neighborhood effects for the patch-based urban growth model in our future research. In addition, we will promote the wide application of the improved LUSD-urban model at various spatiotemporal scales.

Conclusions
This study integrated the trend-adjusted neighborhood algorithm and the ARD procedure to improve the neighborhood calculation of the LUSD-urban model and conducted simulations of eight cities to test the performance of the improved model. The results showed that the novel method of integrating the trend-adjusted neighborhood algorithm and the ARD procedure can simulate UE more accurately than the original LUSD-urban model in terms of both recall coefficient and landscape indices, by considering the UE trend and improving the method for calculating the parameters of the neighborhood attenuation law.

Huai'an 14 15 1 1 2 11 26 26 4
Zhengzhou 23 4 10 3 11 20 17 11 1
Dongguan 10 22 1 5 11 25 1 24 1
Chengdu 4 3 13 3 3 27 2 34 11
Xining 21 5 3 9 6 20 22 11 3
Wuhan 7 7 19 1 1 27 8 26 4
Shenyang 3 2 3 6 3 28 27 19 9
Beijing 11 22 4 1 18 4 3 11 26

Figure A1. Spatial variables for the simulation of urban growth.