Integrating Spatial Markov Chains and Geographically Weighted Regression-Based Cellular Automata to Simulate Urban Agglomeration Growth: A Case Study of the Guangdong–Hong Kong–Macao Greater Bay Area

: Urban agglomeration is an important spatial organization mode in China’s attempts to attain an advanced (mature) stage of urbanization, and to understand its consequences, accurate simulation scenarios are needed. Compared to traditional urban growth simulations, which operate on the scale of a single city, urban agglomeration considers interactions among multiple cities. In this study, we combined a spatial Markov chain (SMC) (a quantitative composition module) with geographically weighted regression-based cellular automata (GWRCA) (a spatial allocation module) to predict urban growth in the Guangdong–Hong Kong–Macao Greater Bay Area (GBA), an internationally important urban agglomeration in southern China. The SMC method improves on the traditional Markov chain technique by taking into account the interaction and inﬂuence between each city to predict growth quantitatively, whereas the geographically weighted regression (GWR) gives an empirical estimate of urban growth suitability based on geospatial differentiation on the scale of an urban agglomeration. Using the SMC model to forecast growth in the GBA in the year 2050, our results indicated that the rate of smaller cities will increase, while that of larger cities will slow down. The coastal belt in the core areas of the GBA as well as the region’s peripheral cities are most likely to be areas of development by 2050, while established cities such as Shenzhen and Dongguan will no longer experience rapid expansion. Compared with traditional simulation models, the SMC-GWRCA was able to consider spatiotemporal interactions among cities when forecasting changes to a large region like the GBA. This study put forward a development scenario for the GBA for 2050 on the scale of an urban agglomeration to provide a more credible scenario for spatial planning. It also provided evidence in support of using integrated SMC-GWRCA models, which, we maintain, offer a more efﬁcient approach for simulating urban agglomeration development than do traditional methods.


Introduction
Recent decades have seen a period of rapid urbanization, particularly in developing countries such as China [1][2][3], but the urban sprawl that results from economic and explosive population growth poses serious challenges to sustainable development [4][5][6][7]. Urbanization, regardless of country or region, has historically tended to follow a pattern: it accelerates at a rate of approximately 20%, slows down when it reaches around 60%, and stabilizes between 70 and 80% [8]. China's National Bureau of Statistics noted that in 2020 the rate had exceeded 60% [9]. When urbanization entered a mature stage, China adopted an organizational model based on the spatial structure of the urban agglomeration to promote high-quality development [10][11][12]. Within this context, models that can simulate Land 2021, 10, 633 2 of 19 growth scenarios are needed; therefore, research into such models has the potential to constitute an important aid to policymakers for promoting sustainable urbanization [13].
A broad spectrum of simulation models has been developed to provide insight into dynamic urban growth scenarios [14,15]. Usually, these contain two modules: one dedicated to quantitative composition and one to address spatial allocation. The former focuses on analyzing changes in quantitative land use, while the latter focuses on analyzing the evolution of spatial patterns [16,17]. While the quantity of land-use change is mainly determined by the development of regional economies and social conditions, the spatial pattern is largely dependent on location conditions [18]. For example, undertaking residential and industrial development requires a supply of land, but that is determined by spatial factors such as terrain and transportation. Because quantitative land-use demand and the corresponding spatial pattern of land use are both determined by these two driving mechanisms, modelers often need to include coupling models. [19][20][21].
Previous studies have used a range of different methods to perform quantitative predictions of urban growth. The quantitative demand module in such models is often calculated by means of an independent external model such as the Markov chain (MC) [22][23][24][25][26], or system dynamics (SD) [27][28][29]. The latter model considers interactions between land-use change and social and economic development and can predict land-use demand in complex situations [30]. However, it first needs to abstract land-use and socioeconomic development into several independent subsystems and then establish the input and feedback mechanisms that link them [31,32]. Although SD can accurately reflect the complex mechanisms in land-use change, it has several important limitations: subsystem abstraction requires a high level of modeling experience, and model calibration requires a large number of historical or empirical parameters [33,34]. The trend interpolation approach (one alternative to the system dynamics model) has similar limitations in that it needs long-term sequential observational data to establish a change curve. In contrast to these two methods, the Markov chain model is able to predict changes in land-use structure based only on the land-use transfer probability. For this reason, the Markov chain method is widely used to predict the size of urban growth [35][36][37].
Usually, Markov chain models are constructed for a single city, so there are obvious risks in employing it on the scale of an urban agglomeration because the development of each city is quite different [38,39]. Consequently, a feasible measure is to establish a Markov process for an agglomeration can be described as a "spatial Markov chain" (SMC) [40,41]. Using this approach to estimate changes in land use entails more than simply building a Markov model for each city. While each city does require its own independent Markov process, the interaction and influence between urban agglomerations must also be considered [42,43] because land-use changes in a given city may be driven by forces exerted by surrounding cities [44,45]. In addition, the process must also be calibrated to reflect a given city's level of urbanization and unique resource and environmental constraints, such as China's "red line" policy, which seeks to ensure ecological security and basic farmland protection, limits the growth of some cities [46,47].
Although the quantitative composition module of an urban growth simulation is very important, an additional module is required to predict the spatial pattern of corresponding land-use changes. The basic allocation module uses either spatial probability, cellular automata (CA), or agent-based modeling (ABM) [48,49]. Among these, an allocation model based mainly on an evaluation of urban growth suitability constitutes the prototype for almost all other spatial allocation models [50]. By assuming that the growth of a city can be described probabilistically, the modeler can, for instance, employ a series of spatial factors (terrain, traffic conditions, distance from existing cities) to estimate the probability of urban sprawl [51,52]. The estimation algorithms include logistic regression [53]; decision trees [54], support vector machines [55], random forest techniques [56], and, more recently, deep learning neural networks [57]. Although the performance of these algorithms is different, they all mine the sample data to obtain the required parameters to estimate suitability. Despite its advantages, though, spatial probability-based allocation models have natural defects [14,15]. For example, if an insufficient number of change samples is collected during model estimation, the developmental probability of a region with ostensibly high potential can be seriously underestimated [58]. Once a spatial allocation model has determined an area's probability, it tends to remain relatively unchanged, and that can cause researchers to miss dynamic effects [18]. This phenomenon is contrary to the findings of many observational studies that suggest that a region can drive the development of surrounding areas once it has reached a developed state. If neighborhood interaction is to be considered, the simulation framework must be able to discretize the whole into individuals, and the model to do this is cellular automata (CA) [48,59], at present the most widely used bottom-up model for simulating urban pattern evolution [49].
In general, the CA model focuses on analyzing spatial dynamic processes, while its counterpart, ABM, focuses more on analyzing the spatial movement of subjects [48]. For land-use change simulation, CA and ABM do not differ significantly from one another, so because this study focuses on the spatial pattern change of urban growth rather than an analysis of land-use change resulting from a particular socioeconomic behavior, we mainly used the CA model. As mentioned above, the essence of the CA approach is to consider neighborhood interaction based on an estimation of spatial probability, and estimating the spatial probability of urban sprawl was the key aim of the CA modeling in this study. The estimation was conducted in the same manner as using a traditional spatial probability allocation model. Whether researchers used the most advanced deep-learning neural network or the most classical logistic regression method to address the scale simulation of an urban agglomeration, the characteristics of geographic differentiation always had to be considered. Therefore, because historical change cannot fully account for a city's future development and since even the best parameter-fitting method will only fit a historical process [60], this study used classical logistic regression to estimate the probability of urban sprawl together with a geographically weighted regression to calibrate cellular automata (GWRCA) because of the urban agglomeration's varied geographical characteristics [61].
As discussed previously, the SMC model predicts land-use change while taking into account interactions between the cities in a network, while the GWRCA simulates largescale urban growth. However, no model has so far been developed that can integrate the SMC and GWRCA methods to simulate dynamic urban agglomeration development despite the fact that agglomerations are cited as an important spatial structure for promoting China's further urbanization. Therefore, this study presents a model framework that couples the SMC and GWRCA techniques, thereby allowing researchers to simulate the development of urban agglomerations more realistically. The remainder of this article is organized according to the following structure. In Section 2, the study area and the dataset for model implementation are introduced. In Section 3, the steps and methods employed in building the model are discussed in detail. Section 4 presents the results and a discussion of the model implementation to validate the proposed urban CA model in simulating urban growth. Finally, the conclusions are presented in Section 5.

Study Area
Extending over a total area of 56,000 km 2 , the Guangdong-Hong Kong-Macao Greater Bay Area (GBA) is composed of two Special Administrative Regions (Hong Kong (HK) and Macao (MO)) and nine cities in Guangdong Province (Guangzhou (GZ), Shenzhen (SZ), Zhuhai (ZH), Foshan (FS), Huizhou (HZ), Dongguan (DG), Zhongshan (ZS), Jiangmen (JM) and Zhaoqing (ZQ)) ( Figure 1). At the end of 2020, the GBA had a total population of over 70 million and a GDP in excess of 10 trillion CNY. The GBA is located in south-central China, on a delta in the Pearl River Estuary formed by the accumulation of sediment from the Xijiang, Beijiang, and Dongjiang rivers and their tributaries. The western, northern, and eastern parts of the GBA are surrounded by hills and mountains, which form a natural barrier, and the southern boundary is a long coastline with many islands. The GBA is the fourth largest bay area in the world after those in New York, San Francisco, and Tokyo. It is also one of China's most dynamic urban agglomerations [43]. The Chinese government requires that by 2050, the modernization level and the capacity of the land and space governance system be improved holistically to form a territory of "intensive and efficient production"; "livable and comfortable living space"; "beautiful ecological space"; and "safe, harmonious, competitive, and sustainable development" [62]. With that aim in mind, this study took 2050 as the target year to reveal the spatial pattern of the GBA through an SMC-GWRCA coupled simulation model, thereby specifying an urban spatial boundary to optimize the GBA infrastructure network.
in south-central China, on a delta in the Pearl River Estuary formed by the accumulation of sediment from the Xijiang, Beijiang, and Dongjiang rivers and their tributaries. The western, northern, and eastern parts of the GBA are surrounded by hills and mountains, which form a natural barrier, and the southern boundary is a long coastline with many islands. The GBA is the fourth largest bay area in the world after those in New York, San Francisco, and Tokyo. It is also one of China's most dynamic urban agglomerations [43]. The Chinese government requires that by 2050, the modernization level and the capacity of the land and space governance system be improved holistically to form a territory of "intensive and efficient production"; "livable and comfortable living space"; "beautiful ecological space"; and "safe, harmonious, competitive, and sustainable development" [62]. With that aim in mind, this study took 2050 as the target year to reveal the spatial pattern of the GBA through an SMC-GWRCA coupled simulation model, thereby specifying an urban spatial boundary to optimize the GBA infrastructure network.

Land Use Cover/Change, Subzones, and Ecological Sensitive Areas
To use the spatial Markov chain model for quantitative urban land-use predictions, it was necessary to ascertain the transfer probability at equivalent time steps. For this purpose, historical Landsat satellite images of the GBA in 1995 and 2015 were collected, and land use was obtained using the human-computer interaction interpretation method for both years (Figures 2a,b). From the land-use change between 1995 and 2015, it was clear that the urban area had grown rapidly over the previous 20 years, and there was little  To use the spatial Markov chain model for quantitative urban land-use predictions, it was necessary to ascertain the transfer probability at equivalent time steps. For this purpose, historical Landsat satellite images of the GBA in 1995 and 2015 were collected, and land use was obtained using the human-computer interaction interpretation method for both years (Figure 2a,b). From the land-use change between 1995 and 2015, it was clear that the urban area had grown rapidly over the previous 20 years, and there was little available land left in eastern Bay Area cities such as Shenzhen and Dongguan. Given these characteristics, we understood that using the Markov process to predict change over the next 20 years would lead to obvious inaccuracies caused by the incompatibility of spatial units. In response, this study took as a reference the "nomenclature of territorial units for statistics" classification used for spatial planning within the European Union (EU) [63]. It allowed us to merge county-level units in the central areas of large cities without destroying the urban boundary and having to construct a new unit for the spatial Markov analysis. For example, Guangzhou originally had 11 county-level units-Yuexiu, Liwan, Haizhu, Tianhe, Baiyun, Huangpu, Panyu, Huadu, Nansha, Zengcheng, and Conghua-but these were reduced to 7 after the spatial reorganization of Yuexiu, Liwan, and Baiyun into a new subzone, "GZ-1." The GBA's 49 county-level units and 2 special administrative regions became 36 standard subzones after the spatial merger ( Figure 2c). To simulate the future growth of an urban agglomeration, it was necessary to not only monitor historical development to discover trends but also establish ecological constraints. The geography of the GBA is characterized by relative height differences of more than 60 m due to mountains, a large river, a reservoir, coastal areas, and high-density forest land. If the maximum urban growth were not to exceed the upper limit of ecologically sensitive areas, all these areas had to be excluded ( Figure 2c). next 20 years would lead to obvious inaccuracies caused by the incompatibility of spatial units. In response, this study took as a reference the "nomenclature of territorial units for statistics" classification used for spatial planning within the European Union (EU) [63]. It allowed us to merge county-level units in the central areas of large cities without destroying the urban boundary and having to construct a new unit for the spatial Markov analysis. For example, Guangzhou originally had 11 county-level units-Yuexiu, Liwan, Haizhu, Tianhe, Baiyun, Huangpu, Panyu, Huadu, Nansha, Zengcheng, and Conghuabut these were reduced to 7 after the spatial reorganization of Yuexiu, Liwan, and Baiyun into a new subzone, "GZ-1." The GBA's 49 county-level units and 2 special administrative regions became 36 standard subzones after the spatial merger ( Figure 2c). To simulate the future growth of an urban agglomeration, it was necessary to not only monitor historical development to discover trends but also establish ecological constraints. The geography of the GBA is characterized by relative height differences of more than 60 m due to mountains, a large river, a reservoir, coastal areas, and high-density forest land. If the maximum urban growth were not to exceed the upper limit of ecologically sensitive areas, all these areas had to be excluded ( Figure 2c).

Spatial Variables for Evaluating Suitable Urban Growth
In addition to analyzing historical land-use change using the SMC approach, this study also evaluated urban growth suitability using the GWR model. Generally, many spatial factors need to be considered in evaluating urban growth suitability [35,52]. However, given that the present study performed such an evaluation on the scale of an urban

Spatial Variables for Evaluating Suitable Urban Growth
In addition to analyzing historical land-use change using the SMC approach, this study also evaluated urban growth suitability using the GWR model. Generally, many spatial factors need to be considered in evaluating urban growth suitability [35,52]. However, given that the present study performed such an evaluation on the scale of an urban agglomeration-especially since we used the Markov chain to reconstruct the spatial structure of the GBA-we felt that it was better to choose relatively few but highly stable spatial factors. As such, only four variables were used to investigate urban agglomeration growth: terrain flatness, distance to existing cities and towns, land-use conversion cost, and urban landscape density. "Terrain flatness", which reflects the spatial conditions for construction, can be calculated as the relative height difference through a digital elevation model (DEM) with a dynamic, 1 km 2 window (Figure 3a). "Distance from existing built-up land" was measured in 1995 mainly using Euclidean distance (Figure 3b). The "cost of land-use conversion" reflects the cost of converting different types of land into "built-up land", and for this measure, we were able to refer to the government's formulation of compensation standards for land acquisition (Figure 3c). "Urban landscape density" reflects the spatial pattern of an agglomeration on a large (regional) scale and can be estimated using a large, 10 km 2 window (Figure 3d). In addition to these four independent variables, urban growth between 1995 and 2015 constituted a dependent variable, and urban growth was a logical variable, which needed to be changed into a probability value before being used in a geographically weighted regression analysis. In this study, we used the Moore neighborhood resampling technique to change the discrete value of 0.1 to a continuous value range from 0 to 1. structure of the GBA-we felt that it was better to choose relatively few but highly stable spatial factors. As such, only four variables were used to investigate urban agglomeration growth: terrain flatness, distance to existing cities and towns, land-use conversion cost, and urban landscape density. "Terrain flatness", which reflects the spatial conditions for construction, can be calculated as the relative height difference through a digital elevation model (DEM) with a dynamic, 1 km 2 window (Figure 3a). "Distance from existing builtup land" was measured in 1995 mainly using Euclidean distance (Figure 3b). The "cost of land-use conversion" reflects the cost of converting different types of land into "built-up land", and for this measure, we were able to refer to the government's formulation of compensation standards for land acquisition (Figure 3c). "Urban landscape density" reflects the spatial pattern of an agglomeration on a large (regional) scale and can be estimated using a large, 10 km 2 window (Figure 3d). In addition to these four independent variables, urban growth between 1995 and 2015 constituted a dependent variable, and urban growth was a logical variable, which needed to be changed into a probability value before being used in a geographically weighted regression analysis. In this study, we used the Moore neighborhood resampling technique to change the discrete value of 0.1 to a continuous value range from 0 to 1.

Methodology
The growth of urban agglomeration includes two main aspects: the urban built-up land scale growth for each city, and the spatial form evolution under the corresponding space scale. In this paper, the SMC and GWRCA were used to simulate urban agglomeration growth, and the model flow chart is shown in Figure 4. The coupling model includes

Methodology
The growth of urban agglomeration includes two main aspects: the urban built-up land scale growth for each city, and the spatial form evolution under the corresponding space scale. In this paper, the SMC and GWRCA were used to simulate urban agglomeration growth, and the model flow chart is shown in Figure 4. The coupling model includes two submodules: quantity prediction and spatial form evolution. Quantity prediction is based on the classical Markov model, but interaction with an urban agglomeration should be considered a transition probability. The spatial form evolution module used a GWR model to estimate the initial growth probability and then used neighborhood interactions of CA to simulate the change in spatial morphology. The relevant model principles are discussed in Sections 2.3.1 and 2.3.2. based on the classical Markov model, but interaction with an urban agglomeration should be considered a transition probability. The spatial form evolution module used a GWR model to estimate the initial growth probability and then used neighborhood interactions of CA to simulate the change in spatial morphology. The relevant model principles are discussed in Sections 2.3.1 and 2.3.2.

Spatial Markov Chain
The Markov chain (MC) is a stochastic process that is often used to predict changes in land-use structure [21,35,64] and a Markov model can be expressed by Equation (1): ( 1) n Q t  represents the quantity of at time t and t − 1, and P n is the transition probability matrix. In general, P n can be extracted from the historical urban growth process of subzone n, but it is necessary to ensure that there is no obvious mutation in the rate of change. Because China expects to conclude rapid urbanization by 2050, we assumed that the simulation of change from 2015 to 2050 could build on the transfer probability obtained from changes from 1995 to 2015. We acknowledged that change is also affected by government policies (for instance, ecological protection measures) and that the transfer matrix P n cannot be increased indefinitely-it must remain within the maximum

Spatial Markov Chain
The Markov chain (MC) is a stochastic process that is often used to predict changes in land-use structure [21,35,64] and a Markov model can be expressed by Equation (1): where Q n (t) and Q n (t − 1) represents the quantity of at time t and t − 1, and P n is the transition probability matrix. In general, P n can be extracted from the historical urban growth process of subzone n, but it is necessary to ensure that there is no obvious mutation in the rate of change. Because China expects to conclude rapid urbanization by 2050, we assumed that the simulation of change from 2015 to 2050 could build on the transfer probability obtained from changes from 1995 to 2015. We acknowledged that change is also affected by government policies (for instance, ecological protection measures) and that the transfer matrix P n cannot be increased indefinitely-it must remain within the maximum increase range of the region. In other words, Q n ≤ Q n max . Q n max is the maximum amount that can be used for urban growth, excluding ecological or farmland protection.
Traditional Markov processes tend to construct one comprehensive land-use transfer matrix, which is effective for a single city but not suitable for an urban agglomeration. Since each city in an agglomeration maintains a relatively independent development authority, a Markov process must be established for each one. However, if the whole agglomeration were simply divided into multiple independent Markov processes, it would not be possible to describe the essential character of the urban agglomeration as a system because each city engages with, and exerts influence over, other cities. To fully consider the space-time interactions among the cities in the GBA, we proposed using a spatial Markov chain. The SMC considered not only the transfer probability of each unit but also the driving effects exerted by surrounding cities. As such, if a spatial Markov analysis showed the growth rate of comprehensive radiation around a city to be higher than the speed of the city's growth, then the city is said to exhibit an accelerating growth trend; on the other hand, if the growth rate of the surrounding cities is lower, then the city may be the growth pole. Therefore, a city may change its own growth speed and potential speed in line with the effects of surrounding cities. The modified P can be calculated by Equation (2): where ∑ n w n p n represents the comprehensive interaction effect from the surrounding cities n' (n' = n); the coefficient w n can be set by the method of inverse distance. ∑ n w n p n − P n is the difference between the transfer value from the surrounding cities and its own value. If this value is positive, it indicates that the surrounding cities exert a pulling effect; if it is negative, they exert a slowing effect or no effect. α is the adjustment coefficient which ranges from 0 to 1, it is used to describe the radiation-driven intensity of the effect of the surrounding subregions (or cities) on subregion n. If α = 1, it means that subregion n is completely affected by its surrounding subregions; conversely, if α = 0, it indicates that subregion n is completely unaffected. Otherwise, it is affected by a degree of interaction effect from surrounding subregions.

Geographically Weighted Regression Based on Cellular Automata
Because the SMC model can only predict the quantity of built-up land in each subregion [60], we also employed a spatial evolution model to simulate the corresponding spatial pattern. Cellular automata (CA), which are able to achieve dynamic results by means of simple rules, have become important in urban growth simulations [65]. A typical CA simulation framework can be expressed as in Equation (3): where U n ij (t) represents the state of cell ij at time t in subzone n, (e.g., 0 means non-growth and 1 means growth); function f is the iterative operation of CA, which produces a series of spatial factors, including the suitability of urban growth, and neighborhood interaction; Q n (t) represents the quantity of the demand for urban land in subzone n at time t; S ij is the suitability of cell ij for urban growth, which is usually estimated by means of data mining techniques such as regression analysis; Ω ij (t) indicates that cell ij is affected by the development of neighboring cells at time t, which can be characterized as development density; Ψ ij denotes policy restrictions that prevent development (e.g., ecological conservation measures).
In a CA model, both Q n (t) and Ψ ij are external input parameters, and Ω ij (t) is only related to the window size calculation. As such, the most important parameter affecting the spatial pattern is suitability, and the simulation scenarios that CA works with rely mainly on the generation of a suitability map. In previous studies, it was commonly estimated by means of sample data mining, which was also used in this paper. In this study, pronounced geographical spatial differentiation was present due to the size of the urban agglomeration we addressed. If the overall parameter estimation were used in such a situation, the individual differences of the study area (the GBA urban agglomeration) could easily be missed; conversely, if each city were analyzed separately, it would split the overall relationship of the groups. Given these limitations, we also employed the classical method of geographically weighted regression (GWR) to calculate the suitability map [61]. The GWR, essentially an extension of the ordinary linear regression (OLR) model, is predominantly used to analyze spatial heterogeneity by considering local autocorrelation. If the local space is extended to the global, then the GWR will be transformed into an OLR. A standard GWR model is as follows: where y ij is a dependent variable describing the probability of land constituting builtup land; ij is the position coordinate of cell ij, which can be expressed as longitude and latitude or as a row-and-column index in raster data format; and β ijk is the k-th regression coefficient. We noted that, as opposed to the OLR, each spatial factor in the GWR had different coefficients. In the above formula, ε ij is the model estimation error term, which satisfied the normal distribution. Using a coefficient matrix and the spatial factors obtained through the GWR, the suitability values of all positions were estimated. In practice, the suitability evaluation cannot be based completely on the results of data mining; it can be fine-tuned in accordance with the strategic aims for spatial development contained in relevant plans for the area. Table 1 shows the results of our size estimate of the GBA urban area and its subregions in 2050. According to the results from the SMC model, the total area will grow to 14,699.61 km 2 , and within that urban agglomeration, Guangzhou will be the largest city because the amount of land available for development is relatively large and the strength of the city's social and economic development will stimulate adequate demand. However, quantity analysis alone cannot provide an image of how the GBA will develop. In addition, from a historical development perspective, our results showed that the GBA will enter the mature development stage by 2050.

Growth Scenario of the SMC-GWRCA Model
Spatial pattern simulation is helpful for scholars to understand development trends in urban agglomerations, and it provides support for decisions regarding infrastructure construction. The growth scenario for China's GBA simulated in this study using the SMC-GWRCA model is shown in Figure 5a. To analyze the reliability of the results further, the simulation pattern for 2050 was transformed into a series of urban growth boundaries (UGBs), which are shown as red lines in Figure 5. Furthermore, the highest resolution image provided by Google Earth in 2019 was used to further review the rationality of the simulation results (Figure 5b,c). From Figure 5, some inconsistencies between the UGB and the actual geographical features can be seen, but these can be attributed to restrictions in data refinement. We acknowledge that, in any practical application, our findings for the simulation results would need to be modified manually to meet these control boundary conditions. However, for the purposes of the quantitative modeling discussed in this paper, we chose to focus our analysis on the overall pattern and not address areas where boundary details were inconsistent. We selected two typical areas from the GBA for discussion, and these are marked as the rectangular zones A and B. Zone A is located in the northeastern part of Zhaoqing, a city on the edge of the GBA. Judging from the remote sensing image, the urban growth that occurred in this area between 2015 and 2019 was almost totally within the UGB. Further analysis revealed the formation of a street network in region A1, and a great deal of built-up land emerged in region B1. In general, our simulation results for the GBA using the SMC-GWRCA model are considered feasible.

Spatiotemporal Interaction of Urban Agglomeration
To evaluate whether the SMC method is more suitable than the MC method for predicting the development of urban agglomerations, we analyzed historical changes in the whole GBA urban agglomeration and compared them with the results of a traditional MC prediction. Table 2 shows the quantity of built-up land in the GBA with respect to historical changes and our forecast. In 1995, an area of about 4683.15 km 2 was dedicated for urban development, but by 2015 this area had grown to 9282.69 km 2 , an increase of 4599.54 km 2 over 20 years. Excluding ecological reserves, the maximum area available for urbanization was 34,563.60 km 2 , which shows that the GBA retained a significant degree of overall development potential. However, the process of urbanization differed among subregions. For example, the current scale of development in Dongguan (DG), Shenzhen (SZ),

Spatiotemporal Interaction of Urban Agglomeration
To evaluate whether the SMC method is more suitable than the MC method for predicting the development of urban agglomerations, we analyzed historical changes in the whole GBA urban agglomeration and compared them with the results of a traditional MC prediction. Table 2 shows the quantity of built-up land in the GBA with respect to historical changes and our forecast. In 1995, an area of about 4683.15 km 2 was dedicated for urban development, but by 2015 this area had grown to 9282.69 km 2 , an increase of 4599.54 km 2 over 20 years. Excluding ecological reserves, the maximum area available for urbanization was 34,563.60 km 2 , which shows that the GBA retained a significant degree of overall development potential. However, the process of urbanization differed among subregions. which severely restrict expansion. In these cases, potential development demands can be transmitted to surrounding cities through radiation effects. The prediction results of the SMC for each subregion showed that the growth rates of some subregions will increase, while others will be expected to slow. For example, in the DG subregion, the MC prediction result was 2031.93 km 2 , which indicated that the subregion will reach its maximum growth limit in the next 20 years, but the SMC prediction suggested only 1864.71 km 2 , a reduction of 167.22 km 2 that can be expected to be transmitted to surrounding cities. Among these subregions, MO is fully urbanized, and without land reclamation, there is no space for growth. In addition, the density of urban development in HK is already very high, and there is a great demand for growth. However, the government's compulsory control makes reserve land resources in HK temporarily unavailable for development. This paper did not consider the problems caused by this complex administrative system. Development intensity in the GBA (total built-up land/total land area) was found to be 7% in 1995 and 14% in 2015, and the results of the SMC modeling projected a rise to about 26% for 2050. Although the overall development intensity was shown to be relatively low, it was different for each city. For example, the development intensity of GZ-1, GZ-2, GZ-3, SZ, FS-1, FS-2, DG, and other regions all exceeded 50%, and some of these subregions were identified as approaching the upper limit of development. Compared with the MC prediction, the results obtained from the SMC increased by 1007.19 km 2 , with an average of fewer than 30 km 2 for 36 units, which can be read as denoting uncertainty or elastic space. Compared with the MC prediction, the results of the SMC show future development trends within the GBA more accurately.

Geographical Spatial Differentiation in Urban Agglomerations
Urban development is very sensitive to spatial conditions like terrain and other spatial factors, and for this reason, it is difficult to simulate real urban spatiotemporal evolution if the urban CA is not corrected by the geospatial environment. In performing a regression analysis, this study took urban spatial change between 1995 and 2015 as the dependent variable, which we analyzed in relation to four spatial variables: terrain conditions, distance to existing urban areas, land-use conversion cost, and urban landscape pattern. In this study, 59,777 training samples were randomly sampled from one sample point per square kilometer. We then used statistical analysis tools provided by ArcGIS to model suitability evolution. Regression results first needed to be adjusted manually before simulation through a procedure where we set to 0 the ecologically sensitive areas where development is prohibited and then enhanced the spatial development probability along the coast of the GBA (the elliptical region in Figure 6) by increasing it by 0.05 for each cell. Due to the large area of the urban agglomeration, there was obvious geographical spatial differentiation. If traditional overall estimation, rather than a local estimation, is used with such preconditions, it may not be able to give an accurate analysis. The spatial driving probability results of the GWR and OLR models are shown in Figure 6. Compared with the OLR's global average estimation, the GWR balanced the development differences in relation to urban agglomerations better. The results of the OLR model indicated that peripheral cities such as Jiangmen, Huizhou, Zhaoqing, are basically devoid of high development probability results. If this kind of spatial probability is used to drive the development of an urban agglomeration, these cities will have reduced development opportunities, which is obviously not in line with the trend toward integrated development within the GBA. Therefore, we believe that it is necessary to consider spatial autocorrelation in urban agglomeration simulation. Moreover, the suitability of data estimation based on historical change can only represent the law of historical processes, which does not mean that future urbanization will be the same. Given this, it is necessary to revise the suitability map manually for regions with obvious spatial development policy regulation; otherwise, the CA model will have difficulty simulating the expected pattern.

Advantages of the SMC-GWRCA for Urban Agglomeration Simulation
To analyze the advantages of the SMC-GWRCA for simulating the GBA spatial pattern in 2050, all parameters of the CA model employed in this paper were set conventionally (e.g., Ω adopted the conventional Moore neighborhood, and neighborhood Ω and suitability S were coupled with equal weight). The total number of iterations of the GWRCA was 200 (10 iterations per year), and each iteration was an equivalent conversion. The simulation patterns for the GBA in 2050 obtained by coupling the SMC-GWRCA, MC-GWRCA, SMC-OLRCA, and MC-OLRCA models are shown in Figure 7. From these results, the spatial patterns simulated by the four models were quite different though they worked under largely the same parameters (except Q and S). To analyze whether the MC or SMC prediction results were more reasonable, the subregion marked as GZ-4 in Figure 7 was used as an example for comparative analysis. GZ-4 is the Nansha New Area, which has been planned as the Vice-Center of Guangzhou, and is bound to be subjected to strong growth trends. Using only the MC method, projected increases up to 2050 are expected to be small due to the low level of historical transformation between 1995 and 2015. On the other hand, the SMC was able to take into account the radiation of high-speed growth cities in the east such as SZ and DG. In the past, the eastern part of the GBA was largely disconnected from the western, but this is no longer the case as the east-west connection across is becoming increasingly convenient: the Hong Kong-Zhuhai-Macao, Humen, and Nansha bridges have all been opened, and the Shenzhen-Zhongshan channel is under construction. From this evidence, we can see that the SMC constitutes a more reasonable method than the MC. Comparing the GWR and OLR methods, the patterns derived through OLR simulation are concentrated around big cities, while the peripheral areas were too scattered, which conclusively demonstrates that the GWR is the more suitable for urban agglomeration simulation.

Advantages of the SMC-GWRCA for Urban Agglomeration Simulation
To analyze the advantages of the SMC-GWRCA for simulating the GBA spatial pattern in 2050, all parameters of the CA model employed in this paper were set conventionally (e.g.,  adopted the conventional Moore neighborhood, and neighborhood  and suitability S were coupled with equal weight). The total number of iterations of the GWRCA was 200 (10 iterations per year), and each iteration was an equivalent conversion. The simulation patterns for the GBA in 2050 obtained by coupling the SMC-GWRCA, MC-GWRCA, SMC-OLRCA, and MC-OLRCA models are shown in Figure 7. From these results, the spatial patterns simulated by the four models were quite different though they worked under largely the same parameters (except Q and S). To analyze whether the MC or SMC prediction results were more reasonable, the subregion marked as GZ-4 in Figure  7 was used as an example for comparative analysis. GZ-4 is the Nansha New Area, which has been planned as the Vice-Center of Guangzhou, and is bound to be subjected to strong growth trends. Using only the MC method, projected increases up to 2050 are expected to be small due to the low level of historical transformation between 1995 and 2015. On the other hand, the SMC was able to take into account the radiation of high-speed growth cities in the east such as SZ and DG. In the past, the eastern part of the GBA was largely disconnected from the western, but this is no longer the case as the east-west connection across is becoming increasingly convenient: the Hong Kong-Zhuhai-Macao, Humen, and Nansha bridges have all been opened, and the Shenzhen-Zhongshan channel is under construction. From this evidence, we can see that the SMC constitutes a more reasonable method than the MC. Comparing the GWR and OLR methods, the patterns derived through OLR simulation are concentrated around big cities, while the peripheral areas were too scattered, which conclusively demonstrates that the GWR is the more suitable for urban agglomeration simulation. From the view of spatial patterns, the development scenario simulation with the SMC-GWRCA model was more consistent with actual requirements. To further evaluate the differences among the four simulation models, the urban agglomeration pattern observed by remote sensing in 2020 was compared with that simulated by each model in this study, and the accuracy was evaluated by the Figure of Merit (FOM) index, which was significantly higher because the detection time was short) [66,67] (Table 3). We found that the accuracy of the SMC-GWRCA model was better than the others for the FOM shape index and prediction accuracy (PA). For example, the FOM index of the SMC-GWRCA From the view of spatial patterns, the development scenario simulation with the SMC-GWRCA model was more consistent with actual requirements. To further evaluate the differences among the four simulation models, the urban agglomeration pattern observed by remote sensing in 2020 was compared with that simulated by each model in this study, and the accuracy was evaluated by the Figure of Merit (FOM) index, which was significantly higher because the detection time was short) [66,67] (Table 3). We found that the accuracy of the SMC-GWRCA model was better than the others for the FOM shape index and prediction accuracy (PA). For example, the FOM index of the SMC-GWRCA was 0.6881, higher than that of classical MC-OLRCA, and the prediction accuracy improved from 72.77 to 81.53%. This indicated that more than 80% of the new units predicted by the SMC-GWRCA from 2015 to 2020 had occurred, and the improved model can better simulate the development of new areas. Overall, the SMC-GWR coupling model was a great improvement over the traditional model (the SMC for quantity prediction and the GWR for spatial form evolution) and showed that the urban agglomeration scale must take into account the interaction between cities and spatial differentiation. Urban agglomeration simulation cannot simply copy the classical CA modeling process.

Conclusions
As urbanization reaches an advanced stage globally, agglomerations have emerged as an increasingly prevalent form of spatial organization. Their development trends must be analyzed to provide a more scientific basis for spatial planning decisions, such as how to optimize infrastructure networks for optimal group connection. However, as the research object shifts from single cities to groups of cities, the scale of the geographical space being studied can become very large, and interactions among cities become more and more obvious, so it is very important to establish an effective simulation method to deal with these. To simulate the spatiotemporal evolution of urban agglomerations, this study introduced a simulation model that integrated the spatial Markov chain and the geographic-weighted regression-based cellular automata methods. The effect of the model was analyzed in relation to the Guangdong-Hong Kong-Macao Greater Bay Area (GBA) urban agglomeration.
This research shows that the spatial Markov chain was fully able to reflect radiationdriven utility among the cities, and taking these intercity interactions into account, the development of the urban agglomeration was reconstructed and projected. Taking the Nansha New Area, the geometric center of the GBA as an example, we noted that little obvious urban growth occurred in the period 1995-2015. However, this area is expected to become the Vice-Center of Guangzhou and is bound to face large-scale growth demands. This kind of demand cannot be predicted by means of a traditional MC process. The SMC approach, however, can deal with the radiation of the development demand of surrounding areas. The Nansha New Area, as the geometric center of GBA, is bound to receive the most intense radiation from surrounding cities. Our comparison of the simulation results obtained under these two methods fully supports this conclusion.
There is no doubt that using a spatial Markov chain model to simulate the quantity structure of urban agglomerations is feasible, but the spatial Markov model also has application limitations. First, it must assume that the transfer process is relatively stable, which is deeply related to the urbanization stage of the agglomeration or its region. If the region has reached the mature stage, even if there has been a past large-scale transformation, this principle cannot be used to predict the next stage of transformation. To eliminate this potential limitation, we drew inspiration from the European Union's nomenclature of territorial units for statistics and merged a number of urban core areas to ensure the trend of analysis unit transfer. Second, the spatial Markov chain still needs to verify the strength of the interactions among the cities in an urban agglomeration. In this paper, inverse-distance weighting was used directly, but we noted that the method for defining the interaction between cities scientifically requires further exploration.
In the spatial pattern simulation performed in this paper, four driving factors were selected for their stability to mine the urban growth probability. Factors such as the street network and the location of development centers, which are used in traditional CA simulation, were not used. Because their quantities vary over time, such factors cannot be used to explain the drivers of the long-term development of urban agglomerations. For example, a new highway or subway built by the government can completely change this pattern of drivers. For this reason, we maintain that it is more appropriate to use spatially stable factors to explain the long-term spatial patterns of the drivers of urban development when research is conducted on the scale of the urban agglomeration.
In addition, the question of how to integrate quantitative data mining methods to identify drivers and qualitative spatial policy regulation to form a more accurate fitness map are both key points for simulating the long-term growth of urban agglomerations. Our research showed that it is feasible to extract the driving force coefficients of various spatial factors from historical urban growth. Compared with the global regression method, we considered the geographic-weighted regression method to be more suitable for analyzing the urban agglomeration. However, studying historical change can only provide insight into the past development of urban agglomerations, trends that will not necessarily continue. A new development strategy from government spatial planners can completely change the previous development model. As such, research must combine qualitative and quantitative methods in simulations, and avoid overreliance on purely quantitative results. The model is, after all, only a model.
Although there are many limitations to spatial planning scenario analyses, we believe that as long as the simulation model is fed enough refined data, the SMC-GWRCA method can competently simulate the spatiotemporal evolution process of urban agglomerations. In general, simulation models can help us understand the development mechanisms at work and provide a reference for decision-making within territorial spatial planning. The simulation framework studied in this paper only needed a small amount of data and a limited number of parameters to simulate the growth of urban agglomerations. Through future research, we would like to try a larger-scale simulation, such as modeling urban agglomerations over the whole of China.