Development Process, Quantitative Models, and Future Directions in Driving Analysis of Urban Expansion

Abstract: Driving analysis of urban expansion (DAUE) is usually implemented to identify the driving factors and their corresponding driving effects/mechanisms for the expansion processes of urban land, aiming to provide scientific guidance for urban planning and management. Based on a thorough analysis and summarization of the development process and quantitative models, four major limitations in existing DAUE studies have been uncovered: (1) the interactions in hierarchical urban systems have not been fully explored; (2) the employed data cannot fully depict urban dynamics through finer social perspectives; (3) the employed models cannot deal with high-level feature correlations; and (4) the simulation and analysis models are still not intrinsically integrated. Four future directions are thus proposed: (1) to pay attention to the hierarchical characteristics of urban systems and conduct multi-scale research on the complex interactions within them to capture dynamic features; (2) to leverage remote sensing data so as to obtain diverse urban expansion data and assimilate multi-source spatiotemporal big data to supplement novel socio-economic driving factors; (3) to integrate with interpretable data-driven machine learning techniques to bolster the performance and reliability of DAUE models; and (4) to construct mechanism-coupled urban simulation to achieve a complementary enhancement and facilitate theory development and testing for urban land systems.


Introduction
At present, urban areas accommodate approximately one-half of the world's population, and this proportion is projected to increase to about two-thirds by 2050 [1]. This continuous urbanization will inevitably lead to urban expansion in order to provide urban residents with adequate living space and sufficient utilities. However, disorderly urban expansion occupies fertile cultivated land and public green areas, leading to environmental issues such as ecosystem degradation and eventually undermining sustainable development.
Thus, scholars have carried out extensive studies on the driving analysis of urban expansion (abbreviated to DAUE) with the goal of providing scientific guidance on rational urban land allocation for urban management and planning (Figure 1). Here, DAUE is defined to cover all research on the driving effects, relationships, and mechanisms of urban expansion, together with the analysis methods employed, i.e., correlation analysis, regression analysis, causal analysis, and so on. The key research areas of current DAUE include (1) exploring the qualitative/quantitative relationship between urban expansion and certain driving factors (e.g., [2,3]); (2) identifying the major driving factors of urban expansion from socio-economic, policy, topographic, and other perspectives (e.g., [4,5]); (3) understanding the spatiotemporal dynamics of driving effects or mechanisms during the process of long-term urban expansion (e.g., [6,7]); and (4) providing theoretical support for urban expansion modelling (e.g., [8]).
However, in-depth summaries and discussions of the development process and research methods of DAUE have not yet been undertaken. Specifically, the works of Lambin et al. [9], Van Vliet et al. [10], and Shaw et al. [11] have explored much broader research topics such as urbanization, land use/cover change, and land system science, meaning that DAUE-related content is limited; Wahyudi and Liu [12], Hersperger et al. [13], and Kim et al. [14] have only summarized the driving factors that have been discussed in existing studies of urban expansion/land use change simulation; Dadashpoor and Ahani [15], Colsaet et al. [16], Kasraian et al. [17], and Seto et al. [18,19] have only summarized the driving effects or mechanisms of potential driving factors on urban expansion. To help identify research gaps in DAUE and inform future directions, this study elaborates upon the development process and the mainstream quantitative models or methods by addressing the following three questions:
(1) What is the chronological development of DAUE?
(2) What quantitative models/methods have been used in DAUE case studies, and what are the application scenarios for each approach?
(3) What are the limitations of current DAUE studies and the future research agenda in DAUE?

The Development Process of DAUE
The DAUE case studies reviewed here were retrieved from the Web of Science Core Collection (WOS CC) and comprise 399 papers published from 1961 to 2023. The details of the literature retrieval are given in Appendix A. On this basis, we use bibliometric analysis to examine the evolution of article publication, research collaboration, and hot research themes. The bibliometric methods employed are described in Appendix B.
The development of DAUE can be divided into three stages: a start-up stage (up to 2000), a growing stage (2001–2013), and a booming stage (2014–2023). The publications in these three stages account for 3%, 16%, and 81% of the total, respectively. In the start-up stage, only one DAUE paper was published per year; during the growing stage, the number of publications increased slowly and unsteadily, with no more than ten publications per year and a citation frequency generally below 500. By comparison, during the booming stage, the annual publication volume and total citation frequency grew explosively, indicating that DAUE attracted extensive attention and developed rapidly during this stage.
ISPRS Int. J. Geo-Inf. 2023, 12, x FOR PEER REVIEW

Increasingly Multidisciplinary Subjects of DAUE
The DAUE papers in the database were published in a total of 116 different journals. Figure 3 presents the top ten most influential journals, in which 202 papers were published, accounting for about 51% of the total. The most popular journals in the field of DAUE include Sustainability, Land Use Policy, Habitat International, Land, Landscape and Urban Planning, and Applied Geography. Among these, Land Use Policy, Landscape and Urban Planning, Habitat International, and Applied Geography combine a high 5-year impact factor (IF5) with a large total number of DAUE-related publications. In addition, DAUE case studies are mainly published in journals indexed in SCIE and SSCI. However, the average publication volume and IF5 of the SSCI journals among the top ten are higher than those of the SCIE journals. This indicates that although DAUE is a multidisciplinary research field, it exerts a more significant influence on social science.
Since each journal indexed by WOS CC has been classified into at least one subject category by specific algorithms or professional experts, analysing how these subjects have extended over time can aid in understanding the research domains involved in DAUE. Figure 4 demonstrates that the subjects of these 116 DAUE-related journals can be further divided into five categories: economics, urban planning, geography, environmental protection, and frontier technology. The earliest researchers (e.g., [20]) mainly analysed the driving mechanism of urban expansion from the perspective of economics, and subsequent studies were also constantly influenced by economic theories (e.g., [21]). Later, with the continuous development of DAUE, the subjects were gradually enriched, especially those related to environmental protection and sustainable development. Since 2000, DAUE research has begun to address the subjects of "frontier technology", such as remote sensing, engineering, and computer science.

The Transition of Involved DAUE Research Institutions
Figure 5 shows the major countries that have contributed to DAUE research, along with the cooperative network among them. Before 2000, most publications came from Germany and the USA; since 2000, however, the hotspot of DAUE publications has gradually shifted to China, which has become the most important hub for knowledge exchange and research cooperation in DAUE (as indicated by the thickest violet outer ring). Furthermore, as shown in Table 1, Chinese institutions have conducted the highest number of DAUE studies overall, followed by institutions in the USA. Judging from the betweenness centrality of institutions, the Chinese Academy of Sciences has contributed most to DAUE publications.

Table 1 (note): The year in which the institution published its first DAUE case study.

Evolution of DAUE Hot Themes
A total of 1211 research themes were extracted from publications in our database. Among them, 155 so-called "hot" themes are identified using the Jenks natural breaks method; these account for about 13% of the total themes.
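As a rough illustration of how a natural-breaks threshold can separate "hot" themes from the rest, the sketch below finds the single break that minimises the total within-class sum of squared deviations, which is the two-class special case of the Jenks criterion. The theme frequencies are invented for illustration and the function name is hypothetical; the number of classes and the input values used in the study itself are not stated in this section.

```python
def two_class_natural_break(values):
    """Split `values` into two classes so that the total within-class
    squared deviation is minimal (two-class Jenks criterion).

    Returns (threshold, lower_class, upper_class), where `threshold`
    is the smallest value of the upper class.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")

    def squared_deviation(chunk):
        mean = sum(chunk) / len(chunk)
        return sum((v - mean) ** 2 for v in chunk)

    best_cost, best_cut = None, None
    # Try every possible cut position in the sorted data.
    for cut in range(1, len(data)):
        cost = squared_deviation(data[:cut]) + squared_deviation(data[cut:])
        if best_cost is None or cost < best_cost:
            best_cost, best_cut = cost, cut

    lower, upper = data[:best_cut], data[best_cut:]
    return upper[0], lower, upper


# Hypothetical occurrence counts of ten themes (illustrative only).
theme_counts = [1, 1, 2, 2, 3, 3, 4, 18, 21, 25]
threshold, cold, hot = two_class_natural_break(theme_counts)
```

With more than two classes, the same criterion is usually optimised by dynamic programming rather than by the exhaustive search used here.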

Hot Research Themes and Their Diachronic Extension
In more detail, we divide the hot themes into four categories (keywords of urban expansion, driving analysis, study area, and driving factors) and visualize the diachronic extension of each.
As shown in Figure 6, keywords related to urban expansion have evolved from relatively narrow to more diverse, and from implicit to explicit. For example, "urban expansion" covers more expansion patterns than the special form of "urban sprawl", and stresses the spatial characteristic of urban land change more clearly than "urban growth". Moreover, many of the themes that developed later, such as built-up expansion, impervious surface, and construction land, clarify the specific land cover change types of urban expansion in current research.

Figure 7 shows that although the specific content of driving analysis has gradually become more in-depth and complex, studies addressing driving mechanisms (how the driving factors impact urban expansion) are far fewer than those merely focusing on the driving effects/relationships (to what extent driving factors influence urban expansion). In the selection of driving factors, since approximately 2013, researchers have explicitly extended their sphere of interest from only considering city-level macro driving factors to further exploring how "spatial determinants" at the parcel or pixel level (such as distance to rivers) impact urban structure and urban land allocation.
Figure 8 shows that DAUE researchers have gradually come to pay considerable attention to urban agglomerations and the megacities within them, such as Guangzhou in the Pearl River Delta, Wuhan in the Yangtze River Middle Reaches Urban Agglomeration, and Beijing in the Beijing-Tianjin-Hebei Urban Agglomeration. Increasingly complicated urban systems prompt multilevel and interactive urban expansion processes; thus, the study area tends to be diversified and multi-scale, extending from peri-urban areas to single cities, metropolitan areas or megacities, and subsequently to urban agglomerations with complex internal structures.
Figure 9 shows that the driving factors constructed in DAUE can be classified as factors related to socio-economics, governance and institutions, and social context. The earliest factors involved, such as economy, population, land development, and transportation, are relevant to discussions of suburbanization and urban sprawl in American and European countries [22,23]. Later, against the backdrop of China's reform and opening up, the core research focus transferred to China, meaning that the driving factors analysed were greatly influenced by China's urbanization strategy and policy (e.g., government, foreign investment, markets, and high-speed railway). Recently, moreover, some factors that quantify spatial interaction, such as accessibility and migration, have also attracted widespread attention.

Burst Detection for Hot Themes
Four hot themes in DAUE are captured as burst terms: namely, economic transition, remote sensing, GIS, and cellular automata (Table 2). Among them, economic transition describes the social context of urban expansion; remote sensing is an important data source for DAUE; GIS is the technical support platform; and finally, cellular automata can be regarded as a typical application scenario of DAUE. The burst stage of economic transition ranges from 2015 to 2017, mainly owing to the transformation of economic development during China's reform and opening-up. The term also refers to economic restructuring in the process of urban development [24] or to the economic transformation of resource-based cities resulting from the depletion of resources, e.g., coal, oil, etc. [25].
GIS was applied as an important data analysis and processing tool that attracted a lot of attention in DAUE research from 2005 to 2013. Remote sensing (RS) datasets, especially Landsat data collections, were also widely employed as a new data source from 2002 to 2013. In the earlier stages of DAUE study, the advanced technologies of GIS and RS could meet researchers' needs for data supplementation, processing, and modelling. Once mastered by more researchers, they became basic and indispensable research tools, and have thus no longer been highlighted in new DAUE publications since 2013.
The burst stage of cellular automata (CA) ranges from 2017 to 2020. CA has a strong capability to simulate the spatiotemporal dynamics of land use change in complex urban systems. Although CA-based urban simulation places more emphasis on model accuracy, the essence of model calibration in CA is inherently related to DAUE. From approximately 2017 to 2020, some researchers in CA-based urban simulation began to pay close attention to DAUE in order to improve the performance of CA models (e.g., [26–28]), or to take DAUE as an auxiliary task in CA (e.g., [29]).

Quantification Models in DAUE
A series of quantitative methods have been employed to conduct DAUE research and explain the driving effects. As shown in Figure 10, models involved in DAUE can be divided into three categories: (1) traditional correlation analysis and regression models that focus on measuring the correlation between each driving factor (X) and urban expansion (Y); (2) geographical regression models that specifically consider spatial and temporal effects; and (3) machine learning-based models. The first two types are mainly mechanism-driven models with specific statistical assumptions, while machine learning-based models are data-driven models.

Correlation Analysis
Scatter plots, gradient analyses, and correlation coefficients are methods that have been employed to visualize/measure correlation relationships since the early days of empirical DAUE research. The first two methods present correlation patterns of X and Y by visualizing a set of discrete data points, while the last measures the degree of correlation. A scatter plot visualizes the correlation pattern between each driving factor and urban expansion in the form of a 2D plot. Gradient analysis can expose the relationship between the indicators of urban land distribution/expansion and the distance to diverse site factors, such as city centres [30], roads [31], rivers [32], coastal lines [33], subway stations, and highway exits [34]. However, although scatter plots and gradient analysis can display the details of correlation patterns, they cannot accurately estimate the corresponding correlation strengths.
Correlation coefficients are indicators that measure the strength of the correlations between driving factors and urban expansion. In DAUE, the Pearson correlation coefficient (PCC) is the most widely used measure of linear correlation, while grey relational analysis (GRA) has been used for nonlinear correlation analysis. Although inference based on the PCC assumes that the observed data were drawn from stochastic processes obeying a normal distribution, only very few DAUE studies have conducted the corresponding statistical test before applying it. By contrast, GRA measures the correlation between two variables based on the similarity of their sequence trends, so it imposes fewer limitations on variable distribution. However, GRA cannot identify whether a correlation relationship is positive or negative, since the output values range from 0 to 1.
Currently, due to the diversity of driving factors and the complexity of driving mechanisms, these three methods are primarily employed in DAUE as exploratory data analysis methods to improve the reliability of subsequent regression models, for example, by eliminating multicollinearity among driving factors.
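To make the contrast between the two coefficients concrete, the following sketch computes the PCC and a grey relational grade for two hypothetical driving factors. It assumes Deng's formulation of GRA with mean-value normalisation and a distinguishing coefficient of 0.5, which is a common choice but not necessarily the variant used in the reviewed studies; all series and function names are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def grey_relational_grades(reference, factors, rho=0.5):
    """Grey relational grade of each factor series against `reference`.

    Assumes Deng's formulation: series are divided by their own mean,
    and `rho` is the distinguishing coefficient.
    """
    def scale(series):
        mean = sum(series) / len(series)
        return [v / mean for v in series]

    ref = scale(reference)
    deltas = {
        name: [abs(r - f) for r, f in zip(ref, scale(series))]
        for name, series in factors.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every factor coincides with the reference
        return {name: 1.0 for name in deltas}
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }


# Hypothetical five-period series (illustrative values only).
urban_area = [100, 110, 125, 140, 160]
factors = {
    "gdp": [50, 56, 63, 71, 80],       # rises with urban area
    "farmland": [80, 76, 70, 66, 60],  # falls as urban area grows
}

pcc = {name: pearson(urban_area, series) for name, series in factors.items()}
grades = grey_relational_grades(urban_area, factors)
```

In this toy example the PCC of farmland is negative while its grey relational grade remains between 0 and 1, which mirrors the point above that GRA ranks the closeness of trends but does not reveal the sign of a relationship.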

Classic Regression Models without Considering Spatiotemporal Effect
In DAUE, ordinary linear regression (OLR) and logistic regression (LR) are two simple but widely used regression models. OLR is suitable for exploring linear correlations between multiple factors and continuous dependent variables in urban expansion (e.g., urban area and urban expansion intensity), while LR can be applied to discrete or binary dependent variables (e.g., land use types, changes of urban land, etc.). Compared with OLR, LR integrates the logit transformation with a linear regression of driving factors to generate the probability of urban expansion. With support from remote sensing and GIS in DAUE, logistic regression is often combined with other more sophisticated models such as multilevel regression [35], spatial regression [36], and geographically weighted regression [37], among others. Probit regression, which is similar to LR but uses a different probability distribution, can also be used to analyse discrete urban expansion variables [7]. The coefficients obtained through probit regression are approximately proportional to those obtained through logistic regression, which indicates that the two models measure essentially equivalent driving effects. Of the two, logistic regression is the more widely used.
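The way LR turns a linear combination of driving factors into an expansion probability can be sketched with a single predictor. The example below fits a one-variable logistic model by plain gradient ascent on the log-likelihood; the distances and conversion labels are invented, the function names are hypothetical, and a real study would normally rely on an established statistics package rather than this hand-written loop.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood. Returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical pixels: distance to the nearest road (km) and whether
# the pixel converted to urban land (1) or not (0).
distance_km = [0.2, 0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
converted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

b0, b1 = fit_logistic(distance_km, converted)
p_near = sigmoid(b0 + b1 * 0.5)  # estimated probability 0.5 km from a road
p_far = sigmoid(b0 + b1 * 3.5)   # estimated probability 3.5 km from a road
```

A negative slope here would be read as a lower probability of conversion with increasing distance from roads, which is the kind of driving effect LR is used to quantify.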
Panel regression, which can be regarded as the extension of OLR in the temporal dimension, requires time series analysis of both cross-sectional urban expansion data and related driving factors. By providing more informative data across samples in multiple phases, this method can relieve data multicollinearity, capture individual variability, and identify the effects of unobserved time-invariant variables [38,39]. In DAUE, pooled ordinary least-square regression, fixed-effect regression, and random-effect regression are the three most commonly used methods of panel regression analysis [3,40,41]. Statistical tests, especially the F-test and Hausman test, can be used in practice to aid in the selection of the specific panel regression approach that best matches the studied data.
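The difference between pooled and fixed-effect estimation can be shown with a minimal single-regressor sketch. The two-city panel below is hypothetical and noise-free; the fixed-effect estimate uses the within transformation (demeaning each variable by city), which is one standard way of obtaining it.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_slope(units, x, y):
    """Fixed-effect slope: demean x and y within each unit, then regress."""
    def demean(values):
        totals = defaultdict(lambda: [0.0, 0])
        for u, v in zip(units, values):
            totals[u][0] += v
            totals[u][1] += 1
        return [v - totals[u][0] / totals[u][1] for u, v in zip(units, values)]
    return ols_slope(demean(x), demean(y))

# Hypothetical panel: expansion = city effect + 2 * factor
units = ["A", "A", "A", "B", "B", "B"]
factor = [1, 2, 3, 4, 5, 6]
expansion = [12, 14, 16, 8, 10, 12]   # city effects: A = 10, B = 0

pooled = ols_slope(factor, expansion)            # ignores city effects
fixed = within_slope(units, factor, expansion)   # removes city effects
```

In this constructed example the pooled slope is negative even though the factor raises expansion within every city, while the within estimator recovers the slope of 2; this is the kind of unobserved time-invariant effect that fixed-effect regression is meant to absorb.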
Hierarchical linear modelling (HLM) is a regression approach applied to nested datasets, which enables DAUE research to assimilate driving factors from different administrative levels, such as the prefecture level (level 1, i.e., individual level) and provincial level (level 2, i.e., group/contextual level). The regressions at multiple spatial scales take regional effects on individual units into account, while also considering the effects of the cross-level interaction of driving factors on urban expansion [42,43]. Table 3 lists three basic forms of HLM, namely the null model, random-coefficient regression model, and intercepts-and-slopes-as-outcomes model. Among them, the null model measures the influence of differences within and between groups; the random-coefficient regression model is designed to explore the direct effect of individual-level factors on the dependent variable; finally, the intercepts-and-slopes-as-outcomes model can analyse the driving effect of certain factors at both levels. These models can be constructed in turn and subsequently combined to illustrate the driving effects of each administrative level.
Table 3. Three basic forms of HLM: null model, random-coefficient regression model, and intercepts-and-slopes-as-outcomes model.
Notation: y_ij: the dependent variable of individual i, which is in group j; X_kij: the k-th explanatory variable of individual i in group j; β_0j: the intercept of the dependent variable in group j; β_kj: the coefficient (slope) of the k-th explanatory variable in group j; γ_ij: random error associated with individual i in group j; γ_00: average value of β_0j; γ_k0: the intercept of β_kj; µ_0j, µ_kj: random variables; V_kj: the k-th explanatory variable of group j; γ_01, γ_k1: the coefficient (slope) of V_kj.
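For orientation, the three forms named in Table 3 are commonly written as follows. This is a sketch in standard textbook form using the symbols of the table legend; the exact layout of the original table is an assumption, and a single group-level variable V_kj is shown per level-2 equation for brevity.

```latex
% Null model
y_{ij} = \beta_{0j} + \gamma_{ij}, \qquad
\beta_{0j} = \gamma_{00} + \mu_{0j}

% Random-coefficient regression model
y_{ij} = \beta_{0j} + \sum_{k} \beta_{kj} X_{kij} + \gamma_{ij}, \qquad
\beta_{0j} = \gamma_{00} + \mu_{0j}, \qquad
\beta_{kj} = \gamma_{k0} + \mu_{kj}

% Intercepts-and-slopes-as-outcomes model
y_{ij} = \beta_{0j} + \sum_{k} \beta_{kj} X_{kij} + \gamma_{ij}, \qquad
\beta_{0j} = \gamma_{00} + \gamma_{01} V_{kj} + \mu_{0j}, \qquad
\beta_{kj} = \gamma_{k0} + \gamma_{k1} V_{kj} + \mu_{kj}
```

Read in sequence, each form adds one element to the previous one: individual-level factors first, then group-level factors that shift both the intercepts and the slopes.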
Structural equation modelling (SEM) is a causal mechanism analysis method that combines factor analysis and path analysis to explore the interactions between a set of observed and latent variables. In land system science, a change of land use usually involves a variable combination of multiple driving factors [44]. Therefore, SEM has been used in the DAUE context to measure and analyse the causal mechanisms between urban expansion and various variables. As illustrated in Figure 11, latent variables are derived from observed variables (i.e., driving factors) in factor analysis, while a theoretical framework explaining how these variables relate to each other can be presupposed by researchers in the form of a path diagram [45]. This structure of causal relationships between variables in fact makes up a group of interrelated equations [46,47], in which regression coefficients are estimated. The model can then be evaluated and modified step-by-step according to fitting indices. However, it should be noted that theoretical frameworks are user-defined, and that multiple distinct SEM analysis results may be obtained from the same research dataset.
The difference-in-differences (DID) model is a quasi-experimental approach used to study the influence of a treatment (i.e., explanatory variable) by comparing the changes in outcomes (i.e., explained variable) between a control group and a treatment group over time. DID is therefore designed to measure the impact of a single factor (one treatment) at a time. For example, in DAUE studies, Zhou et al. [48] applied DID to explore the impact of the administrative hierarchy system on urban expansion in China, while Deng et al. [49] and Zhang et al. [50] employed DID to study the driving effect of high-speed railways. Furthermore, DID relies on a strict exchangeability assumption, specifically that the treatment group and control group originally share a common trend and that any subsequent divergence between them is caused by the treatment [51]. For this reason, the trend similarity between the two groups of samples should be verified to ensure the validity of DID results in the DAUE context. Thus, a common trend test and a falsification test on the treatment group and control group are required before analysis; however, only Zhang et al. [50] have explicitly conducted such tests.
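The core DID calculation can be sketched for the simplest two-group, two-period case. The expansion rates below are hypothetical, and the sketch omits covariates, standard errors, and the common trend test discussed above.

```python
def mean(values):
    return sum(values) / len(values)

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Two-group, two-period DID: change in treated minus change in control."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(ctrl_post) - mean(ctrl_pre)
    return treated_change - control_change

# Hypothetical annual expansion rates (%) before and after a treatment,
# e.g. the opening of a high-speed railway station in the treated cities
treat_pre, treat_post = [2.0, 2.4, 2.2], [3.5, 3.9, 3.7]
ctrl_pre, ctrl_post = [1.8, 2.0, 2.2], [2.4, 2.6, 2.8]

effect = did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post)
```

The estimate is only interpretable as a treatment effect if the control group's change is a valid stand-in for what the treated group would have experienced without the treatment, which is exactly what the common trend assumption asserts.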
Generally, these six classic regression models have a strong generalization capacity and are relatively simple to conduct. However, these models cannot handle sophisticated relationships, such as spatially heterogeneous relationships between the dependent variable (Y) and the driving factors (X), that arise during the diversification of urban development.

Geographical Models Considering Spatial and Temporal Effects
In the process of urban spatial expansion, complex interaction between geographic entities is widespread, while spatiotemporal non-stationarity and spatial association also cannot be neglected. Thus, to better capture the actual urban expansion mechanisms, researchers have increasingly taken these underlying spatiotemporal effects into account in DAUE modelling.

Geo-Detector: Model Considering Spatial Distribution
The geo-detector is a spatial statistical model consisting of four detectors: a factor detector, an interaction detector, a risk detector and an ecological detector, as shown in Figure 12 [52]. The first three detectors are commonly used in DAUE, especially in urban agglomeration research [53,54]. A factor detector can measure the strength of the correlation between driving factors and urban expansion and imply a stronger causality than correlation coefficients [52], although it cannot present the direction of the correlation due to its value range of 0 to 1. An interaction detector can reveal how paired driving factors interact with each other to affect urban expansion and identify the strength, direction, linearity, or nonlinearity of their interaction. For example, a nonlinear-enhance interaction is often found between driving factors in DAUE [55,56], which indicates that the superposition of two factors nonlinearly enhances their explanatory power for urban expansion. A risk detector employs a t-test to detect whether there are significant differences in the mean values of the urban expansion index at different levels of driving factors, which can further reveal the influence of driving factors and their correlation pattern through visualization.
Geo-detectors avoid the influence of factor multicollinearity and can fulfil numerous different functions when different detectors are chosen. However, they are only applicable to discrete explanatory variables of driving factors; as a result, continuous variables will need to be discretized in practice, which may give rise to the modifiable areal unit problem (MAUP) [52].
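The factor and interaction detectors both rest on the same quantity, usually called the q-statistic: one minus the ratio of within-stratum variance to total variance. The sketch below is a minimal illustration on hypothetical data; the significance test and the risk and ecological detectors are omitted.

```python
def q_statistic(values, strata):
    """Share of the variance in `values` explained by the stratification."""
    overall_mean = sum(values) / len(values)
    total_ss = sum((v - overall_mean) ** 2 for v in values)
    groups = {}
    for v, s in zip(values, strata):
        groups.setdefault(s, []).append(v)
    within_ss = 0.0
    for members in groups.values():
        m = sum(members) / len(members)
        within_ss += sum((v - m) ** 2 for v in members)
    return 1.0 - within_ss / total_ss

# Hypothetical expansion index for eight units and two discretized factors
expansion = [1, 2, 1, 2, 8, 9, 8, 9]
factor_a = ["low"] * 4 + ["high"] * 4          # e.g. a discretized GDP class
factor_b = ["x", "y", "x", "y", "x", "y", "x", "y"]

q_a = q_statistic(expansion, factor_a)
q_b = q_statistic(expansion, factor_b)
# Interaction detector: stratify by the overlay of both factors
q_ab = q_statistic(expansion, list(zip(factor_a, factor_b)))
```

Comparing `q_ab` with `q_a`, `q_b`, and their sum is how the interaction detector classifies a pair of factors, for instance as nonlinear-enhance when `q_ab` exceeds `q_a + q_b`. Note that all three values lie between 0 and 1, so none of them indicates the direction of an effect.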

Spatial Regression Models: Models Considering Spatial Dependence
Since the intrinsic spatial autocorrelation in DAUE variables is ubiquitous, spatial regression models are applied to explicitly express the impact of spatial dependence, with examples including the spatial lag model (SLM), spatial error model (SEM; distinct from the structural equation modelling discussed above) and spatial Durbin model (SDM) (Figure 13). Among these models, SLM and SEM have been most popular in the DAUE context since 2010 [57]. SLM takes the influence of neighboring urban expansion into account, with the spatially lagged variable WY established in the model. SEM assumes the existence of unobserved but important explanatory variables that are spatially correlated and affect local expansion; as a result, it multiplies the spatial adjacency matrix with error terms to produce its spatially lagged variable (Wε). These two models have been shown to evaluate the driving effects of factors on urban expansion more precisely than OLS, owing to the spatial relationship in the data [57].
SDM is the extension of SLM and SEM, designed to consider the spatial autocorrelation of both the dependent variable and explanatory variables. Feng et al. [58] employed this model to reveal the spatial characteristics of driving effects on urban sprawl, and identified the so-called siphon effect or spill-over effect between cities at the country scale as well as the spatial correlation of urban sprawl in China. Furthermore, it is notable that the coefficients estimated by spatial dependence terms on the explained variable in SDM, SEM, and SLM can be identified as either direct marginal effects caused by local driving factors or indirect marginal effects caused by neighborhood driving factors [50,58,59].
To guarantee the selection of the spatial regression model that will best match the observed data, it is necessary to conduct a Lagrange multiplier test on the data before modelling. However, the spatial regression models mentioned above belong to the type of global regression that is incapable of modelling spatial heterogeneity in the urban expansion process.
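The three specifications discussed in this subsection can be written compactly as follows. This is a sketch in conventional notation: Y, X, W, and ε follow the text, while ρ, λ, θ, and ξ are symbols assumed here for the spatial autoregressive coefficient, the spatial error coefficient, the coefficients of the spatially lagged explanatory variables, and the remaining independent error, respectively.

```latex
% Spatial lag model (SLM)
Y = \rho W Y + X \beta + \varepsilon

% Spatial error model (SEM)
Y = X \beta + \varepsilon, \qquad \varepsilon = \lambda W \varepsilon + \xi

% Spatial Durbin model (SDM)
Y = \rho W Y + X \beta + W X \theta + \varepsilon
```

Setting θ = 0 in the SDM gives the SLM, which is one way to see why the SDM is described as an extension of the simpler models.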

SRM and GWR: Models Considering Spatial Dependence and Spatial Heterogeneity
The spatial regime model (SRM) and geographically weighted regression (GWR) are two typical spatial regression techniques that can not only capture spatial dependence through a spatial weight matrix, but also address spatial heterogeneity by spatially varying coefficients. SRM regards homogeneous geographical units as subsets, and subsequently constructs regression equations for each of them. At present, only a few studies have applied SRM to quantitatively measure the distinct driving effects on regions within and outside development zones of the city under study [60], or on cities at different administrative levels [61].
GWR can estimate the regression coefficients for each spatial entity by jointly taking its geographical neighbors within a certain spatial range as observed sample sets. Based on Tobler's first law of geography [62], neighbors at different distances within the bandwidth should have different weights obtained from proper kernel functions (Figure 14). In the DAUE context, GWR can provide both global and local implications for rational urban expansion according to spatially varying coefficients. Moreover, it is often integrated with logistic regression to capture the spatial heterogeneity in the probability pattern of urban land transition [63]. Besides, multiscale GWR has been employed to generate a unique spatial scale for each driving factor [64].
Compared with GWR, SRM only considers spatial heterogeneity across different clusters in a study area rather than each spatial entity in turn. However, GWR is sensitive to bandwidth selection and unsuitable for small samples. Notably, when the number of studied entities is too large, solving GWR will require significant storage space and large amounts of computational resources [65].
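The local estimation step of GWR can be sketched as a weighted least-squares fit centred on one location. The example below uses a single driving factor, a Gaussian kernel, and a fixed bandwidth, all of which are simplifying assumptions; the coordinates and values are hypothetical.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: nearer neighbours receive larger weights."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def gwr_local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares at `target`; returns (intercept, slope)."""
    w = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical units: a western cluster where expansion = 1 * factor
# and an eastern cluster where expansion = 3 * factor
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
factor = [1, 2, 3, 1, 2, 3]
expansion = [1, 2, 3, 3, 6, 9]

_, slope_west = gwr_local_fit((1, 0), coords, factor, expansion, bandwidth=2.0)
_, slope_east = gwr_local_fit((11, 0), coords, factor, expansion, bandwidth=2.0)
```

Repeating the fit at every location yields the spatially varying coefficients described above. It also makes the two limitations concrete: the result depends on the chosen bandwidth, and one weighted regression must be solved per entity.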

GTWR and GTWLR: Models Considering Both Spatial and Temporal Effects
Geographically and temporally weighted regression (GTWR) [66] and geographically and temporally weighted likelihood regression (GTWLR) [67] are two models that can deal with both spatial and temporal effects simultaneously.However, relatively few existing DAUE studies have taken temporal dependence into account and applied this kind of model [66][67][68][69].As extensions of GWR in the temporal dimension (Figure 15), GTWR and GTWLR have similar bandwidth selection-related limitations; moreover, with the temporal dimension considered, adequate panel datasets are required before these models can be applied.In addition, GTWR requires the analysed geographic entities of urban expansion to have varying spatial coordinates at different study phases; if this condition is not satisfied, the estimated results will be close to pooled OLS [66,70].
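As extensions of GWR, these models weight neighbouring observations by their combined spatial and temporal proximity. The sketch below uses a commonly cited form of the spatiotemporal distance, in which squared spatial and temporal separations are combined with two scale factors; the factor names `lam` and `mu`, the Gaussian kernel, and all values are assumptions for illustration.

```python
import math

def spatiotemporal_distance(a, b, lam=1.0, mu=1.0):
    """Distance between two observations given as (u, v, t) tuples."""
    (u1, v1, t1), (u2, v2, t2) = a, b
    spatial_sq = (u1 - u2) ** 2 + (v1 - v2) ** 2
    temporal_sq = (t1 - t2) ** 2
    return math.sqrt(lam * spatial_sq + mu * temporal_sq)

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel applied to the spatiotemporal distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

# Same pair of locations observed in the same year and twelve years apart
same_year = spatiotemporal_distance((0, 0, 2010), (3, 4, 2010))
later_year = spatiotemporal_distance((0, 0, 2010), (3, 4, 2022))

w_same = gaussian_weight(same_year, bandwidth=10.0)
w_later = gaussian_weight(later_year, bandwidth=10.0)
```

An observation that is close in space but distant in time therefore contributes less to the local fit, which is how temporal dependence enters the estimation; choosing the scale factors adds to the bandwidth-selection burden inherited from GWR.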

Machine Learning-Based Models
In recent years, machine learning-based models have been widely applied in various fields for clustering, classification and regression. Among them, random forest (RF), a tree-based machine learning model (Figure 16), has been extensively employed in DAUE. This model can deal with driving factors with high dimension and multicollinearity while avoiding overfitting [72]. Unlike many other weakly interpretable machine learning models, such as the multilayer perceptron (MLP) and convolutional neural network (CNN), RF is able to identify the importance of features by comparing the changes in out-of-bag errors after the features are disturbed [29], or the average changes in the Gini index following the splitting of feature nodes [73]. Moreover, the correlation pattern between each driving factor and urban expansion can be visualized in the form of partial dependence plots [74].
Compared with regression models, RF can be applied to datasets with finer spatial resolution and more multidimensional driving factors, although it is still impacted by poorer interpretability. For example, for the above-mentioned regression models, the necessity of specific factors can be tested with reference to the statistical significance of coefficients; by contrast, RF can only provide a global ranking of the features' importance, and cannot evaluate whether a given factor is redundant for the whole model.
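The idea behind the first importance measure, disturbing one feature and recording how much the prediction error grows, can be sketched independently of the forest itself. In the example below the predictor is a fixed hypothetical function standing in for a trained model, and the shuffle is applied to the whole sample rather than to each tree's out-of-bag subset, so this illustrates the permutation principle rather than a full RF implementation.

```python
import random

def mse(predict, rows, targets):
    """Mean squared error of `predict` over the sample."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [c] + r[j + 1:] for r, c in zip(rows, column)]
            increases.append(mse(predict, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def stand_in_model(row):
    """Hypothetical predictor that uses feature 0 and ignores feature 1."""
    return 3.0 * row[0]

rows = [[float(i), float((i * 7) % 5)] for i in range(8)]
targets = [3.0 * r[0] for r in rows]

importance = permutation_importance(stand_in_model, rows, targets)
```

The result is a ranking only: feature 1 receives zero importance because the predictor never uses it, but nothing in the procedure provides a significance test comparable to that of a regression coefficient, which is the interpretability gap noted above.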

Brief Model Summary
Most of the above-mentioned traditional correlation analysis and regression models are mechanism-driven models with specific statistical assumptions; a few such assumptions are that the input features are linearly additive to some extent, the residuals are drawn from a white-noise process, and the spatial effect is isotropic. Therefore, the observed data must be pre-processed (for example, through spatial resolution reduction) to fit the model-specific statistical assumptions, which will inevitably lead to the loss of important information. Additionally, because of these idealized assumptions, mechanism-driven models fail to capture latent relationships that were not anticipated when the model was specified, such as high-order interaction between features, geometrically anisotropic relationships, and spatially varying driving relationships at finer resolutions.
On the other hand, most machine learning models can be categorized as data-driven models, and complex relations among driving factors can thus be learned automatically by these models. However, being constrained by interpretability, only RF among the existing machine learning models has been widely used in DAUE. Moreover, since any linear relationship between an input feature and the outcome has to be approximated by splits in trees [75], tree-based models may be weak when dealing with linear relationships. Besides, as a shallow learning model, RF is not able to extract high-level information (e.g., texture information) on driving factors.
In terms of the analysis of driving relationships or driving mechanisms, all the aforementioned models except structural equation modelling (SEM) can only quantitatively measure the strength of driving effects, and the underlying driving mechanisms are usually obtained through supplementary qualitative analysis. Furthermore, although SEM can quantitatively analyse the assumed driving mechanism, this type of analysis model is commonly built on simple regression methods. Therefore, modelling the complex feedback mechanisms between urban expansion and multiple socio-environmental factors at fine spatial resolution is beyond the scope of all these models thus far. Here, we provide a theoretical description of these models and summarize their advantages and disadvantages in Table 4.
Table 4 (final entry and note). May be weak when dealing with linear relationships; the measured ranks of importance lack evaluation metrics. Superscripts 1 and 2 indicate mechanism-driven models and data-driven models, respectively.

Discussion: Limitations and Future Research Directions
Through a holistic review of the development processes, hot themes and quantitative models employed in DAUE cases published between January 1961 and March 2023, this study found that the DAUE field in the past 40 years mainly had the following four limitations:
(1) The scales of study targets in existing DAUE research have been expanded from single cities to entire countries, and the analysis granularity has been increasingly refined from city level to pixel/parcel level. However, the vertical and horizontal interactions among hierarchical urban systems have not been fully addressed.
(2) The research area of DAUE has gradually evolved into a stage of data-driven analysis; thus, various physical information has been extracted from abundant remote sensing data to improve DAUE research. However, the potential of these data has not been completely exploited, and the required fine-scale social factors cannot be adequately extracted from these data sources alone.
(3) Both mechanism-driven and data-driven models have been employed in DAUE research. However, most of the employed models can only measure driving effects by processing shallow feature relationships, so that complex driving mechanisms with higher-level feature patterns (e.g., high-order interaction, feedback) are ignored.
(4) Researchers in both CA-based urban simulation and the DAUE field have paid close attention to solutions for sustainable urban development. Meanwhile, simulation models can play an important role in the development and testing of theories [76]. However, most research works in urban CA and DAUE are independent of each other, and the two are seldom integrated.
These limitations and their corresponding solutions (i.e., future research directions) are presented in detail as follows.

The Complement of Multi-Scale Interaction Research on Hierarchical Urban Systems
Current study targets covered by the field of DAUE have expanded in scale from single cities to metropolitan areas to urban agglomerations and even to entire countries, while the analysis granularity has gradually shifted from cities to parcels and even pixels.However, due to the intrinsic hierarchy and complex interactions in urban systems, quantitative research in DAUE still suffers from twofold inadequacies.
First, few existing studies consider hierarchical interactions across different spatial levels.Geographical units at all administrative levels are interrelated as a nested system, in which a change in spatial pattern at one level is usually associated with combined impacts from the upper or lower levels.For example, the transformation of cultivated lands to residential areas is not only driven by the living convenience of surrounding functional zones such as shopping malls, but is also controlled by macro-scale governmental urban planning.
To date, several multi-scale analyses have been proposed to address interactions across different levels.For example, C. Li et al. [77] studied the driving effects of the same factors on the prefecture and county levels, respectively, finding that both similarities and differences between different urban levels could be observed.Since these multi-scale models usually operate on different spatial levels in parallel, attempts to find top-down synergistic driving mechanisms remained unsuccessful.In this regard, some researchers have also employed HLM to study the driving relationships across adjacent administrative levels, such as modelling the ancillary effect of provincial socioeconomic factors on prefectural urban expansion [43].However, this type of hierarchical method focuses more on the macro-scale levels and thus cannot be directly extended to much finer levels (e.g., cadastral parcels).
To solve these problems, it is critical to explore a unifying analysis framework or hybrid coupling approach in order to capture the comprehensive characteristics of multiscale DAUE, such as hierarchical interrelation and the scale dependency of driving effects.This will enable the drivers or constraints on specific urban levels to be efficiently determined, so that both macro-and micro-policies can be recommended together to regulate urban expansion.
Second, existing studies do not adequately represent the mutual interactions between cities in the context of urban agglomeration. With the continuous improvement of infrastructure networks, interactions caused by the flows of material, information and capital are an inherent aspect of the urban expansion process in urban agglomerations. The cities within these networks are closely connected, meaning that urban expansion in each city will be affected not only by local driving factors but also by distant factors (i.e., tele-driving factors).
Early DAUE studies usually treated cities in urban agglomerations as independent, and neglected the impacts of intensive interactions between them. Several theories were subsequently proposed to model this spatial interaction, including land teleconnection [78] and land telecoupling [79]. Li and Xiong [80] investigated the driving mechanisms of urban expansion in China, suggesting that local driving factors would have spill-over effects on neighbouring cities. Several other studies employed gravity-based models to measure the interrelationships between cities [81][82][83]. However, these theories and models cannot fully represent the complex interactions occurring in urban agglomerations because of their idealized assumptions and oversimplification of reality.
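To make the idea of a gravity-based measure concrete, the following minimal sketch computes pairwise interaction strengths in the classic form k · M_a · M_b / d^beta. It is an illustration only and not the formulation used in [81][82][83]: the city "masses" (for example, population or GDP), the coordinates, and the parameters `k` and `beta` are all hypothetical.

```python
from math import hypot


def gravity_interaction(cities, k=1.0, beta=2.0):
    """Return {(a, b): strength} for every unordered pair of cities.

    `cities` maps a name to (mass, x, y); the mass stands in for any size
    indicator such as population or GDP (an assumption of this sketch).
    """
    names = sorted(cities)
    strengths = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            mass_a, xa, ya = cities[a]
            mass_b, xb, yb = cities[b]
            distance = hypot(xa - xb, ya - yb)  # Euclidean distance
            strengths[(a, b)] = k * mass_a * mass_b / distance ** beta
    return strengths


# Hypothetical three-city system
cities = {
    "A": (100.0, 0.0, 0.0),
    "B": (50.0, 3.0, 4.0),
    "C": (10.0, 6.0, 8.0),
}
interaction = gravity_interaction(cities)
```

The sketch also shows the simplification noted above: interaction depends only on size and distance, so directed flows of capital, information, or people between cities are not distinguished.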
Therefore, to arrive at a more profound analysis of the driving effects and mechanisms underlying urban expansion, researchers need to treat urban agglomerations as dynamic interactive systems rather than as sets of isolated, independent cities. When constructing driving factors in these areas, spatial interactions caused by regional factor mobility, the spatial spill-over effect, and the siphon effect should all be highlighted.

The Supplement of Remote Sensing-Derived Data and Assimilation of Multi-Source Spatiotemporal Big Data
Classic data sources in DAUE are statistical data from governments or questionnaire data obtained through public surveys; notably, in earlier studies, the poor acquisition efficiency and low spatiotemporal resolution of such data placed significant constraints on deeper analysis [84]. Thereafter, the application of remote sensing (RS) and GIS technology greatly improved the availability of high-resolution geographical data, enabling researchers to monitor spatiotemporal dynamics while delving into the driving analysis of urban expansion at multiple scales.
Nevertheless, most data obtained through remote sensing are 2D urban physical data (e.g., those pertaining to land use type, impervious surfaces, DEM, and waterbodies); accordingly, the potential of various RS data in DAUE has not been fully exploited, and additional data sources with greater diversity, accessibility, and completeness are also required. For example, long-term building height data, which can be derived from satellite-based photogrammetry [85] or Interferometric Synthetic Aperture Radar (InSAR) images [86], will facilitate the identification and monitoring of urban regeneration. Furthermore, night-time light (NTL) remote sensing data have been widely utilized to predict regional GDP and population distributions [87,88]; thus, efforts can also be made to derive more socio-economic indicators of this kind for adoption in DAUE. With the help of these RS data, it will be possible to conduct a more multidimensional assessment and analysis of urban expansion.
Meanwhile, with the development of information and communications technology, individuals have become ubiquitous social sensors that are constantly collecting data about modern society [89]. Thus, massive spatiotemporal big data are emerging from social sensing (e.g., mobile phone data, GPS trajectories, social media data, and other volunteered geographic information (VGI)) and can be assimilated into DAUE study [90,91]. Since these accumulated social sensing data accurately represent the daily activities of urban residents, they can be utilized to construct more diverse driving factors that reveal the interactions between regions or cities. For instance, through logistics data and online shopping data, the economic connections between cities and their spatiotemporal dynamics can be identified; with business data on mergers and acquisitions, the capital flow and network structure within urban agglomerations can be determined [92]; by identifying the mobility and distribution patterns of intra-urban residents using social media data or trajectory data, long-term driving factors can be built to indicate neighbourhood isolation, social segregation, and urban vibrancy [93][94][95]. Therefore, assimilating more novel spatiotemporal data, with large samples and wide coverage, in support of DAUE study represents a promising research direction.

The Performance Improvement of Quantitative Models Based on Interpretable Machine Learning
Quantitative DAUE models have progressively evolved to keep up with researchers' deepening perceptions of the dynamics of urban expansion over space and time. More specifically, a series of models have been proposed for spatial effects, including the geodetector, SLM, SEM, and GWR. Moreover, to characterize temporal correlation, some studies have employed fixed-effect or random-effect models, while GTWR was developed to capture both spatial and temporal effects.
As the diversity and volume of geo-referenced datasets have increased, the application of classic regression models has inevitably encountered some challenges. First of all, studies of micro-scale urban expansion will tend to suffer from sample imbalance, as most of the land use cells are unchanged. Moreover, the complicated relationships embedded in the process of urban expansion (such as the multicollinearity of factors, tele-coupling among cities, and mutual feedback between land use change and environmental factors) cannot be processed intelligently by classic regression models. In addition, large numbers of samples from richer data sources will cause further computational efficiency problems. For example, the computational intensity of spatial regression will increase steeply with the amount of input data: because the dimension of a spatial adjacency matrix is equal to the sample count, a dense matrix for n samples holds n × n entries.
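The scaling issue can be illustrated with a small sketch. It assumes a regular grid of land use cells with rook contiguity (cells sharing an edge are neighbours); the function names are hypothetical, and the example only counts matrix entries rather than fitting any spatial regression.

```python
def rook_neighbours(rows, cols):
    """Map each cell index to the indices of its edge-sharing neighbours."""
    neighbours = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacent.append(rr * cols + cc)
            neighbours[r * cols + c] = adjacent
    return neighbours


def storage_comparison(rows, cols):
    """Return (sample count, dense matrix entries, non-zero entries)."""
    n = rows * cols
    non_zero = sum(len(v) for v in rook_neighbours(rows, cols).values())
    return n, n * n, non_zero


small = storage_comparison(10, 10)   # 100 cells
large = storage_comparison(20, 20)   # 400 cells
```

In this setting, quadrupling the number of cells multiplies the dense matrix size by sixteen, while the number of actual neighbour links grows only roughly in proportion to the cell count. This is one reason why sparse representations are commonly used for large samples.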
Current machine learning (ML) models, especially deep learning (DL) models, excel at processing high-dimensional samples, extracting high-level feature information, and handling nonlinear relationships. Moreover, a variety of ML-based models have been extensively applied in various fields, such as anomaly detection [96], computer vision [97], traffic prediction [98], satellite image classification [99], and land-use/land-cover (LULC) modelling [100]. Therefore, these powerful ML/DL models (e.g., CNN) could be employed in DAUE to address the aforementioned challenges. However, an obvious shortcoming is the generally low interpretability of ML/DL models, which will impede their application.
To date, the interpretability of machine learning has attracted widespread research interest, such that considerable advances in this area have already been achieved [101]. A series of intrinsically interpretable models (such as Bayesian networks that are integrated with causal inference) and post hoc explanation methods have been developed and successfully applied in many fields, including knowledge discovery, image classification, and clinical medicine [102][103][104]. Moreover, theory-coupled interpretable ML models are encouraged [105] and have already seen preliminary application [106]. Therefore, the combination of interpretable machine learning models with multi-source spatiotemporal big data for DAUE will not only broaden the horizon of research, but will also facilitate the transition of its study paradigm from mechanism-driven regression analysis to data-driven causality inference.
DAUE-related research can gain helpful insights from CA-based urban simulation in several key ways. First, there exists CA-based urban simulation focusing on multiscale urban expansion (e.g., [107]), which can provide a valuable reference for modelling hierarchically interrelated urban systems in the DAUE context. Additionally, studies of temporal dependency are inadequate in DAUE, while the ways in which temporally stationary driving factors explain urban expansion have been discussed at length through CA-based urban simulation [28]. Finally, interactions within neighbourhoods, an issue of concern in DAUE modelling, have long been probed in simulation fields. For example, CLUE is a type of LULC simulation model designed to capture the feedback between land use change and the relevant factors [108]; FLUS integrates the interactions and competition of different land use types driven by human and natural factors [109].
In turn, one future direction of CA-based urban simulation would be to achieve long-term, interpretable, and transferrable simulation of urban systems [110][111][112][113]. To accomplish this, a thorough and deep understanding of the driving mechanisms in the dynamics of urban expansion should be attained, primarily through DAUE. Detailed spatiotemporal patterns of driving factors identified through DAUE will provide CA-based urban simulation with a sound reference for this research objective.
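One way to picture this coupling is a cellular automaton whose transition rule uses coefficients estimated by a driving analysis. The sketch below is a deliberately simplified illustration and does not reproduce CLUE, FLUS, or any published model: the logistic suitability form, the Moore-neighbourhood share, the threshold rule, and all parameter values are assumptions.

```python
from math import exp


def suitability(factors, coefficients, intercept):
    """Logistic development suitability from (hypothetical) DAUE coefficients."""
    z = intercept + sum(c * f for c, f in zip(coefficients, factors))
    return 1.0 / (1.0 + exp(-z))


def urban_share(grid, r, c):
    """Fraction of urban cells among the (up to 8) Moore neighbours."""
    rows, cols = len(grid), len(grid[0])
    cells = [grid[rr][cc]
             for rr in range(max(0, r - 1), min(rows, r + 2))
             for cc in range(max(0, c - 1), min(cols, c + 2))
             if (rr, cc) != (r, c)]
    return sum(cells) / len(cells)


def ca_step(grid, factor_grid, coefficients, intercept, threshold=0.3):
    """One synchronous update: non-urban cells (0) may become urban (1)."""
    new_grid = [row[:] for row in grid]
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            if grid[r][c] == 0:
                score = (suitability(factor_grid[r][c], coefficients, intercept)
                         * urban_share(grid, r, c))
                if score > threshold:
                    new_grid[r][c] = 1
    return new_grid


# Hypothetical 3 x 3 study area: the left column is already urban,
# and every cell has the same single driving-factor value.
grid = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
factor_grid = [[[1.0] for _ in range(3)] for _ in range(3)]
next_grid = ca_step(grid, factor_grid, coefficients=[2.0], intercept=0.0)
```

In such a design, improving the driving analysis changes the coefficients and hence the simulated pattern, while a poor simulation fit would hint that relevant driving factors are missing from the analysis.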
In short, DAUE can pave the way for more scientific and realistic urban simulation, while urban simulation will simultaneously deepen DAUE research and enable it to better serve urban land management. Therefore, it will be mutually beneficial to motivate the integration of DAUE and CA-based urban expansion simulation in order to develop a multiscale, wide-range, and long-term urban simulation system, at the same time facilitating the integration of simulation modelling and causal explanations of the processes underlying urban land systems.

Conclusions
Urban expansion is an important indicator of socioeconomic development in the progress of urbanization, and has accordingly attracted many researchers to carry out intensive studies on the driving analysis of urban expansion (DAUE). Based on DAUE case studies from the WOS Core Collection, we holistically analysed the development process and quantitative models of current DAUE research. In conclusion, the past half-century has witnessed explosive growth in DAUE case studies. Benefiting from the abundance of finer spatial data and the innovation of quantitative models, research content has become increasingly diverse in terms of research scale, driving factors, and analysis perspectives. To encourage the furtherance of DAUE research, we map out the following future research directions: (1) to pay attention to the hierarchical characteristics of urban systems and conduct multi-scale research on the complex interactions within them in order to capture more dynamic features; (2) to leverage remote sensing data to obtain more urban expansion data and assimilate multi-source spatiotemporal big data to supplement novel socio-economic driving factors; (3) to integrate with interpretable data-driven machine learning techniques in order to bolster the performance and reliability of DAUE models, as well as to favour data-driven causal inference; and (4) to couple with the field of urban simulation to achieve the complementary enhancement of model accuracy, interpretability and transferability, while facilitating model-driven theory development and testing.
The burst detection algorithm proposed by Kleinberg [115] can detect significant changes in the frequency of keywords or terms over a short period of time. A detected burst term has two features: burst strength and burst duration. High burst strength indicates a remarkable aggregation of research interest in the detected terms. Moreover, burst duration is defined with reference to the beginning and ending year of the burst state, as shown in Figure A3. The application of burst detection can reveal the explosion, evolution, and decline of the concern for research themes and accordingly facilitate the identification of research fronts in an academic field.
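The following sketch illustrates the idea with a simplified two-state, batched variant in the spirit of Kleinberg's algorithm. It is not the exact procedure or software used in this study: the parameters `s` (how much more frequent a term is in the burst state) and `gamma` (the cost of entering the burst state), the function names, and the yearly counts are all hypothetical.

```python
from math import comb, log


def fit_cost(p, r, d):
    """Negative log-likelihood of r relevant papers among d under rate p."""
    return -(log(comb(d, r)) + r * log(p) + (d - r) * log(1.0 - p))


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return the cheapest 0/1 state sequence (1 = burst) via dynamic programming."""
    p0 = sum(relevant) / sum(totals)      # baseline rate
    probs = (p0, min(s * p0, 0.9999))     # burst rate, kept below 1
    up_cost = gamma * log(len(relevant))  # cost of moving from state 0 to 1
    cost = [0.0, float("inf")]            # the sequence starts in the baseline state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for state in (0, 1):
            candidates = [cost[prev] + (up_cost if state > prev else 0.0)
                          for prev in (0, 1)]
            best_prev = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best_prev] + fit_cost(probs[state], r, d))
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):       # trace the best path backwards
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states


def burst_intervals(years, states):
    """Convert a state sequence into (start year, end year) burst intervals."""
    intervals, start, previous_year = [], None, None
    for year, state in zip(years, states):
        if state == 1 and start is None:
            start = year
        elif state == 0 and start is not None:
            intervals.append((start, previous_year))
            start = None
        previous_year = year
    if start is not None:
        intervals.append((start, years[-1]))
    return intervals


# Hypothetical counts: papers mentioning a term out of 100 papers per year
years = list(range(2001, 2009))
relevant = [2, 2, 2, 30, 30, 30, 2, 2]
totals = [100] * 8
states = detect_bursts(relevant, totals)
intervals = burst_intervals(years, states)
```

In this simplified setting, burst duration corresponds to the span of each returned interval, and a burst strength could be obtained by summing, over the interval, the difference between the baseline-state and burst-state fit costs.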

The Booming of DAUE Study and Transition of Involved Research Institutions
Three Identified Development Stages of DAUE
As Figure 2 illustrates, from January 1961 to March 2023, the annual publication volume (including early-access papers) and total citation frequency of DAUE studies underwent an overall exponential increase. Three development stages of DAUE can be identified: the start-up stage

Figure 3. Top ten influential journals for publications in DAUE. The number in brackets indicates the year that a DAUE paper was first published in this journal.

Figure 2. Annual publication volume (including early-access papers) and citation frequency of DAUE.

Figure 5. Cooperative network of major countries involved in DAUE.

Figure 6. Development of "keywords of urban expansion".

Figure 7 shows that although the specific content of driving analysis has gradually become more in-depth and complex, studies addressing driving mechanisms (how the driving factors impact urban expansion) are far fewer than those merely focusing on the driving effects/relationships (to what extent driving factors influence urban expansion). In the selection of driving factors, since approximately 2013, researchers have explicitly extended their sphere of interest from only considering city-level macro driving factors to further exploring how "spatial determinants" at the parcel or pixel level (such as distance to rivers) impact urban structure and urban land allocation.

Figure 10. Major quantitative models/methods employed in DAUE. The number in brackets indicates the specific year in which the model was first used in DAUE.

Figure 11. An example of an assumed driving mechanism to be tested by structural equation modelling (SEM). Latent variables and observed variables are marked in circles and squares, respectively. Path coefficients (i.e., β, β') describe the directed prediction relationship between variables.

Figure 16. An example of the random forest algorithm.

The Mutually Beneficial Integration of CA-Based Urban Expansion Simulation and DAUE
DAUE and CA-based urban simulation are two relatively independent branches in the domain of LULC research. DAUE study stresses the interpretability and reliability of the model outputs that quantify the driving relationships or mechanisms underlying urban expansion, while CA-based urban simulation focuses more on the predictability and prediction accuracy of urban land use change models. However, if the interpretability of driving relationships and mechanisms is ignored, the CA simulation results cannot provide scientific guidance for practical urban planning; moreover, the low accuracy of CA simulation models may imply that important factors affecting urban expansion have been neglected in the DAUE context. In essence, the research contents of CA-based simulation and quantitative DAUE overlap to a certain extent, indicating a strong tendency to interrelate. Therefore, efforts could be made to facilitate the coupling of CA-based urban simulation and DAUE in order to achieve complementarity.

Figure A2. An example of co-authorship analysis results.

Figure A3. Burst detection of the research theme "GIS", for which the burst state extended from 2005 to 2013.

Table 2. Results of burst detection.

Table 3. Three basic forms of HLM.

Table 4. Characteristics of quantitative models employed in DAUE.