The Inﬂuence of the Calibration Interval on Simulating Non-Stationary Urban Growth Dynamic Using CA-Markov Model

: The temporal non-stationarity of land use and cover change (LUCC) processes is one of the main sources of uncertainty that may inﬂuence the calibration and the validation of spatial path-dependent LUCC models. In relation to that, this research aims to investigate the inﬂuence of the temporal non-stationarity of land change on urban growth modeling accuracy based on an empirical approach that uses past LUCC. Accordingly, the urban development in Rennes Metropolitan (France) was simulated using ﬁfteen past calibration intervals which are set from six training dates. The study used Idrisi’s Cellular Automata-Markov model (CA-Markov) which is an inductive pattern-based LUCC software package. The land demand for the simulation year was estimated using the Markov Chain method. Model validation was carried out by assessing the quantity of change, allocation, and spatial patterns accuracy. The quantity disagreement was analyzed by taking into consideration the temporal non-stationarity of change rate over the calibration and the prediction intervals, the model ability to reproduce the past amount of change in the future, and the time duration of the prediction interval. The results show that the calibration interval signiﬁcantly inﬂuenced the amount and the spatial allocation of the estimated change. In addition to that, the spatial allocation of change using CA-Markov depended highly on the basis land cover image rather than the observed transition during the calibration period. Therefore, this study provides useful insights on the role of the training dates in the simulation of non-stationary LUCC.


Introduction
Spatially explicit land use and cover change (LUCC) models are developed in order to improve the understanding of the spatial systems dynamic and behavior, develop hypothesis, predict future LUCC, and support policy makers in taking rational decisions [1][2][3][4][5][6]. Several reviews show that these computational models vary widely in underlying the theoretical assumptions and methodological approaches [6][7][8][9][10]. Land change modeling approaches can be arranged according to their relative emphasis on pattern versus process and projection versus explanation [11]. Most LUCC modeling software packages are empirical approaches based on past land change [12]. They rely on historical data for their calibration to project future changes [13]. The most popular LUCC models developed to simulate real-world urban processes are cellular automata models [14].
LUCC modeling continues to raise several issues and particularly those related to the calibration and validation processes. The calibration process consists in obtaining the values of the transition rule parameters that enable data-driven models to more accurately reproduce observed past tendencies [10,14,15]. Statistical approaches and machine learning algorithms are largely used in calibrating land use models [10]. The main recent advances in the calibration of LUCC models focus on the neighborhood effect, spatial patterns, and application of landscape metrics [10,16,17].
Achieving an appropriate level of credibility in LUCC models' performance for landscape planning decision support requires a proper validation [18,19]. According to [11], successful application of these models requires that model choices respect modeling goals. Furthermore, [11] argue that sensitivity analysis, pattern validation, uncertainty sources, and structural validation should be taken into consideration in the practice of land change models. Therefore, a consistent methodology for assessing model accuracy is mandatory in performing cross-model comparisons and clarify trade-offs among quantity, allocation, and spatial configuration accuracy components [19,20]. In addition to that, there are many challenges and conceptual problems associated with the use of empirical land change models [1,20,21]. According to [1], the main restrictive assumptions are the linear correlation between predictors and dependent variables, temporal stationarity, spatial autocorrelation, and unidirectional causal change. Actually, there are several non-stationarity issues that are raising challenges in LUCC modeling. Some of these main issues concern (1) the large share of non-changing land which has a tendency of skewing validation measures and calibration procedures, (2) the change over time of land change processes, and (3) the change over time of land change drivers.
The non-stationarity of LUCC processes is one of the main sources of uncertainty that may influence the calibration of models. In fact, LUCC involves nonlinear socioeconomic and biophysical processes, non-stationary drivers over time and space, and feedback responses among variables [7,15,[22][23][24]. According to [24], modeling the timing and the system changes tendency is crucial because of non-stationarity and systemic changes in land-use change projections. These changing drivers and processes influence the calibration [10,23,[25][26][27] and outcomes of empirical models [24,25,28,29]. In addition to that, it is challenging for available algorithms to capture the dynamic and nonlinear humanenvironment processes that drive complex land changes [5,23,26].
The stationarity of LUCC dynamics is systematically assumed in the majority of LUCC modeling frameworks, which restricts the usefulness of empirically-fitted LUCC models [1]. Indeed, many land change modeling applications often stress the assessment of the model's predictive performance without taking into consideration the non-stationarity of involved processes and drivers [5,10,20,[30][31][32]. Actually, these studies apply the same set of parameters and explanatory variables when a calibrated model is extrapolated in the validation time interval [10]. Thus, model performance may be underestimated in replicating past land use dynamics when LUCC is assumed to remain unfluctuating over time [26].
Though an extensive body of literature exists on LUCC modeling approaches, only a few applications tackle the temporal non-stationarity in land change [24,26,27,29,33]. As pointed out by [34], the spatially non-stationary relationships between driving factors and land use and land cover (LULC) categories in the transition rules are often neglected in many studies. As pointed by [35], the relationship between non-stationary processes and the predictability of land changes has not yet been explored. Some solutions are nevertheless proposed to overcome the limitations of the Markov chain approach, which assumes the stationarity of the transition probabilities over the calibration and prediction intervals [15]. For example, [33] use the Flow matrix to measure the temporal instability of the annual change among the time intervals that compose the time extent. As an alternative to the Markov matrix, the proposed Flow matrix expresses the sizes of the transitions among categories between two dates [15,33]. Recently, [34] analyzed the role of the spatial non-stationarity of change in the process of LUCC simulation by combining geographically weighted logistic regression and Cellular Automata (CA)-Markov model. Moreover, [29] investigate how the use of historical maps may affect the quality of the calibration process of the cellular automata urban models. However, the trade-offs of choosing one calibration interval over another needs to be justified in order to avoid any negative consequence of an arbitrary choice effect on the model's output.
The purpose of this research is to investigate the influence of the non-stationarity of land change on the modeling accuracy of an urban growth model over time. It seeks to Remote Sens. 2021, 13, 468 3 of 20 distinguish between the inaccuracies due to the model's behavior and the disagreements resulting from the non-stationarity of input data over the calibration and validation time extent. The application was conducted to Rennes Metropolitan area (France). It consists in simulating future urban development using different calibration intervals based on past LUCC. With five input LULC maps for 2003,2006,2010,2011,2012, and 2015, fifteen discrete time intervals were used in order to parameterize the model and then project LUCC for 2018. The simulated LULC maps for 2018 were compared to the actual map based on the quantity, allocation, and spatial patterns accuracy. The study used the CA-Markov software package, which is available in the IDRISI Selva raster-based spatial analysis software [12,36]. CA-Markov is referred to as spatially explicit, inductive pattern-based, and expert knowledge-driven model [12]. CA-Markov model uses the Markov chain approach to compute the transition potential matrix. In CA-Markov, the suitability map is generated by establishing a relationship between explanatory drivers and observed land use and land cover map using a multi-criteria evaluation (MCE) method [12,37,38].

Definitions and Statements
This article uses three terms to describe time: extent, interval, and duration [33]. Temporal extent is the period of time between the first date 2003 and the last date 2018 in a time series data. Interval is a period for which data exist both at an initial date and at a subsequent date. The calibration interval [T 0 , T 1 ] was used to parameterize the model, whereas the simulation interval [T 1 , T 2 ] was used to simulate the future urban growth for the desired simulation date T 2 . Based on LULC maps for 2003, 2006, 2010, 2011, 2012, and 2015, fifteen training intervals within the time extent were implemented to calibrate the model and then to simulate developed areas for 2018: [2003,2006], [2003,2010], [2003,2011], [2003,2012], [2003,2015], [2006,2010], [2006,2011], [2006,2012], [2006,2015], [2010,2011], [2010,2012], [2010,2015], [2011,2012], [2011,2015], and [2012,2015]. Duration is the amount of time of any interval. Furthermore, the temporal stability and the temporal stationarity terms are used interchangeably to describe the degree to which the rate of the actual land change is still consistent over time extent [33,39,40].
For each simulation, only one single interval using only two input LULC maps from two different training dates can be used for calibrating the model, whereas the start of the simulation interval is also the end of the calibration interval. The calibration of CA-Markov is based on one basis LULC image, which is also the calibration endpoint (T 1 ). Duration is the amount of time of any period. Furthermore, the terms temporal stability, temporal homogeneity, and temporal stationarity are used synonymously to establish the validity of model assumptions, and to explore whether the rate of land change trajectories is still consistent over time [33,[39][40][41].
Many land cover datasets and remote sensing images such as CORINE Land Cover database and large-scale 30-m land cover maps are delivered with a certain spatial and temporal accuracy that is often unsuitable for the temporal slight change analysis in an urban landscape. Consequently, the observed change can be simply due to misclassification errors especially in small zones undergoing slow rate of change as in the current research study area. Accordingly, for temporal land change analysis requirements and modeling purposes over fifteen years from 2003 to 2018, all the input LULC maps are performed using image interpretation. The manual digitization of urban areas is carried out from the ESRI's World Imagery basemap in ArcGIS Online. World Imagery provides satellite and aerial imagery with a ground resolution of 1 m or less, for example, the satellite images used for mapping impervious surfaces in 2018 are provided with an accuracy of 8.5 m and a ground resolution that varies between 30 and 50 cm. In addition to that, increased built-up areas between 2003 and 2018 are mapped using visual interpretation of historical Google Earth's free and public high-resolution images whose resolution varies according to the source of data.

Study Area
Rennes Metropolitan is located in the northwest of France (Figure 1). In 2012, it summed up to 705 km 2 and had a population of 4,120,717 inhabitants that were dispersed between 37 municipalities. Rennes Metropolitan is one of the booming areas in the west of France. Since the late 50s, Rennes Metropolitan has acquired strong experience in land management and urban planning.

Study Area
Rennes Metropolitan is located in the northwest of France ( Figure 1). In 2012, it summed up to 705 km 2 and had a population of 4,120,717 inhabitants that were dispersed between 37 municipalities. Rennes Metropolitan is one of the booming areas in the west of France. Since the late 50s, Rennes Metropolitan has acquired strong experience in land management and urban planning.

Study Area
Rennes Metropolitan is located in the northwest of France ( Figure 1). In 2012, it summed up to 705 km 2 and had a population of 4,120,717 inhabitants that were dispersed between 37 municipalities. Rennes Metropolitan is one of the booming areas in the west of France. Since the late 50s, Rennes Metropolitan has acquired strong experience in land management and urban planning.     (Table 1). Mainly, urban development has occurred through an edge growth process at the proximity of previously developed areas in order to preserve rural areas and prevent urban sprawl ( Figure 2). The analysis of actual spatial growth patterns shows a non-stationary dynamic. In addition to that, the urban development process is not spatially homogeneous and significant differences are observed over time. Moreover, the number of urban patches (NP) has decreased from 240 in 2003 to 218 in 2018, and the Euclidean mean nearest neighbor distance (ENND_MN) has slightly increased from 105 m in 2003 to 137 m in 2018 (Table 1).

Independent Variables-Constraints and Potential Driving Factors
The input data that are likely to drive the observed urban growth dynamic in Rennes Metropolitan area were prepared to calibrate CA-Markov model and set the simulation rules. These biophysical driving factors covering the same area with the same spatial resolution (15 m) consisted of the existing developed areas, slope, digital elevation model, distance to developed areas, distance to major transportation network, excluded areas from urbanization, and local urban planning document that was designed to establish the general rules concerning land use at the municipality scale ( Figure 3). The slope and main road maps are unchanged for all the simulations, regardless of the calibration interval. The distance to developed areas was calculated for the year T 1 of every given calibration interval [T 0 , T 1 ]. The excluded zones included water bodies, wetlands, woodlands, and other protected areas with respect to current land use planning documents and guidelines requirements. The final layer combining all these constraints data was converted into a binary map in order to constraint future development and preserve natural land resources.  (Table 1). Mainly, urban development has occurred through an edge growth process at the proximity of previously developed areas in order to preserve rural areas and prevent urban sprawl ( Figure 2). The analysis of actual spatial growth patterns shows a non-stationary dynamic. In addition to that, the urban development process is not spatially homogeneous and significant differences are observed over time. Moreover, the number of urban patches (NP) has decreased from 240 in 2003 to 218 in 2018, and the Euclidean mean nearest neighbor distance (ENND_MN) has slightly increased from 105 m in 2003 to 137 m in 2018 (Table 1).

Independent Variables-Constraints and Potential Driving Factors
The input data that are likely to drive the observed urban growth dynamic in Rennes Metropolitan area were prepared to calibrate CA-Markov model and set the simulation rules. These biophysical driving factors covering the same area with the same spatial resolution (15 m) consisted of the existing developed areas, slope, digital elevation model, distance to developed areas, distance to major transportation network, excluded areas from urbanization, and local urban planning document that was designed to establish the general rules concerning land use at the municipality scale ( Figure 3). The slope and main road maps are unchanged for all the simulations, regardless of the calibration interval. The distance to developed areas was calculated for the year T1 of every given calibration interval [T0, T1]. The excluded zones included water bodies, wetlands, woodlands, and other protected areas with respect to current land use planning documents and guidelines requirements. The final layer combining all these constraints data was converted into a binary map in order to constraint future development and preserve natural land resources.  Land change processes are modeled using stochastic models [41,42]. In fact, Markov processes are stochastic processes in which the state of the system at a given time t+1 is derived from its state at an earlier time t and independent of its history before a spot of time t [39]. Actually, Markov chain is a Markov process which is treated as a series of transitions between certain finite or denumerable possible states of the process. The quantity of land cover change is calculated by means of a transition area matrix using the Markov chain model [33,39,42].
where V t is the 1-by-n row vector of the input land cover proportion area at an initial time t, V t+1 is the 1-by-n row vector of the output land cover proportion area at the subsequent time t + 1, M is a n × n transition matrix for an interval t + t + 1, P ij expresses the conditional probability that a pixel of category i at time t transitions to category j by time t + 1, and n is the number of land cover categories. The conditional probability of transition for a cell transitioning from the category i at time t to the category j at the subsequent time t + 1 is calculated by dividing entry Cij(t) of the raw matrix by the row's marginal sum. The calibration of the matrix Mt based on a single past time interval using two observed land cover maps allows making projections of future land change.
where Cij is the size of area that transitioned from category i at t to category j at time t + 1, ∑ n j=1 Cij is the row's marginal sum, and n is the number of land cover categories. The transition probability matrix for the predicted year is firstly calculated based on Equation (4). The original transition probability matrix is obtained based on the two former LULC maps that are used to calibrate the model.
where P(t2) is the state probability of a point of time, P(t1) is the preliminary state probability, and P is the original transition probability matrix.

CA-Markov Model
Markov model, which is intrinsically aspatial, lacks spatial dependence of geographical cells upon changes in neighboring cells [41]. Therefore, the hybrid CA-Markov model combines the Markov chain that determines the quantity of change and Cellular Automata in order to spatially allocate the estimated change [12,43]. In fact, the Cellular Automata approach adds the spatial contiguity role to the Markov model. The cellular entity changes its state based on both its previous state and adjacent neighbors [43,44]. Therefore, cells that are close to the existing urban areas have a high probability to change [44]. As a matter of fact, the spatial allocation of change using CA-Markov is based on the suitability map. In parallel, other models use the transition potential approach [12,37,38].
Running CA-Markov module in IDRISI program requires inserting the basis land cover image, the Markov transition areas file, transition suitability image collector, and filter type. Criteria that determine the most suitable areas for projected urban development include the excluded protected zones, slope gradient, proximity to roads, and distance to existing developed areas ( Figure 2). Proximity to developed areas is calculated for the end point (T 1 ) of the calibration interval [T 0 , T 1 ]. Additionally, the constraints for urban development are provided in a binary map. Zero is assigned to areas of absolute constraints and 1 corresponds to unconstrained areas which have no impact. The transition potentials are multiplied by the constraints map during the change prediction process. Furthermore, input factors are previously standardized using IDRISI Fuzzy module. The weight assigned to each driver (distance to urban areas in the basis image: 0.772; distance to roads: 0.173; slope: 0.055) are obtained by using the Analytical Hierarchy Process (AHP) method based on a pair-wise comparison matrix [45,46]. The multi-criteria evaluation (MCE) decision support tool is employed to compute transition suitability maps ( Figure 4) using standard kernel size of 5 × 5 pixels contiguity filter.
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 19 urban development are provided in a binary map. Zero is assigned to areas of absolute constraints and 1 corresponds to unconstrained areas which have no impact. The transition potentials are multiplied by the constraints map during the change prediction process. Furthermore, input factors are previously standardized using IDRISI Fuzzy module. The weight assigned to each driver (distance to urban areas in the basis image: 0.772; distance to roads: 0.173; slope: 0.055) are obtained by using the Analytical Hierarchy Process (AHP) method based on a pair-wise comparison matrix [45,46]. The multi-criteria evaluation (MCE) decision support tool is employed to compute transition suitability maps ( Figure 4) using standard kernel size of 5 × 5 pixels contiguity filter.

Accuracy Assessment
The validity of each simulation output is assessed by comparing the predicted to the actual urban areas for 2018. The comparison is carried out based on the quantity, allocation, and spatial patterns disagreements [5,19,20,47,48]. The quantity error is due to the discrepancy between the estimated and real amounts of change. In parallel, the allocation disagreement is due to spatial inconsistency. A three-dimensional table is performed using the reference last calibration date (T1) map which is also the basis land cover image in CA-Markov module, the reference 2018 map, and the predicted 2018 map [49]. The observed change is the union of hits (observed change correctly predicted as change) and misses (observed change but predicted as persistent), whereas the predicted change consists of hits and false alarms (observed persistent but predicted as change) [20,47]. This comparison permits to differentiate between the fits that are due to the presence of persistent cells (null successes) and the agreements that result from the correct predicted change and [7,20,49,50].
The calculation of the error due to quantity (Equation (5) where ΔTSimulation is the duration of the simulation interval (years), B is the observed annual change rate over the simulation interval (ha/year), and C is the predicted annual change rate over the simulation interval (ha/year). The majority of the studies dealing with LUCC modeling, especially the accuracy assessment, tighten the analysis of the error disagreement, which may cause some con-

Accuracy Assessment
The validity of each simulation output is assessed by comparing the predicted to the actual urban areas for 2018. The comparison is carried out based on the quantity, allocation, and spatial patterns disagreements [5,19,20,47,48]. The quantity error is due to the discrepancy between the estimated and real amounts of change. In parallel, the allocation disagreement is due to spatial inconsistency. A three-dimensional table is performed using the reference last calibration date (T 1 ) map which is also the basis land cover image in CA-Markov module, the reference 2018 map, and the predicted 2018 map [49]. The observed change is the union of hits (observed change correctly predicted as change) and misses (observed change but predicted as persistent), whereas the predicted change consists of hits and false alarms (observed persistent but predicted as change) [20,47]. This comparison permits to differentiate between the fits that are due to the presence of persistent cells (null successes) and the agreements that result from the correct predicted change and [7,20,49,50].
The calculation of the error due to quantity (Equation (5)) involves the time duration of the simulation interval (∆T Simulation = 2018 − T 1 ) in addition to the observed (B) and the predicted (C) annual change rates over the simulation period [T 1 , 2018].
where ∆T Simulation is the duration of the simulation interval (years), B is the observed annual change rate over the simulation interval (ha/year), and C is the predicted annual change rate over the simulation interval (ha/year). The majority of the studies dealing with LUCC modeling, especially the accuracy assessment, tighten the analysis of the error disagreement, which may cause some confusions between the real model performance and the temporal non-stationarity of change. As illustrated above (Equation (5)), the error due to quantity involves the actual (B) and the simulated (C) change rate parameters that result from two different and independent processes. In fact, such processes make the analysis of the error due to quantity difficult. Accordingly, the quantity of disagreement can be reformulated for better interpretation and analysis by introducing two additional parameters (Equation (6)). The variable (A-B) is used to compare the actual change between the calibration and the simulation intervals. It allows measuring the temporal non-stationarity assumption of the change rate over the calibration and the simulation intervals. The variable (A-C) allows to evaluate the ability of the Markov process in estimating the appropriate change rate by replicating the calibration past change rate (A) in the simulation period. In fact, the estimated change (C) is determined based on the observed change rate during the calibration interval (A) using the Markov chain model.
where ∆T Simulation is the duration of the simulation interval (years), A is the observed annual change rate during the calibration interval (ha/year), B is the observed annual change rate over the simulation interval (ha/year), C is the predicted annual change rate over the simulation interval (ha/year), (A-C) is a measure of the ability of the model to replicate the calibration change rate in the simulation interval, and (A-B) is a measure of the temporal non-stationarity between the calibration and the simulation intervals. (A-B) equals 0 when the two intervals have the same observed change rate. In parallel to that, a set of seven commonly used spatial metrics is calculated using the FRAGSTATS Spatial Pattern Analysis Program for Categorical Maps [51]. Spatial metrics are implemented to assess the influence of the calibration dataset variability on the spatial configuration accuracy of the model's projections [52,53]. Selected metrics, which are calculated based on the four cells neighborhood rule, comprise the number of patches (NP), patch density (PD), landscape shape index (LSI), mean patch area (Area-MN), mean Euclidean nearest neighbor distance (ENND-MN), large patch index (LPI), and patch aggregation index (AI). Table 2 shows the temporal instability of the observed change rate between the calibration (A) and the prediction (B) intervals. The error budget analysis, which was performed at the study area level, revealed a significant variation in the quantity disagreement of the estimated change using the Markov chain model.

Quantity of Change Estimate
The results exhibited an error due to quantity ranging from 45 ha to 540 ha with a standard deviation of 136 ha. The largest error due to quantity (540 ha) was produced by using the calibration interval [2003,2006] and the simulation interval [2006,2018] which had the largest time duration ( Figure 5). The best score (45 ha) was achieved by using the most recent land cover maps of 2012 and 2015 to calibrate the model.  The results also show the ability of the Markov model in reproducing the observed past change rate in the simulation interval (Table 2). Indeed, the estimated annual growth rate during the simulation interval (C) fit the actual growth rate during the calibration interval (A) for all the simulations. Moreover, the observed annual growth rate during the calibration intervals (A) was higher than the estimated annual growth rate during the prediction interval (C) for all the simulations. The observed difference (A-C) varied from 0 to 2 ha. Therefore, the amount of error due to quantity was primarily determined by the time duration of the simulation interval and the difference of the observed annual change rate between the calibration and the simulation intervals. In addition to that, Figure 6 illustrates a significant variation in the false alarms, misses, and hits. The number of false alarms was slightly greater than the number of misses for all the examined calibration intervals. Accordingly, the Markov model overestimated the amount of predicted developed areas compared to the observed change.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 19 Figure 6. False alarms, misses, and hits using different calibration intervals.
For more details, three categories of calibration intervals could be distinguished based on the distribution of correctness and disagreements. The first group, comprising the calibration interval [2003,2006], exhibited the maximum number of hits, false alarms, and misses compared to other groups. The second group was composed of five calibration intervals that used 2015 basis land cover image, which was the closest calibration point to the simulation date (2018). This group provided the best fit of 45 ha by producing fewer false alarms and misses. At the same time, it yielded the lowest correctness score.

Calibration interval
Hits Misses False alarms Figure 6. False alarms, misses, and hits using different calibration intervals.
For more details, three categories of calibration intervals could be distinguished based on the distribution of correctness and disagreements. The first group, comprising the calibration interval [2003,2006], exhibited the maximum number of hits, false alarms, and misses compared to other groups. The second group was composed of five calibration intervals that used 2015 basis land cover image, which was the closest calibration point to the simulation date (2018). This group provided the best fit of 45 ha by producing fewer false alarms and misses. At the same time, it yielded the lowest correctness score. Indeed, the correct predicted cells represented 6% of the total predicted change when using 2015 basis land cover image, whereas it reached 20% when using 2006 basis land cover image. The last category, summing up to nine calibration intervals using land cover basis images of 2010, 2011, and 2012 was characterized by an intermediate score of correctness and errors.

Spatial Allocation of Change
Simulated developed areas for 2018 using different calibration time intervals [T 0 , T 1 ] were associated with different amounts of spatial correctness and errors ( Table 3) Table 3 depicts different scores of correctness, errors, and persistent areas depending on calibration intervals. The error due to allocation tended to decrease as the calibration interval involved a more recent date as the endpoint (T 1 ). For example, [2003,2015],    Largely, the allocation error still remained the major error component (Figure 9). It varied from 71 to 94% of the total error compared to the quantity disagreement that    Largely, the allocation error still remained the major error component (Figure 9). It varied from 71 to 94% of the total error compared to the quantity disagreement that Largely, the allocation error still remained the major error component (Figure 9). It varied from 71 to 94% of the total error compared to the quantity disagreement that ranged from 6 to 29%. Moreover, values of the allocation disagreement, which ranged from 461 to 1810 ha, were more dispersed compared to the error due quantity. With a difference of 1349 ha between the minimum and the maximum values, the allocation error presented a standard deviation of 419 ha.
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 19 ranged from 6 to 29%. Moreover, values of the allocation disagreement, which ranged from 461 to 1810 ha, were more dispersed compared to the error due quantity. With a difference of 1349 ha between the minimum and the maximum values, the allocation error presented a standard deviation of 419 ha. Figure 9. Percentage of error due allocation and error due to quantity with respect to total error. Table 4 shows a comparison between the fifteen simulated urban areas for 2018 using spatial metrics. The results highlight a significant variation, especially in the number of patches (NP), patch density (PD), mean patch area (Area-MN), and mean distance  Table 4 shows a comparison between the fifteen simulated urban areas for 2018 using spatial metrics. The results highlight a significant variation, especially in the number of patches (NP), patch density (PD), mean patch area (Area-MN), and mean distance (ENND-MN) between different simulated urban patterns for 2018. Other metrics including landscape shape index (LSI), aggregation (AI), and large patch index (LPI) provided very small variations which should be carefully interpreted by taking into consideration the spatial resolution of the input data.  Figure 10 illustrates that the allocation of the predicted change was characterized by a contiguous spatial pattern, because the actual and the simulated urban development have occurred close to existing developed areas. However, the results indicate that simulated urban areas in 2018 using CA-Markov did not match the spatial configuration of the reference change in terms of the total number of urban patches. Actual urban patches ranged from 240 for 2003 to 218 for both 2005 and 2018, respectively (Table 1). At the same time, Table 4 shows that the number of the simulated urban patches varied from 124 for the calibration interval [2003,2006] to 173 for the calibration interval [2010,2011].
The number of the simulated urban patches was smaller than the number of the observed patches compared to the actual situation in 2018 and the basis land cover (T 1 ) image. For example, the simulated patches accounted for 124, while the actual situation in the reference (T 2 = 2018) and the basis (T 1 = 2006) LULC maps, respectively, accounted for 218 and 228. Actually, the CA-Markov model underestimated the total number of patches by simulating an edge-growth pattern due to the combined effect of the user-defined transition rules and the spatial contiguity filter. CA-Markov has therefore underestimated the number of developed patches by merging closer urban areas regardless of the calibration interval.
Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 19 same time, Table 4 shows that the number of the simulated urban patches varied from 124 for the calibration interval [2003,2006]   The number of the simulated urban patches was smaller than the number of the observed patches compared to the actual situation in 2018 and the basis land cover (T1) image. For example, the simulated patches accounted for 124, while the actual situation in the reference (T2 = 2018) and the basis (T1 = 2006) LULC maps, respectively, accounted for 218 and 228. Actually, the CA-Markov model underestimated the total number of patches by simulating an edge-growth pattern due to the combined effect of the user-defined transition rules and the spatial contiguity filter. CA-Markov has therefore underestimated the number of developed patches by merging closer urban areas regardless of the calibration interval.

Quantity of Change Estimate
This current study shows that varying the pair training dates caused significant variations in the estimated amount of the projected developed areas. However, the majority of LUCC modeling approaches, especially those that use Markovian models, assume that the spatiotemporal dynamic of land change still remains homogenous over the calibration and the simulation periods. Such a temporal extrapolation modeling approach may be consistent only when developing the business-as-usual scenario. However, the assumption that future land demand can be solely estimated by extrapolating past tendency could be erroneous in many cases [12]. In fact, estimating the amount of change based on temporal extrapolation modeling process is unable to deal with intermediate changes and temporal discontinuity over time. Actually, demographic, political, socioeconomic, and environmental changes lead to non-stationary urban land demand and spatial patterns over time.
Furthermore, [5] point out that the strict assumption associated with a Markov process can lead to conceptual problems when the calibration and the simulation intervals consist of different time durations. Still, the current study demonstrates that the

Quantity of Change Estimate
This current study shows that varying the pair training dates caused significant variations in the estimated amount of the projected developed areas. However, the majority of LUCC modeling approaches, especially those that use Markovian models, assume that the spatiotemporal dynamic of land change still remains homogenous over the calibration and the simulation periods. Such a temporal extrapolation modeling approach may be consistent only when developing the business-as-usual scenario. However, the assumption that future land demand can be solely estimated by extrapolating past tendency could be erroneous in many cases [12]. In fact, estimating the amount of change based on temporal extrapolation modeling process is unable to deal with intermediate changes and temporal discontinuity over time. Actually, demographic, political, socioeconomic, and environmental changes lead to non-stationary urban land demand and spatial patterns over time.
Furthermore, [5] point out that the strict assumption associated with a Markov process can lead to conceptual problems when the calibration and the simulation intervals consist of different time durations. Still, the current study demonstrates that the differences between the calibration and the simulation intervals' time durations did not matter. For example, the calibration interval [2012,2015] and the simulation interval [2015,2018], which had the same short time duration (∆T = 3), yielded the best quantity agreement (Table 2). However, the quantity disagreement using the calibration interval [2006,2012] and its corresponding simulation interval [2012,2018] accounted for 168 ha even if they consisted of the same time duration (∆T = 6). Actually, the amount of quantity disagreement resulted from the combination of the observed annual change rates during the calibration and the simulation intervals, the simulation time duration, and the predicted annual change rate during the simulation interval (Equation (6)).
The results reveal that a low score of error due to quantity did not automatically mean that the model was efficient enough (Equation (6)). In fact, the error due to quantity provides a useful measure of model performance only if the observed annual change rate remains stationary over the calibration and the validation intervals (A − B = 0). Conversely, the calibration intervals [2006,2010], [2006,2012], [2006,2015], [2010,2015], [2011,2012], [2011,2015] and [2012,2015] had A = C, which means that the Markov model performed perfectly in estimating the appropriate annual change rate. In addition to that, the predicted change rate fit the actual change rate during the calibration period. In this case, the error disagreement was due to the temporal non-stability (A-B) of the urban growth intensity over time. Moreover, the estimated amount of change during the simulation interval using the Markov technique was lower than the baseline for all the calibration intervals. In fact, the difference (A-C) between the observed annual growth rate during the calibration interval (A) and the predicted annual growth rate during the prediction interval (C) showed that the Markov method tended to underestimate the amount of the predicted developed areas. Systematically, many studies that deal with LUCC modeling validation techniques under the stationarity assumption [5,47] underestimate the real predictive performance of models in replicating the past amount of change in the future. In fact, they usually attribute the error due to the temporal non-stationarity of land change to the model limitations. Accordingly, the proposed method to analyze the quantity disagreement components will be useful in assessing the real accuracy of models. It permits to distinguish between the capacity of the adopted modeling framework in reproducing past tendency and the non-stationarity of land change over the calibration and simulation time intervals. The modeling framework includes not only the model's behavior but also the modeler's methodological choices, particularly in determining the simulation transition rules using a knowledge-driven approach such as in CA-Markov change potential evaluation. In addition to that, assessing LUCC simulation accuracy is a critical task which requires high change detection accuracy. In fact, misregistration of the satellite images and misalignment of different data sources drop the accuracy of the detected change [13,20]. Thus, the current study used manual image interpretation for mapping historical LUCC because mismatched data and classification errors will ultimately lead to inaccurate predictions and bias in the model performance assessment, especially in landscapes experiencing small changes.

Spatial Allocation of Change
The results point out that using different calibration time intervals [T 0 , T 1 ] resulted in significant variations in the amount of hits and allocation disagreement. The error due to allocation, which depends on the spatial allocation process that is based on the change potential map, tended to decrease as the calibration interval used a recent endpoint reference map (T 1 ). In fact, false alarms and misses dropped as the last calibration point (T 1 ) got closer to the target simulation point (T 2 ). For example, the shortest (∆T Calibration = 3) calibration interval [2003,2006], which was associated with the longest (∆T Simulation = 12) simulation interval [2006,2018], provided the maximum allocation error, but also the highest share of the correct allocated urban cells. Conversely, all of the calibration intervals that used the most recent basis image (T 1 = 2015 and ∆T Simulation = 3) yielded the lowest amount of the misallocation pixels, but also the lowest quantity of hits. Moreover, the allocation error constituted the major error component. It varied from 71 to 94% of the total error, compared to the quantity disagreement that ranged only from 6 to 29%. In addition to that, the values of the allocation disagreement were more dispersed compared to the error due quantity. Therefore, future improvements should focus more on the spatial allocation of the change process.
The spatial patterns of the simulated urban areas using CA-Markov did not perfectly match the spatial configuration of the reference map. Significant variations were observed in the number of patches (NP), mean patch area (Area-MN), and mean distance (ENND-MN) between the different simulated LULC maps for 2018. For example, the number of the simulated urban patches was smaller than the number of the observed ones in the reference land use map (T 2 ), but also smaller than the basis land cover image (T 1 ) itself. In addition to that, the analysis of the spatial metrics showed that as the basis land cover image (T 1 ) got more closer to the simulation target (T 2 ), the number of patches, patch density, and shape index increased, while mean patch area and mean distance decreased.
Overall, the simulated spatial patterns were relatively realistic even they were slightly more compact compared to the actual situation. In fact, the CA-Markov model tends to generate compact spatial patterns by simulating the new individual urban cells mainly adjacent to the existing urban areas in the basic land cover image. For the fifteen simulations, CA-Markov exhibited limitations in dealing with the spatial allocation of the individual patches. This shortcoming can have a negative effect when reproducing past spreading development trends using a path-dependent approach. This behavior of CA-Markov was probably due to the theoretical approach that was applied on the growth modeling using CA models [52]. Actually, a CA-based approach includes feedback mechanisms due to the proximity effect [12,16]. Accordingly, the urbanization level of the local neighborhood of a grid cell strongly influences the allocation of new developed patches [52]. In CA-Markov, the neighborhood filter leads to a suitability score decrease as the candidate pixels are getting away from the existing built-up areas [12]. Nevertheless, the simulated compact spatial patterns cannot be explained only by the neighborhood filter effect. In fact, the effect of the simulation framework related to selected driving forces and change potential evaluation approach should not be neglected. For instance, the current research used the change potential map which was specifically elaborated using a user knowledge-driven approach in order to reproduce the actual dominant edge growth pattern.
Regarding spatial allocation, the simulated change using the CA-Markov model was more determined by the last calibration date rather than the transition that has occurred during the calibration interval. Hence, non-stationary change processes cannot be captured based on the CA-Markov empirical approach. Actually, one of the main differences between CA-Markov and other LUCC software packages within the Idrisi program is the change potential map that is used in the spatial allocation of change. CA-Markov is based on the suitability map, which does not take into consideration the spatial patterns of past changes during the calibration period [12]. In fact, the suitability map is elaborated based on the relationships between the selected explanatory drivers and only one available land use map. According to [37], the reference map used by CA-Markov for transforming and weighting input variables is the state of LULC at the last calibration date. Conversely, the reference map in Multi-Layer Perceptron (MLP), SimWeight, and Logistic Regression (LR) methods corresponds to the transition probability map, which takes into account spatial patterns of past changes from the first to the last calibration dates.

Implications on the Model Performance and Outputs
Using appropriate time series input data and calibration intervals is extremely important when the objective is to reproduce past change patterns and validate empirically based LUCC model under the hypothesis of temporal homogeneity. Otherwise, significant errors could be induced, especially when past trend-based models are used in the simulation of path-dependent land use scenarios [33]. In fact, the results revealed how outcomes and accuracy scores of the Markov chain and CA-Markov models vary significantly with respect to selected calibration dates. The magnitude of measured residues is largely due to the non-stationarity of the observed land change over the calibration and the prediction intervals rather than the inadequacy of Markov model. Therefore, the validation interval needs to be consistent with the calibration interval in terms of the amount and spatial patterns of change. This requirement will lead to a better evaluation of the model ability in reproducing past-trend dynamics. [27] propose the estimation of the degree of significance of selected training dates using long time series of LULC maps. Subsequently, several calibration intervals should be included based on the research objective, types of errors to be assessed (quantity, allocation, and spatial patterns), and trade-offs in the model accuracy. The variation in the size and intensity of change across time intervals has to be determined. For instance, the quantitative method developed by [54] can be employed to unify size measurements and stationarity of land changes by time interval. In addition to that, the comparison of Markov matrices associated with different time intervals can be performed using a variety of methods that have been developed to adjust conditional probabilities of transitions (Equation (3)) to represent an equivalent duration [33,[39][40][41].
The assessment of spatial allocation requires that the validation interval should match the reference quantity during the calibration interval because of the sensitivity of the pattern metrics to the quantity of each category [11]. According to [55], evaluating spatial allocation accuracy of models is difficult when the quantity error is large. Thus, the ability of models to predict spatial allocation can be better assessed when the quantity of change is correctly estimated [55]. For example, the calibration interval [2012,2015], which yields the least quantity error, can be used to examine the ability of the model to replicate past spatial patterns in the prediction interval [2015,2018]. Furthermore, artificially simulated images with specific properties can also be generated in case of lack of actual calibration data that are required to assess the model's performance in reproducing specific aspects of past changes.
Spatiotemporal non-stationarity of LUCC still remains a complex issue. It involves several parameters including selected calibration time points (long vs. short time duration), input calibration data characteristics (spatial and thematic accuracy), scale of investigation, and underlying processes (global vs. local approach). According to [53], low spatial resolution and thematic accuracy may result in a more generalized structure of the mapped urban land cover objects, thus leading to an overestimation of landscape homogeneity. Conversely, relevant structures may get lost in a highly heterogeneous pattern due to a detailed landscape classification [53].
The number of training dates and the duration of calibration intervals depend on the research objective, observed intensity, and spatial patterns of past change, availability of data, and model inputs requirement. In addition to that, it is a necessity to deal with the risk of extrapolating and generalizing the observed change that has occurred during the calibration interval. According to [13], the temporal extent of the calibration period also affects the simulated LUCC results using models that rely on historical data for their calibration. In fact, a long time calibration interval privileges global trends while tending to average out the extreme tendencies [15]. However, fluctuations due to noise or one-off events are more likely to be captured in a short calibration interval. In addition to that, [12] indicated that the suitability approach is likely to perform better and be more stable as the simulation is carried out over a long period with non-stationary change patterns. According to [10], longtime interval might cause a high number of misses, but also a high share of hits. For example, the shortest time calibration interval [2003,2006], corresponding to the simulation interval [2006,2018], which was the longest simulation period (∆T = 12), accounted for the highest quantity and allocation errors, but also the highest number of correct predicted cells. However, the shortest calibration interval [2012,2015], associated with the shortest simulation interval [2015,2018], yielded the lowest hits but also the lowest quantity and allocation disagreements.
Furthermore, [7] have already developed an assessment framework that represents the ability of multiple human-environmental models in addressing the tridimensional spatial, temporal, and human-decision making real-world complexity. Models with a high value for temporal complexity may incorporate multiple time steps, long duration, and capacity to handle time lags or feedback responses [7]. For instance, the calibration of the SLEUTH CA-based urban growth model [56] requires four reference dates compared to the majority of existing LUCC models that are limited to two calibration dates only. Other simulation tools like Land Change Modeler [20] are calibrated based on one single past time interval using only two reference land cover maps. As pointed out by [15], the temporal resolution of LUCC models includes the number of available calibration dates and time intervals. In fact, [29] and [24] have found that the number of time-points used in the calibration of a CA-based model significantly influences the calibration result.

Conclusions
The influence of the non-stationarity of land change on the calibration and the accuracy assessment of the inductive pattern-based LUCC models is seldom addressed in the literature review. Actually, the stationarity of land change over time is frequently assumed in the majority of LUCC modeling studies. Regarding this, this research aimed at informing users on the influence of the calibration interval on the outputs of LUCC models and their validity estimation. In fact, taking into consideration the non-stationarity of the land change dynamic is a requisite to provide better estimation of the predictive performance of path-dependent LUCC models.
The findings, specifically, demonstrate the influence of varying the calibration time points on the simulation of non-stationary urban growth spatiotemporal dynamic using a CA-Markov model. They exhibited significant variations in the quantity and the spatial allocation agreements with respect to the selected key training dates. For all fifteen examined calibration intervals, the results showed that the Markov model performed better in estimating the annual change rate of the urban growth based on historical data. However, the analysis of the simulated spatial patterns showed that the CA-Markov model tended to foster compact urban form. In addition to that, the spatial pattern accuracy of CA-Markov depended highly on the basis land cover image rather than the observed transition during the calibration period.
At last, the study suggests the use of the appropriate calibration and simulation intervals during which the amount and the spatial patterns of change are stationary. It also recommends the distinction between the non-stationarity of the observed real change and the inability of the adopted modeling framework, which includes both the model behavior and the user's methodological choices, in reproducing past land changes. For that matter, this paper proposes a useful analysis of errors and correctness variations by examining the error budget using different calibration intervals based on historical LULC data. Data Availability Statement: Very high resolution satellite data that support the findings of this study are openly available in Google Earth database or ESRI's World Imagery basemap.