## 1. Introduction

Simulation models of urban systems were first developed in the 1950s and 1960s as a way to understand the complexity of cities and to forecast trends and consequences of planning policies. Several formalisms (thermodynamics, general systems theory, synergetics, microsimulation) were used [1], following fashions as well as opportunities arising from access to new technologies [2,3]. Each of the methods cited has specific advantages and drawbacks, which we will not discuss here, but all of them offer a similar opportunity and face a similar challenge, one that is linked to simulation itself and has not changed as the formalisms evolved. The opportunity that we focus on in this paper is the function of virtual laboratory that a computerised simulation model enables [4], which is of paramount interest for the human and social sciences, where in vivo experiments are impossible. By allowing competing and/or complementary hypotheses to be implemented as generative mechanisms (that is, sets of interaction activities performed by entities (cities) producing emergent patterns [5]) in a model testable against empirical data, this virtual-laboratory function makes the model a framework for evaluating the plausibility of different theories. However, simulation as a method and as an epistemological way of testing theories is subject to a limitation known since von Bertalanffy [6] as equifinality. It describes the fact that even when a model performs well, one cannot infer that the underlying combination of mechanisms is the one operating “in real life”, because several models can lead to the same results (many-to-one) and the same process can lead to several qualitatively different results (one-to-many) [7]. The problem of assessing the adequacy of the model to real life (also known as ontological adequacy testing, cf. [8]) is twofold. First, many effective (or “real”) processes can lead to the same observed situation within the target system that we aim to model. Second, inadequate mechanisms implemented in the model can satisfactorily simulate the situation under study. The challenge of identifying the actual causal mechanisms is a recurrent problem in social simulation [9,10], but one that is usually overlooked at the stage of results analysis.

In a previous study [11], we tried to tackle the one-to-many part of the challenge within the model, using an evolutionary algorithm to search for the maximum diversity of patterns produced by a given set of mechanisms in a model of systems of cities. What we present here relates to the multiplicity of possible causes leading to the single historical trajectory observed empirically. We present a multimodelling framework that combines different mechanisms into a modular model evaluated against a unique set of evaluation criteria, enabling the comparison of the performance of different model structures in simulating urbanisation and the evolution of cities in a territorial system.

More precisely, we are interested in simulating the co-evolution of cities encompassed in a territorial system (typically a nation or a continent) and in reproducing their hierarchical and spatial patterns. In other words, we want to obtain simulations that produce the stylised facts [12] generalised from the empirical observation of various systems of cities: their hierarchy, spacing and functional differentiation [13].

Section 2 presents the patterns we aim to reproduce and the catalogue of theories and mechanisms on which we draw to compose the model.

Section 3 describes the multi-model and its implementation in our case study, the evolution of the Soviet and post-Soviet system of cities (around 2000 cities).

Section 4 presents the results of its exploration through multi-calibration, which assesses the performance of different hypotheses in explaining the evolution of Soviet and post-Soviet cities.

Section 5 concludes on the case study and the method.

## 3. Modular Multimodelling Experiment

The incentive to implement competing and complementary theories into different models evaluated against one another is a recurrent plea in the simulation literature [7,59,60,62,82]. Its rarity in practice reveals how tricky its implementation and automatic evaluation can be, besides the epistemological challenge of equifinality and the question of what conclusions one can draw from such a confrontation. Indeed, thirty years after the “Automated Modeling System to Explore a Universe of Spatial Interaction Models” by Openshaw [61], there are still no standard tools nor formal methodology for theory testing with simulation models. Openshaw’s automated way of exploring model structures, being a pure optimisation approach to discovering model structures, is methodologically impressive, but it does not suit our goal of theory testing in a virtual laboratory, because it can result in optimal models that are impossible to interpret. Instead, we think that the first step should be to gather a catalogue of theoretical processes and mechanistic hypotheses working as potential explanations. This usually precedes the mixed-modelling step [63,64] and prevents endlessly building models “from scratch” [62]; it allows us to build on previous work and experience.

Following the ODD protocol [65], we briefly present an overview of the model (O, Section 3.1), its design concepts (D, Section 3.2) and implementation details (D, Section 3.3). More specifically, we distinguish between a baseline model of urban interactions and the hypotheses implementing our catalogue of mechanisms as modular blocks, which can be activated or discarded to compose a family of simulation models (Section 3.3).

#### 3.1. Overview

This family of models aims at simulating the demographic growth of cities of the Former Soviet Union, in terms of magnitude, rhythm and spatial location. The concept of model family points to the fact that we consider several mechanisms as candidates for explaining the historical processes observed, and we aim to test and compare their ability to simulate the (post-)Soviet urban growth.

Agents in the model are collective entities: cities. We model between 1145 and 1822 such cities (depending on the modelling period) for each simulation. This number corresponds to the number of urban agglomerations recorded in the Soviet Union at the initial date of the period under enquiry [56]. Two periods are considered: a Soviet simulation (from 1959 to 1989, in 30 time steps of one year) and a post-Soviet one (1989 to 2010, based on Russian census dates). A city (low level) is described by a location (absolute and relative, such as the region it belongs to), a population, and quantities of production, consumption and total wealth. The system of cities (high level) is described by the distribution of city sizes.

At each step:

- Each city updates its economic variables based on its current population;
- Cities interact (i.e., exchange products) with other cities according to their supply, demand and distance;
- Each city updates its wealth based on the results of its interactions;
- A simulation step ends when each city updates its population.

This baseline schedule is modified when additional mechanisms are activated in the model structure. These modifications are detailed in the description of the additional mechanisms below (Section 3.3).
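The four-phase schedule above can be sketched in Scala, the model's implementation language. All names and numeric responses here are illustrative placeholders, not the actual MARIUS equations:

```scala
// Illustrative sketch of the baseline schedule; City fields follow the
// overview (population, wealth, supply, demand), but the update rules
// below are placeholders, not the paper's Equations (4)-(17).
case class City(population: Double, wealth: Double,
                supply: Double = 0.0, demand: Double = 0.0)

// Phase 1: economic variables derived from the current population
def updateEconomy(c: City): City =
  c.copy(supply = 0.01 * c.population, demand = 0.01 * c.population)

def step(cities: Vector[City]): Vector[City] = {
  val withEconomy = cities.map(updateEconomy)
  // Phase 2 (interactions) is omitted in this sketch; see Section 3.3
  withEconomy.map { c =>
    // Phase 3: wealth balance (placeholder accounting)
    val newWealth = c.wealth + c.supply - c.demand
    // Phase 4: population responds to the new wealth (placeholder rate)
    c.copy(wealth = newWealth,
           population = c.population + 0.001 * math.max(newWealth, 0.0))
  }
}
```

The point of the sketch is the ordering of the four phases, which the mechanism modules of Section 3.3 override.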

#### 3.2. Design Concepts

The family of models includes a weak concept of emergence [66]. This means that no surprising pattern emerges from cities’ interactions, but that cities organise so as to produce patterns of increasing (or decreasing) [11] hierarchy in their size distribution. Cities adapt to their interaction network by adjusting their (population) size. They sense their environment in two additional mechanisms, when they can benefit from underground resources or regional rural migrations. Their interactions are monetary, as cities do not exchange residents: they trade production value (either bilaterally in the baseline model or through the mutual redistribution of taxes in the redistribution mechanism). Cities are part of regional collectives when the redistribution or urban transition mechanisms are activated. No fitness is computed at the city level: we keep this term for the calibration, to qualify the fit between the simulated size distribution and the historical one over time. Finally, the model involves neither prediction nor stochasticity.

#### 3.3. Implementing Mechanisms as Building Blocks (Details)

The multi-model is composed of the baseline model and additional modules of mechanisms that override the sequence of agents’ rules when they are activated. In the following sections, we detail the baseline model and the modular blocks of mechanisms that are added incrementally to the original equations. For further information, the baseline model and the first two additional mechanisms were described and evaluated in detail in [54].

#### 3.3.1. The Baseline Model

The baseline model relies on the assumption that population and wealth are the basic descriptors of cities and the engine of their co-evolution. It therefore models exclusively size effects of population on wealth and spatial interactions.

At initialisation, each city i of the Former Soviet Union is set up with its historical population ${P}_{i}$ at the beginning date of the simulation, and located at its empirical coordinates, enabling site, situation and distance to play in the same geometry as in the target system. An estimated value of wealth ${W}_{i}$ (expressed in a fictive unit) is determined for each city with respect to its size, following Equation (2).

This initialisation is a first way of implementing the theoretical hypothesis of agglomeration economies: wealth is distributed superlinearly for each value of $populationToWealth$ significantly greater than 1. Time is modelled as discrete steps, each of which represents a period of one year, during which interactions occur synchronously. At each step:

**Each city i updates its economic variables:** a global supply ${S}_{i}$ (Equation (4)) and a global demand ${D}_{i}$ (Equation (7)), according to its population ${P}_{i}$ and three parameters ($economicMultiplier$, $sizeEffectOnSupply$ and $sizeEffectOnDemand$).

**Each city interacts with other cities** according to the intensity of their interaction potential $IP$ (Equation (9)). For two distinct cities i and j, the computation of the interaction potential $I{P}_{ij}$ consists in confronting the supply of i (Equation (11)) to the demand of j with an equation borrowed from the gravity model (Equation (12)), with ${d}_{ij}$ a measure of the distance between i and j.

Interactions of cities i and j based on their potential $I{P}_{ij}$ result in a transaction ${T}_{ij}$ from i to j (Equation (13)).

**Each city updates its wealth ${W}_{i}$** (Equation (14)) according to the results of the transactions T in which it was committed: unsold supply $U{S}_{i}$ (Equation (15)) and unsatisfied demand $U{D}_{i}$ (Equation (16)).

**A simulation step ends when each city updates its population** according to its new resulting wealth (Equation (17)):
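The equation bodies are not reproduced in this excerpt; as a hedged sketch, assuming the standard gravity form (supply of i times demand of j, discounted by distance raised to $distanceDecay$, which is our guess at the shape of Equation (12), not the paper's exact formula), the interaction potential could look like:

```scala
// Hypothetical gravity-style interaction potential; the exact form of
// Equations (9)-(13) is not shown in this excerpt.
def interactionPotential(supplyI: Double, demandJ: Double,
                         dij: Double, distanceDecay: Double): Double =
  supplyI * demandJ / math.pow(dij, distanceDecay)
```

Under this form, the potential grows with both supply and demand and decays with distance, consistent with the role of the $distanceDecay$ parameter calibrated in Section 4.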

#### 3.3.2. Mechanism Increments

A first increment, the **bonus** mechanism, “[...] features a non-zero sum game [...], rewarding cities who effectively interact with others rather than internally. We assume that the exchange of any unit of value is more profitable when it is done with another city, because of the potential spillovers of technology and information [54].”

This bonus ${B}_{i}$ depends on the volume of transactions and the diversity of partners ${J}_{i}$ with which the city i has exchanged (Equation (18)). It is added to the wealth at the end of each step (Equation (19)), following Equation (14), with n being the total number of cities in the system (i.e., 1145 for a simulation beginning in 1959, 1822 for a simulation beginning in 1989) and ${J}_{i}$ the number of partners with which i has engaged in the current simulation step.

A second increment introduces a cost of exchange: “Every interurban exchange generates a fixed cost (the value of which is described by the free parameter $fixedCost$). This implies two features that make the model more realistic: first, no exchange will take place between two cities if the potential transacted value is under a certain threshold; second, cities will select only profitable partners and not exchange with every other city. This mechanism plays the role of a condition before the exchange” [54].

The interaction potential between city i and city j will be positive only if the potential value that i is willing to sell to j exceeds the fixed value it costs i to negotiate, prepare and send the transaction (Equation (20)). Therefore, each transaction of a city i gives way to a fixed cost; the sum of these costs is subtracted from the wealth of city i at the end of each step (Equation (21)), following Equation (14):
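A minimal sketch of the fixed-cost condition and the cost deduction, with hypothetical helper names (the actual Equations (20) and (21) are not reproduced in this excerpt):

```scala
// Keep only partners for which the potential transacted value exceeds
// the fixed cost (sketch of the condition in Equation (20)).
// Keys are partner city indices; values are potential transacted values.
def profitablePartners(potentialValues: Map[Int, Double],
                       fixedCost: Double): Map[Int, Double] =
  potentialValues.filter { case (_, value) => value > fixedCost }

// Each realised transaction costs fixedCost, subtracted from wealth
// at the end of the step (sketch of Equation (21)).
def wealthAfterCosts(wealth: Double, nTransactions: Int,
                     fixedCost: Double): Double =
  wealth - nTransactions * fixedCost
```

The filtering step is what makes the interaction network sparse: cities stop exchanging with marginally profitable partners.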

**Site effects** are targeted by the **resource** mechanism: site advantages are particularised in this model by natural resource deposits (more specifically: coal deposits C on the one hand, and oil and gas deposits O on the other hand). The assumption is made that if city i is located on some coal or oil deposits (${C}_{i}=1$ or ${O}_{i}=1$), it benefits from the advantage granted by the extraction activity. The capacity of extraction depends on the capital (wealth) of the city and takes the form of a wealth multiplier for each resource (Equation (22)), after Equation (14):

**Territorial and political effects** are formalised by the **redistribution** mechanism. It allows for a redistribution of wealth between cities of the same territory R (region or State). To do so, territorial taxes $t{t}_{k}$ are collected in each city ${k}_{R}$, as a proportion $territorialTaxes$ of their wealth. The total amount of taxes collected is $T{T}_{R}$ (Equation (23)).

From these taxes, the administrative status of the territory R (denoted by $C{C}_{i,R}$, set to 1 if i is the capital city of the region and 0 otherwise) allows the capital city to take a share $CS$ for its administration needs (Equation (24)).

The rest of the taxes is redistributed to the cities of the region. Each city ${i}_{R}$ receives a share $t{r}_{i,R}$ that is proportionate to its population (Equation (25)).

The balance of the territorial redistribution is added to the wealth of a city (Equation (26)), after Equation (14):
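The three-stage redistribution (collection, capital share, proportional redistribution) can be sketched as follows; `RegionCity` and `redistribute` are illustrative names, and the returned vector plays the role of the per-city balance described above, not the paper's exact Equations (23)-(26):

```scala
// Sketch of the redistribution mechanism for one territory R.
case class RegionCity(wealth: Double, population: Double, isCapital: Boolean)

def redistribute(cities: Vector[RegionCity],
                 territorialTaxes: Double,
                 capitalShare: Double): Vector[Double] = {
  // Taxes collected proportionally to wealth (cf. Equation (23))
  val collected = cities.map(_.wealth * territorialTaxes)
  val total = collected.sum
  // The capital city keeps a share for administration (cf. Equation (24))
  val capitalCut =
    if (cities.exists(_.isCapital)) total * capitalShare else 0.0
  val totalPop = cities.map(_.population).sum
  // The rest is redistributed proportionally to population (cf. Equation (25))
  cities.zip(collected).map { case (c, paid) =>
    val share = (total - capitalCut) * c.population / totalPop
    val cut = if (c.isCapital) capitalCut else 0.0
    share + cut - paid // balance added to the city's wealth
  }
}
```

By construction the balances sum to zero over the territory: redistribution moves wealth between cities without creating it.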

**Finally, territorial and situation explanations** are mixed in the **urban transition** mechanism. To account for the different opportunities of cities to attract rural migrants in the different regions, we model the evolution of the urban transition curves over time. As shown empirically [45], 100 out of the 108 regions of the Former Soviet Union have followed the scheme of the urban transition, meaning that their urbanisation rate ${U}_{R}$ (in %) has followed a logistic function over time t (Equation (27)).

The parameter $urbanisationSpeed$ is not a free parameter and thus will not be calibrated. It has been empirically determined as the “mean” transition regime that minimises the error of the logistic adjustment performed on data depicting the urbanisation rate over time for each region of the post-Soviet Union [45]. This was done to position every region on a single urbanisation curve with respect to the historical time lags of their urban transitions. Indeed, western regions were already mainly urban in the 1960s, whereas some Central Asian countries (Tajikistan, Kirghizstan) are still dominantly rural nowadays. This implies strong disparities in the opportunities for migration towards cities. In order to model a generic process of urban transition, we located the different regions at different stages of the same transition curve instead of considering a specific urbanisation curve for each region. The consequence of this initialisation is that each region moves one step further on the urbanisation curve (leading eventually to 100%) at each simulation step, but the rural potential to migrate to cities depends on its current position on the urban transition curve (high potential for weakly urbanised regions, small potential for regions that are already very urban). The migration potential of each city i in territory R is built as a multiplier $T{M}_{R}$ specific to each region at each time step (Equation (28)):

This extra population growth is added (Equation (29)) after Equation (17), and the new urbanisation rates of the regions are updated for $t+1$ (Equation (27)):
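A sketch of the logistic curve of Equation (27) and of a region-level migration potential; the exact parameterisation (midpoint, and the multiplier form of Equation (28)) is our assumption for illustration:

```scala
// Logistic urban transition (sketch of Equation (27)):
// urbanisation rate in %, bounded by 100, with t0 the assumed midpoint.
def urbanisationRate(t: Double, urbanisationSpeed: Double, t0: Double): Double =
  100.0 / (1.0 + math.exp(-urbanisationSpeed * (t - t0)))

// Rural migration potential is high for weakly urbanised regions and
// vanishes as the region approaches full urbanisation (illustrative form).
def migrationPotential(rate: Double, ruralMultiplier: Double): Double =
  ruralMultiplier * (100.0 - rate) / 100.0
```

Positioning every region on this single curve at initialisation, at an abscissa matching its observed urbanisation rate, reproduces the time lags discussed above.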

Because we want to evaluate the contribution of each theoretical mechanism to the simulation of urbanisation and its interactions with the other mechanisms, we need the modules to be easily activated and de-activated dynamically in the model. To that end, we leveraged the mixin-methods system of the Scala programming language [67]. Mixins were first proposed at the beginning of the 1990s in the Jigsaw programming language [68] and are now adopted in mainstream languages such as Scala. They have been established as a powerful way to perform type-safe dependency injection [69], a paradigm well suited to achieving modularity. Mixins are therefore a suitable tool for modular model implementations, which we use to implement MARIUS in the Scala language (to access the source code, cf. [70]).

#### 3.4. Technical Modular Implementation

The mixin-methods concept [71,72] generalises object-oriented programming to enable feature-oriented programming [73]. It makes the definition of class hierarchies more flexible [74]. Variations of isolated features (such as the different update functions of city wealth and population in our model) are defined in separate modules and mixed in with each other at the object instantiation point, also called the mixin application. The advantage of mixins (or traits) over classical object-oriented programming is the possibility of defining numerous variations of several features without exponentially increasing the number of specialised classes.

For instance, let us consider a class C implementing two methods **a** and **b**. We could think of class C as the interaction potential function, **a** representing the basic implementation of Equation (9) and **b** the fixed cost selection of Equation (20). To define alternative implementations of those methods in the classical object-oriented paradigm, one would implement subclasses of C and override the implementations of **a** and **b** in each subclass. For instance, in Listing 1, the class C1 specialises C and defines implementations for the methods **a** and **b**.

Listing 1: Object oriented specialisation.
abstract class C {
def a(x : Double) : Double
def b(x : Double) : Double
def c(x : Double) = /* Compute something using a and b */
}
class C1 extends C {
def a(x : Double) = /* Some implementation of a */
def b(x : Double) = /* Some implementation of b */
}

This pattern achieves a very low level of code reusability: if **a** and **b** each have 10 possible implementations, then 100 specialised implementations of C would be required. The mixin method solves this combinatorial explosion in the number of implementations by delaying the entanglement of the class components until the instantiation site. Listing 2 exposes an implementation based on mixins, providing alternative implementations of **A** and **B** and the corresponding parameter specifications. The implementation choice is delayed until the instantiation point (last lines of Listing 2), at which a mixin is defined.

Listing 2: Mixin in Scala.
trait A {
def a(x : Double) : Double
}
trait B {
def b(x : Double) : Double
}
trait C extends A with B {
def c(x : Double) = /* Compute something using a and b */
}
// Implementation 1 of trait A
trait A1 extends A {
def a(x : Double) = /* Some implementation */
}
// Implementation 2 of trait A
trait A2 extends A {
// Parameter p0 used in this version of a
def p0 : Double
def a(x : Double) = /* Some implementation using p0 */
}
// Implementation 2 of trait B
trait B2 extends B {
def b(x : Double) = /* Some implementation */
}
val instance1 =
new C with A1 with B2 {}
val instance2 =
new C with A2 with B2 {
// Value for parameter p0 of trait A2
def p0 = 1.0
}

The expressiveness of Scala traits can be leveraged to implement modular, evolutive, type-safe modelling frameworks offering alternative model features, feature composability and the formalisation of feature dependencies. The implementation of alternative behaviours in separate traits provides isolation of model component implementations and explicit dependencies between these components. Each component defines free parameters that have to be set at the model instantiation site; otherwise the model will not compile.

In this experiment, we defined each alternative model mechanism in a particular trait. The executable model is composed by picking the traits to be tested. In order to evaluate all the alternative mechanisms concurrently, we generated all the possible models (i.e., combinations of mechanisms). This was achieved using a code generation algorithm which produces all the possible model implementations by generating a Scala source file containing all the possible trait combinations, such as the one shown in Listing 3.

Listing 3: Example of generated code.
def model(index : Int, parameters : Seq[Double]) =
index match {
case 0 =>
new Model with T11 with T21 with ... {
def p0 = parameters(0)
def p1 = parameters(1)
...
}
case 1 =>
new Model with T11 with T22 with ... {
def p0 = parameters(0)
def p1 = parameters(1)
...
}
case 2 => ...
}

This generated source code implements a single function encapsulating all the model alternatives. This function takes two arguments: the index of the implementation that shall be executed and a vector of parameters to set for the model (for this work, we calibrated only double-precision floating-point values). Note that the vector has a fixed size which does not depend on the model instantiated. A given model implementation generally does not use all the parameters: it will use only some of them and ignore the others (Figure 1).
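The code generation step can be sketched as a Cartesian product over the trait alternatives; the trait names (`T11`, `T21`, ...) follow Listing 3, while the generator itself is our illustrative reconstruction, not the project's actual generator:

```scala
// Each inner Seq lists the alternative traits for one feature
// (illustrative names following Listing 3).
val alternatives = Seq(
  Seq("T11", "T12"), // e.g., two variants of the wealth update
  Seq("T21", "T22")  // e.g., two variants of the population update
)

// Cartesian product: every combination picks one variant per feature.
def combinations(features: Seq[Seq[String]]): Seq[Seq[String]] =
  features.foldLeft(Seq(Seq.empty[String])) { (acc, variants) =>
    for (combo <- acc; v <- variants) yield combo :+ v
  }

// Emit one `case` of the generated dispatch function of Listing 3.
def generatedCase(index: Int, traits: Seq[String]): String =
  s"case $index => new Model ${traits.map("with " + _).mkString(" ")} {}"
```

With five binary mechanism modules, this product yields the $2^5 = 32$ model structures explored in Section 3.5.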

**Figure 1.**
Multi-Calibration Protocol. This schema proposes a simplified graphic description of the modular model building (**top row**) together with the empirical evaluation process (**middle row**) and the iterative genetic calibration of the model structures (**bottom row**).


#### 3.5. Calibrating a Multi-Model

“Whatever changes occur in the institutional, political and social context of computational models, the question of how to learn from models remains. It is clear that assessment of the accuracy of a model as a representation must rest on argument about how competing theories are represented in its workings, with calibration and fitting procedures acting as a check on reasoning”

In order to evaluate the capacity of the models to reproduce the historical trajectory of urbanisation in the Former Soviet Union, we rely on an automated calibration. This procedure is part of model evaluation [75,76,78]. Its aim is to find the values of parameters for which the model results match the fitness criteria (here: δ, the lowest possible distance to the data, once the realism criteria are met). If the model can be calibrated, then the mechanisms included in the model are shown to be sufficient to simulate the urban trajectory (which does not yet prove that they are all necessary). If there is no parameter value for which the fitness criterion is met, then the combination of mechanisms is not sufficient to model urbanisation under the current implementation.

In order to calibrate all the models at once, we designed a variant of the NSGA-II genetic algorithm that includes a niching mechanism [79]. Niching methods aim at preserving suboptimal solutions in order to preserve diversity. Our niching algorithm divides the population of parameterised models (that is, one model with one vector of parameter values) into sub-populations, each sub-population containing one model alternative (every model in a sub-population has the same mechanism structure). The genome of each individual (an individual corresponds to a vector of parameter values and a structure index that defines the model under evaluation) contains two parts. The first part is an integer value that corresponds to the index of the model alternative on which the genome is evaluated. The second part is a vector of double values containing the values of all the parameters for the model.

In order to evaluate a genome, we designed a fitness function. This function calls the generated function described above, runs the model and evaluates its dynamics using the fitness function described in Section 2.3 (i.e., the criteria of realism of micro-dynamics and the distance δ between simulated and observed population data for each city at each census date, Figure 1).

In NSGA-II, the elitism operation preserves the best individuals among the whole population. The evaluation algorithm we applied has the exact same elitism strategy for each sub-population. No global elitism strategy was performed; instead, we kept the 50 best-performing individuals in each sub-population (i.e., each combination of mechanisms). In order to speed up the convergence of the algorithm, we also tweaked the mutation operation: it had a $10\%$ chance of mutating the “model index” part of the genome, the new value being drawn uniformly among the possible model indexes. This allowed parameter values that performed well on a given model to be periodically tested on other models.
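The two-part genome and the structure mutation can be sketched as follows; the types and the helper are illustrative, not the actual implementation:

```scala
// Two-part genome: the index of the model structure being evaluated,
// plus a fixed-size vector holding values for all possible parameters.
case class Genome(modelIndex: Int, parameters: Vector[Double])

// With 10% probability, redraw the model index uniformly among the
// possible structures; the parameter vector is left untouched here.
def mutateIndex(g: Genome, nModels: Int, rng: scala.util.Random): Genome =
  if (rng.nextDouble() < 0.1) g.copy(modelIndex = rng.nextInt(nModels))
  else g
```

Because the parameter vector has a fixed size independent of the structure, a genome whose index mutates stays valid: the new model simply reads the parameters it needs and ignores the rest.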

We then distributed this algorithm on the European Grid Infrastructure (EGI) using the technique known as the “island model”, in the same way as described in [76]. We ran 200,000 jobs of 2 h each. One model execution taking 40 s on average, this experiment corresponds to approximately 72 million evaluations of parameterised models. The set we analyse in the following section corresponds, for each of the 32 different model structures (64 in fact, because each model can be instantiated for two different time periods: 1959–1989 and 1989–2010), to the 50 vectors of parameter values that resulted in the smallest distance to empirical data, so altogether 3200 performant parameter sets associated with a model structure and a performance measure (Figure 1). Instantiated model structures can be, for example: baseline model + 1959–1989; baseline model + fixed cost + 1989–2010; baseline model + urban transition + resources + 1989–2010; baseline model + redistribution + urban transition + 1959–1989; baseline model + fixed cost + bonus + resources + 1959–1989; etc. Those are part of the 64 instantiated model structures for which 50 parameter sets were analysed.

## 4. Results: Hypothesis Testing to Explain Urbanisation in the Former Soviet Union

The analysis of the calibration results consists in relating the performance of the calibrated models (in terms of distance between simulated and observed growth, hierarchically and spatially) to their structure (their mix of mechanisms) and to the values of the parameters associated with each activated mechanism. We detail each analysis below, and invite the reader to replicate them using the interactive application VARIUS, built to explore those results online [77]. We present the results of this exploration in the form of three questions at the macro, meso and micro scales of the city-system.

- 1 **Which is the most parsimonious model to simulate the evolution of cities before and after the collapse of the Soviet Union?** A way to answer this question is to restrict the set of results to the five model structures that correspond to a mix of two mechanisms: the baseline model plus one additional mechanism (for example: resource extraction). That leaves us with $50\times 5=250$ parameter sets and 250 performance values δ for each time period analysed. We look for the lowest distance to empirical data achieved in this set of results, and identify the corresponding model structure (the additional mechanism involved) and parameter values as the best performing ones.
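The selection just described amounts to taking the minimum of δ over the restricted result set; structure names and values below are illustrative, not the actual calibration output:

```scala
// One calibrated result: a model structure label and its distance δ.
case class Result(structure: String, delta: Double)

// The most parsimonious well-fitting model is simply the argmin of δ
// over the restricted set of baseline + one-mechanism structures.
def bestStructure(results: Seq[Result]): Result =
  results.minBy(_.delta)
```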

For simulations starting in 1959 and running up to 1989 (the Soviet Union actually came to an end in December 1991, but its last census was performed in 1989), the most performant parameterised model with only one additional mechanism is composed of the baseline model plus the mechanism of urban transition (cf. Table 1). It is characterised by significant economies of agglomeration ($sizeEffectOnSupply=1.05$) but a linear function of demand with size. The rural multiplier is equal to 3.5% and allows the model to simulate the fast urbanising regions of Siberia and Central Asia (cf. Figure 2). However, the population of a majority of cities is under-estimated in the simulation, especially in the upper part of the hierarchy, around Moscow and in eastern Ukraine (or, more generally, in the western part of the Former Soviet Union; see question 3 for a hypothesis as to why that might be).

**Table 1.**
Parameter values of the best performing simple model before the political transition.

Parameter Name | Value | Mechanism
---|---|---
economicMultiplier | 0.002193758 | Baseline
populationToWealth | 1.000184755 | Baseline
sizeEffectOnSupply | 1.053943022 | Baseline
sizeEffectOnDemand | 1.000000000 | Baseline
wealthToPopulation | 0.203567639 | Baseline
distanceDecay | 1.872702086 | Baseline
ruralMultiplier | 0.034975771 | UrbanTransition

Normalized δ | n cities | Time steps
---|---|---
0.01423387 | 1145 | 30

**Figure 2.**
Spatial distribution of residuals for the parsimonious models. Only cities largely over- and under-estimated by the simulation are mapped in this figure. The size of the circles indicate the population of the city as simulated at the end of the period. Dark colours indicate large discrepancy: overestimation of the population in blue (negative residual) and underestimation in orange (positive residual). N.B.: To reproduce those maps on the application VARIUS, run the corresponding model and choose a residual cutoff of 0.3.


After the (political) transition, the best calibrated model with one additional mechanism includes site advantages for cities. This model has two additional parameters and performs twice as well against the evaluation criterion (per city per census) as the best model for the previous period (0.005 vs. 0.01, cf. Table 1 and Table 2). The analysis of the parameters fits the empirical observations of faster-growing cities located on oil and gas deposits ($oilAndGasEffect=0.02$) and declining cities in the Donbass and Kuzbass coal regions ($coalEffect=-0.01$). This model’s specification includes very low size effects and a very uneven wealth distribution amongst cities at initialisation ($populationToWealth=1.12$). The residuals are distributed roughly symmetrically (Figure 3), but without any mechanism of urban transition, the model clearly underestimates the post-1989 growth of all the rapidly growing cities of Central Asia.
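To make the evaluation criterion concrete, the normalized δ reported in the tables can be sketched as follows. This is a hedged reconstruction: the exact distance function is defined earlier in the paper; here we simply assume a mean of squared log-ratios of simulated versus observed populations over every city and census date, on invented toy data.

```python
import math

def normalized_delta(observed, simulated):
    """One plausible form of the normalized distance δ (assumption):
    the mean squared log-ratio of simulated vs observed populations,
    averaged over all cities and all census dates."""
    total, count = 0.0, 0
    for city, censuses in observed.items():
        for date, pop_obs in censuses.items():
            pop_sim = simulated[city][date]
            total += (math.log(pop_sim) - math.log(pop_obs)) ** 2
            count += 1
    return total / count

# toy example: two cities observed at two census dates
obs = {"A": {1989: 100_000, 2010: 120_000}, "B": {1989: 50_000, 2010: 45_000}}
sim = {"A": {1989: 110_000, 2010: 115_000}, "B": {1989: 52_000, 2010: 60_000}}
delta = normalized_delta(obs, sim)
```

Under this reading, a normalized δ of 0.005 "per city per census" is indeed directly comparable across periods with different numbers of cities and time steps.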

**Table 2.**
Parameter values of the best performing simple model after the political transition.

Parameter Name | Value | Mechanism |
---|---|---|
economicMultiplier | 0.502616330 | Baseline |
populationToWealth | 1.124963276 | Baseline |
sizeEffectOnSupply | 1.002982515 | Baseline |
sizeEffectOnDemand | 1.000808442 | Baseline |
wealthToPopulation | 0.699943763 | Baseline |
distanceDecay | 1.475836151 | Baseline |
oilAndGazEffect | 0.017066495 | Resources |
coalEffect | −0.011792670 | Resources |

Normalized δ | n cities | Time steps |
---|---|---|
0.005180008 | 1822 | 21 |

**Figure 3.**
Distribution of residuals for the parsimonious models. (**a**) Urban Transition model. 1959–1989; (**b**) Resource Extraction model. 1989–2010.


To summarise, situation effects and territorial effects seem to be the dominant candidates for explaining the specific part of the trajectories of cities in the Soviet era, while site effects seem to have taken over since 1991, the transition to capitalism and the rise of oil prices in the world markets. Moreover, the better fit of the latter model could be an indication of the “normalisation” of the urban processes in the post-Soviet space, compared to a more singular pattern of Soviet urbanisation.

- 2
**Which are the mechanisms (and mechanisms’ interactions) that are essential to model the Soviet and post-Soviet urbanisation patterns?** To address this question, we statistically analyse the results of the multicalibration (3200 sets of parameters, the best 50 of each model structure for each time period) to evaluate the contribution of each mechanism (everything else being equal in the model structure) to the reduction of distance between simulated and observed demographic data for each city. More specifically, we regress the distance to data δ against the mechanism composition of the model, following Equation (30):

$${\delta}_{i} = a + {b}_{1}\cdot BONU{S}_{i} + {b}_{2}\cdot COS{T}_{i} + {b}_{3}\cdot RESOURCE{S}_{i} + {b}_{4}\cdot REDISTRIBUTIO{N}_{i} + {b}_{5}\cdot TRANSITIO{N}_{i} + {b}_{6}\cdot PERIO{D}_{i} + {\epsilon}_{i}$$

with δ the distance between simulated and observed trajectories, i one of the 3200 parameterised models considered, $BONU{S}_{i}=1$ if the model includes the bonus mechanism (0 otherwise), and similarly for the other mechanism dummies; $PERIO{D}_{i}=0$ if the model is run between 1989 and 2010, and $PERIO{D}_{i}=1$ if the model is run between 1959 and 1989.

The estimated coefficients (b1, b2, b3, b4, b5, b6) associated with each mechanism correspond to the average contribution of the mechanism to the distance δ when activated, everything else being equal. They are represented in Figure 4, along with the intercept (a), that is, the average δ of the baseline model between 1989 and 2010.
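The regression above is an ordinary least-squares fit of δ on 0/1 mechanism dummies. A minimal sketch on synthetic data (the number of runs mirrors the paper's 3200, but the coefficient values and noise level here are invented; only the design of the regression follows the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each simulated model run is described by six 0/1 dummies:
# BONUS, COST, RESOURCES, REDISTRIBUTION, TRANSITION, PERIOD.
n = 3200
X = rng.integers(0, 2, size=(n, 6)).astype(float)

# Invented "true" effects for illustration only: negative values mean
# the mechanism reduces δ when activated; the period dummy raises δ.
true_coefs = np.array([-0.004, -0.002, 0.003, 0.0, -0.006, 0.01])
intercept = 0.015
delta = intercept + X @ true_coefs + rng.normal(0, 0.001, n)

# OLS with an explicit intercept column
A = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(A, delta, rcond=None)
a, b = coefs[0], coefs[1:]
```

Reading the fitted coefficients `b` then follows the interpretation given in the text: each is the average change in δ when the corresponding mechanism (or the 1959–1989 period) is activated, everything else being equal.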

**Figure 4.**
Contribution of mechanisms to a simulation that reduces the distance to observed data. N.B.: Estimated coefficients (a = “(intercept)”, b1 = “Bonus”, b2 = “Cost”, b3 = “Resources”, b4 = “Redistribution”, b5 = “Transition”, b6 = “X1959–1989”) are considered significant for p-values below 0.005. A negative coefficient indicates that the mechanism contributes to reducing the distance to data, on average, when activated in the model (the mechanism is thus called “essential” to simulate the urban hierarchical evolution of the Former Soviet Union). A positive coefficient indicates that the mechanism increases the distance to data, on average, when activated in the model (and is therefore harmful to a realistic simulation of city growth). A non-significant coefficient can indicate two things: either the mechanism is neutral in the simulation, or its contribution to reducing the distance to data depends on the presence of other mechanisms.


We find that on average, the bonus, fixed costs and urban transition mechanisms tend to reduce the simulation error significantly (cf. Figure 4). The transition and bonus mechanisms are identified as the most effective ones. By contrast, the resource mechanism is correlated significantly with a change in the evaluation criterion but tends to increase the error when it is activated (compared with model structures without this mechanism). This counter-intuitive result might be linked to the weak influence of resource extraction during the first period of simulation, when there was a diversity of urban trajectories within resource-rich regions. The redistribution mechanism does not appear significant on average in this analysis, as it plays only a minor role in the reduction of error when activated.

Finally, we confirm the observation that our models work better to simulate the urban evolution following the dismantlement of the Soviet Union. Indeed, as shown in Figure 4, the value of normalized δ is greater by 0.01 point on average when the model is specified for the first period. This could be an expression of “normalisation” or simplification of the urban processes in the post-Soviet space after the political and economic transition of the 1990s.

- 3
**What are the cities that resist modelling? In other words, what are the cities that are too specific to be modelled by any of the mechanisms implemented?** To answer this last question, we statistically analyse the difference between the log of the population observed and the log of the population simulated at the last evaluation Census date for cities included in the simulation, with respect to their locational and functional attributes. The models for which we present the results below contain all the implemented mechanisms, and are applied to the two periods of enquiry.

For the two periods, we find that our models persistently and significantly under-estimate the growth of the largest and most western cities of the (Former) Soviet Union, everything else being equal (cf. Figure 5). Moreover, capital cities appear to have grown less historically than what we can predict with a complete model of the period 1959–1989. The other urban attributes included in the regressions (natural resources and mono-functional specialisation) do not seem associated with any systematic pattern of over- or under-estimation.

The difficulty of reproducing the trajectory of the largest cities was also encountered with a comparable model of a system of cities (Simpop2, see [80]) and was solved by the exogenous introduction of innovations, accounting for the creative features of the largest cities and their higher probability of adopting new technologies and functions.

The under-estimation of growth in the western part of the territory might be due to its integration within a larger area (Eastern Europe) during the periods under study: our hypothesis here is that the centrality of western (post-)Soviet cities is understated in our model because it does not take into account the interactions with east-European cities (Warsaw, Prague, Bratislava, etc.), which altogether formed an economic system (even though the integration was always stronger within the FSU).

**Figure 5.**
Profiles of residuals. Estimated coefficients are considered significant for p-values below 0.01. These graphs should be read similarly to Figure 4: negative coefficients indicate that, on average, cities with the attribute under consideration are overestimated by the simulation, whereas positive coefficients indicate average underestimation of the population of cities with the attribute in the simulation. The attribute “log(population)” is the only quantitative one, and its coefficient indicates the increase in residual value for an increase of 1 in the log of population (in thousands). Finally, the intercept coefficient indicates the average residual value of cities with none of the attributes considered and a population of 1000. (**a**) Complete model. Simulation 1959–1989; (**b**) Complete model. Simulation 1989–2010.


Finally, some individual cities appear as clear outliers of the model, and could correspond to profiles that are too specific to be modelled by any generic mechanism. For the first period (cf. Table 3), the most obvious examples of singular trajectories are the cities which grew much faster than expected from their site, situation or interaction attributes. Indeed, Naberezhnye Tchelny, Volgodonsk, Toljatti and Bratsk owe their sudden development to political decisions to implement flagship projects: automobile industry mega-plants in Naberezhnye Tchelny (trucks) and Toljatti (cars), and energy production sites in Volgodonsk (atomic power) and Bratsk (hydroelectric power station). These economic policies of the 1950s and 1960s led those cities to be four times as populated 30 years later as what was expected from their interactions, resource or regional characteristics.

For the second period (cf. Table 4), Astana is a good example of a similar singular trajectory that we would not aim to simulate with generic urbanisation mechanisms, as it owes its booming growth to the decision of the newly independent Kazakh State to locate its headquarters in this city, more central to the country than Almaty. On the contrary, Baikonyr, also in Kazakhstan, has suffered from the cuts in the space industry (not predictable at the urban level of our mechanisms). Other shrinking cities like Aleksandrovsk-Sahalinsk, Krasnozavodsk or Uglegorsk would require more detailed mechanisms of demographics (birth deficit and emigration) and economic cycles to be simulated adequately.
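The outlier screening behind Tables 3 and 4 can be sketched directly from the residual definition given above: the residual of a city is the log of its observed population minus the log of its simulated population at the final census, and cities are ranked by absolute residual. The figures below are a small sample copied from Table 3 (populations in thousands); the ranking logic, not the data selection, is the point of the sketch.

```python
import math

# Sample of cities from Table 3 (populations in thousands, 1989)
observed = {"Naberezhnye Tchelny": 500, "Volgodonsk": 191,
            "Zaozernyj": 16, "Kizel": 37}
simulated = {"Naberezhnye Tchelny": 30, "Volgodonsk": 36,
             "Zaozernyj": 54, "Kizel": 88}

# Residual = log(observed) - log(simulated), as defined in the text:
# positive means the model under-estimated the city, negative over-estimated.
residuals = {c: math.log(observed[c]) - math.log(simulated[c]) for c in observed}

# Rank cities by absolute residual to surface the clearest outliers
outliers = sorted(residuals, key=lambda c: abs(residuals[c]), reverse=True)
```

On this sample, Naberezhnye Tchelny tops the ranking with a large positive residual (under-estimation), while Zaozernyj shows a negative residual (over-estimation), matching the two halves of Table 3.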

**Table 3.**
Observed and Simulated populations of urban outliers in 1989.

**Positive Residuals**

City | Observed Pop. | Simulated Pop. |
---|---|---|
Naberezhnye Tchelny | 500,000 | 30,000 |
Volgodonsk | 191,000 | 36,000 |
Chajkovskij | 86,000 | 19,000 |
Toljatti | 685,000 | 158,000 |
Bratsk | 285,000 | 73,000 |
Balakovo | 197,000 | 52,000 |
Tihvin | 71,000 | 20,000 |
Chervonograd | 72,000 | 21,000 |
Obninsk | 111,000 | 32,000 |
Staryjoskol | 174,000 | 53,000 |

**Negative Residuals**

City | Observed Pop. | Simulated Pop. |
---|---|---|
Zaozernyj | 16,000 | 54,000 |
Gremjachnsk | 21,000 | 56,000 |
Atakent/Ilitch | 15,000 | 38,000 |
Kizel | 37,000 | 88,000 |
Cheremhovo | 74,000 | 172,000 |
Ilanskij | 18,000 | 42,000 |
Gornoaltajsk | 46,000 | 102,000 |
Volchansk | 15,000 | 32,000 |
Zujevka | 16,000 | 35,000 |
Taldykorgai | 138,000 | 296,000 |

**Table 4.**
Observed and Simulated populations of urban outliers in 2010.

**Positive Residuals**

City | Observed Pop. | Simulated Pop. |
---|---|---|
Mirnyja | 41,000 | 12,000 |
Sertolovo | 48,000 | 16,000 |
Beineu | 32,000 | 11,000 |
Govurdak | 76,000 | 28,000 |
Serdar/Gyzylarbat | 98,000 | 37,000 |
Bayramaly | 131,000 | 53,000 |
Sarov | 92,000 | 39,000 |
Turkmenabat/Tchardjou | 427,000 | 185,000 |
Astana/Tselinograd | 613,000 | 278,000 |
Dashougouz | 275,000 | 126,000 |

**Negative Residuals**

City | Observed Pop. | Simulated Pop. |
---|---|---|
Sovetabad | 11,000 | 33,000 |
Zhanatas | 21,000 | 50,000 |
Krasnozavodsk | 13,000 | 31,000 |
Gagra | 11,000 | 25,000 |
Nevelsk | 12,000 | 26,000 |
Arkalyk | 28,000 | 59,000 |
Chyatura | 14,000 | 28,000 |
Aleksandrovsk Sahalinsk | 11,000 | 21,000 |
Uglegorsk | 10,000 | 20,000 |
Baikonyr | 36,000 | 67,000 |

To summarise, there are particular types of urban trajectories that are not simulated well by the model because of its simplicity, and trajectories that are too specific to be modelled. We find that the exploration of our models, their calibration and the analysis of residuals have helped to identify those cities and to suggest some missing mechanisms.