A modular modelling framework for hypotheses testing in the simulation of urbanisation

In this paper, we present a modelling experiment developed to study systems of cities and processes of urbanisation in large territories over long time spans. Building on geographical theories of urban evolution, we rely on agent-based models to 1/ formalise complementary and alternative hypotheses of urbanisation and 2/ explore their ability to simulate observed patterns in a virtual laboratory. The paper is therefore divided into two sections : an overview of the mechanisms implemented to represent competing hypotheses used to simulate urban evolution; and an evaluation of the resulting model structures in their ability to simulate - efficiently and parsimoniously - a system of cities (the Former Soviet Union) over several periods of time (before and after the crash of the USSR). We do so using a modular framework of model-building and evolutionary algorithms for the calibration of several model structures. This project aims at tackling equifinality in systems dynamics by confronting different mechanisms with similar evaluation criteria. It enables the identification of the best-performing models with respect to the chosen criteria by scanning automatically the parameter along with the space of model structures (as combinations of modelled dynamics).


I. Introduction
Simulation models of urban systems were first developed in the 1950s and 1960s as a way to understand the complexity of cities and to forecast trends and consequences of planning policies. Several formalisms (thermodynamics, general systems theory, synergetic, microsimulation) were used [Batty, 2008], following fashions as well as opportunities arisen from access to new technologies [Heppenstall et al., 2012, Pumain andSanders, 2013]. Each of the methods cited have their specific advantages and drawbacks, that we will not discuss here, but all of them provide a same opportunity and challenge that is linked to simulation and has not changed as the formalisms evolved. The opportunity that we focus on in this paper is the function of virtual laboratory that a computerised simulation model enables [Carley, 1999], which is of paramount interest for human and social sciences in which in vivo experiments are impossible or difficult. By allowing to implement competing and/or complementary hypotheses into generative mechanisms 1 in a model testable against empirical data, this function of virtual laboratory makes the model a framework for the evaluation of the plausibility of different theories. However, simulation as a method and an epistemological way of testing theories gives way to a limitation known since Bertalanffy [von Bertalanffy, 1968] as equifinality. It describes the fact that even when a model is performing well, one cannot infer that the underlying combination of mechanisms is the one operating "in real life", because several models can lead to the same results (many-to-one), and the same process can lead to several patterns (one-to-many) [O'Sullivan, 2004]. The problem of adequacy assessment of the model with real life (also known as ontological adequacy testing, cf. [Rossiter et al., 2010]) is twofold. First, many effective (or "real") processes can lead to the same observed situation within the target system that we aim to model. Second, inadequate mechanisms implemented in the model can simulate in a satisfying way the situation under study. This causal challenge of identification of the generative mechanisms is a recurrent problem in social simulation for explanatory purposes [Grüne-Yanoff, 2009, Elsenbroich, 2012, but one that is usually overlooked at the stage of results' analysis.
We have tried in a previous study [Chérel et al., 2015] to tackle the one-to-many part of the challenge within the model, using an evolutionary algorithm to look for the maximum diversity in patterns produced by a given set of mechanisms in a model of systems of cities. What we present here relies to the multiplicity of possible causes leading to the single historical trajectory observed empirically. We present a multimodelling framework that allows to combine different mechanisms into a modular model evaluated against a unique set of evaluation criteria, enabling for the comparison of the performance of different model structures to simulate urbanisation and the evolution of a system of cities.
More precisely, we are interested in simulating the co-evolution of cities encompassed in a territorial system (typically a nation or a continent), and to reproduce the regular patterns or organised structures that are observed empirically in various systems of cities: their hierarchy, spacing and functional differentiation [Pumain, 1997]. Section II presents the patterns we aim to reproduce and the catalogue of theories and mechanisms on which we draw to compose the model. Section III describes the multi-model and its implementation in our study case, as well as its exploration through multicalibration. This exploration aims to explore the performance of different hypotheses in explaining the evolution of Soviet and post-Soviet cities. Section IV concludes on the study case and the method.

II. A catalogue of possible mechanisms of urbanisation
"It may also be useful to think of complex geographical models as extensions of thought experiments, where the necessary and contingent implications of theories can be examined. Further, admitting that 'all models are wrong' is akin to the realisation in post-structural social science that multiple competing accounts of the same settings are possible, and that faced with a diversity of accounts the context and intent of each must be an important element in the evaluation process." [O'Sullivan, 2004, p.291] A large bunch of theories in geography, economics and natural sciences have tried to provide an account of the regularity of urbanisation processes and the structuration of a system of cities, through models and narratives. Keeping in mind the equifinality challenge, this means that we already have a strong theoretical basis and several causal model candidates to confront with empirical regularities. Without being exhaustive on these theories, we try to provide an overview of the mechanisms that have been proposed in the literature. We then expose the kind of results that are achieved by statistical models using empirical data, and present the particularity of our study case.

II.1. Competing theories
Out of the three stylised facts describing the structure of a system of city: the hierarchy of cities by size, their regular spacing and the functional differentiation [Pumain, 1997], the former has fostered the larger body of research, and will be our main criterion for stating the performance of a simulation model, whereas the latter two remain hard to formalise and to compare over time and space with respect to the availability of urban information at a local level. Consequently, we present in larger details the competing theories aiming at explaining the regularity of city size distribution and its evolution over time, and quickly review theories of location and functional specialisation.
The hierarchical organisation of city sizes represents a "mystery" [Krugman, 1996c] that has intrigued many researchers, because of its regularity and simplicity of description (the ranksize "rule") despite the complexity of urban functioning and interactions (for reviews, cf. [Cheshire, 1999, Pumain, 2006). [Auerbach, 1913, Lotka, 1925, Singer, 1936 are known to be the first to formalise this regularity, leading to the famous rule that the size of a city is a power law function of its rank in the hierarchy, a rule particularized by [Zipf, 1949] arguing that the Pareto exponent is expected to be -1. Alternative mathematical descriptions have been proposed, for instance the lognormal distribution [Gibrat, 1931, Berry, 1961, Eeckhout, 2004. The two distributions however are close and tend to coincide in the middle-upper part of the hierarchy [Mitzenmacher, 2004, Clauset et al., 2009].
Because of this regularity, generative models of power laws and lognormal distributions have logically been considered as candidate explanations [Mitzenmacher, 2004, Clauset et al., 2009. The most famous one is Gibrat's law [Gibrat, 1931], which generates a lognormal distribution by a process of "proportional effect". This model of multiplicative growth allocates randomly growth rates 2 from independent distributions, at each short time step, to cities independently from their size. It is thus considered close to a random walk. It results in amplifications of urban growth and decline lead-ing to a lognormal distribution. The Simon model of preferential attachment [Simon, 1955] applied to cities generates power laws of city sizes by simulating an incremental creation of new urban blocks attached to existing urban clusters with a probability dependent on their size [Krugman, 1996a]. Those two models are elegantly simple enough to generate hierarchical distributions, but they lack an explaining power to help understand urbanisation processes: "Scaling laws often are viewed as overidentified: they can be generated by a wide range of distinct models. It is essential to select specifications that integrate in the model a significant part of the existing knowledge about towns and cities" [Pumain, 2004, p. 1]. Some attempts were made to characterise them in terms of exogenous shocks and contingent local policies (for example in Gabaix's model [Gabaix, 1999]), but random growth models seem to miss the causal power of mechanism-based simulation models [Hedström and Ylikoski, 2010].
Other accounts of urban growth models leading to the observed stylized hierarchical structure and evolution (deterministic in Dimou and Schaffar's typology [Dimou and Schaffar, 2011], by opposition to random models) range into two broad categories: equilibrium models of externalities (such as [Henderson, 1974, Krugman, 1996b) and evolutionary models of spatial interactions (such as [Allen and Sanglier, 1981, Weidlich and Haag, 1983, Bura et al., 1996). The former type of models formalizes a balance of centripetal (sharing, matching, learning in the labour market for example) and centrifugal forces (pollution and congestion) resulting in optimal sizes for cities given the current technology. The latter type of models rely on attractivity and spatial interactions of cities to explain the dynamic of competition and cooperation resulting in a regular hierarchy in the urban system.
Location theories of cities typically build on two concepts to explain the regular spacing and empir-ical distribution of cities in space: site and/or situation advantages [Christaller, 1933, Ullman, 1941, Krugman, 1996b, Pumain, 1997. Site advantages refer to natural features available at an absolute location (natural resources, harbour conditions, etc.), whereas situation advantages refer to centrality in a transportation network or an interface relative position for example. All the theories mentioned above rely necessarily on relations with (consistent) territories providing resources or distance constraints for instance, in order to explain urban coevolution.
Theories of urban specialisation inherit from trade theories (via comparative and competitive advantages) and theories of product and innovation cycles. They state that cities have different features (size, situation, site, former specialisation) that make them akin to be more competitive to specialise in one economic sector in comparison with other cities, or to adopt an innovation sooner or later than their neighbours. Depending on the current cycle of the product, those specialisations affect not only the sectoral composition of the active population and its economic output, but also confers an advantage to early adopters and defines further options for specialisation and growth [Duranton andPuga, 2001, Pumain et al., 2006].
Finally, political factors and singular policies are usually considered necessary to explain differentiated urban growth (through administrative functions or investment policies targeting specific cities at a given moment in time).
We compose our catalogue with mechanisms that formalize the theories cited above. They fall into five broad classes of generative mechanisms potentially accounting for the emergence of a structured system of cities: • Spatial Interactions and diffusion allow for the exchange of information, money, goods and people. It thus makes cities co-evolve over time and adapt collectively to changing economic and innovation cycles through competition and cooperation, resulting in some complementarity of their specialisation. These local interactions and their consequences on the regular organisation of the system as a whole under spatial constraints could be thought of as "complex systems effects".
• Size effects like agglomeration economies and urbanisation externalities illustrate a very direct and self-reinforcing cause for hierarchical differentiation.
• Site effects explain the spatial location of growth processes around resource-rich areas for the related innovation cycle.
• Situation effects illustrate the importance of the neighbouring relational environment (potential field, network accessibility, etc.) on a city's pattern of growth.
• Territorial effects account for some exogenous (policy) shocks and the solidarity of urban trajectories in a common political space (through redistributive processes for example).
We look for statistical evidences of these factors of urbanisation in the empirical literature before turning to our case study experiment, where we propose an implementation and evaluate the power of each of the five mechanisms into an agent-based (multi-) model.

II.2. Empirical Results from the Literature
Spatial Interactions are tricky to measure because of the variety and non-commensurability of flows circulating between cities at various temporalities. Until recently, the diffusion of innovations (agricultural techniques [Hägerstrand, 1968], telephone lines [Robson, 1973] or newspapers [Pred, 1973]) served as a proxy for these interactions. Since the development of various volumes of high velocity data, actual interactions (like phone calls [Krings et al., 2009]) have confirmed for example the relevance of the gravity model to describe inter-city interactions.
Size effects in urban growth and differentiation were revealed by a persistent empirical cor-relation between growth rates and city sizes over long periods of time. All over the 19th century, [Robson, 1973, p. 79] measured a positive coefficient between the log of English and Welsh cities' population and their ten-year growth rates (from a minimum of +1.47 between 1861 and 1871 to a maximum of +8.53 between 1821 and 1831). This correlation is found for French cities as well [Guérin-Pace, 1995]. The size effect finally relates to the stability of the rank position of large cities (by comparison with the fluctuations of smaller cities).
Site effects were classically approached by estimating the surplus of growth associated with a localised resource (typically, deposits of natural materials, such as coal or gas). In the Soviet urban system of the 1920s-1930s, the location of a city on a coal deposit was associated with a surplus of 1.15 points in percentage of average demographic growth per annum, everything else being equal. A surplus over 0.5 point is observed nowadays  for oil and gas deposits [Cottineau, 2014a]. In the USA, [Black and Henderson, 2003] estimate that the coastal location (e.g. a resource for tourism) is associated with a significantly higher ten-year growth rates of 3 to 5 points.
Situation effects can be revealed by the spatial autocorrelation of growth or the coevolution between transportation networks and urban networks. In the first case, [Hernando et al., 2015] found a characteristic distance for spatial autocorrelation of growth rates of 215 km for American counties and of 80 km for Spanish cities. As for transportation dynamics, [Bretagnolle, 2003] measured the correlation between accessibility and growth rates for French cities in the last two centuries. She finds that cities that were weakly connected (by any transportation network: road, rail or air) in 1900 and stayed isolated in 2002 grew slower (0.94% on average per annum between 1900 and 2002) than cities that became motorway nodes (1.21%) or multimodal hubs (1.69%). Likewise, well connected cities at the beginning of the period tended to grow faster than the first category.
Territorial effects can be approached empirically by relating political statuses to dynamics of growth. In developing countries, regional capitals were found to grow significantly faster by 0.5 to 1 point of annual average growth rate in the 1960s [Preston, 1979] and the 1990s [Brockerhoff, 1999]. In the Former Soviet Union, the regional status of capital has proven important to predict urban growth [Harris, 1970], the coefficient regressed against growth rates over time ranges from +0.24 point between 1989+0.24 point between and 2002+0.24 point between to +1.88 between 1926+0.24 point between and 1939+0.24 point between [Cottineau, 2014a. Besides, cities that belong to the same territory have shown an increased pattern of synchronicity in their growth and decline trajectories from the 1980s on, suggesting evidence of both political shocks and territorial solidarity in the spatial distribution of urban growth.
Despite the corroboration of theories provided by the numbers cited above, statistical correlations do not prove any causal chain and fail to explain the processes at work. Moreover, they are not adapted to model dynamic, non-linear and complex evolutions and interactions. This is a recurrent problem that has led social scientists to promote a combination of statistical methods with generative (simulation) modelling [Byrne, 1998, Goldthorpe, 2001].

II.3. A Case Study : the Former Soviet Union
We applied such a mixed methodology to the case study of the urban Soviet Union of the last 50 years [Cottineau, 2014a. This system is mainly interesting to us because it has the reputation of having been a controlled economic, political and social system aiming at reaching equalisation (of city size for example) and yet, while observing its evolution with generic urban models, the urban trends appear very classic. By this we mean for instance that the evolution of the percentage of urban population follows a very classical logistic function of urban transition, that the rank-size slope is close to the value expected with respect to the time of settlement of the different parts of the territory, and that it has increased over time, indicating an increase of in-equality of city sizes (despite anti-urban political discourses [Cottineau, 2014a]). The monographic explanation of urban growth in the Former Soviet Union has been dominant in urban geography, and we argue that it might have hidden very generic processes of urbanisation in this system of cities [Cottineau and Slepukhina, forthcoming].
Our modelling experiment aims at testing the genericity of Soviet and post-Soviet urbanisation by simulating generic mechanisms and testing them against harmonised historical data. The database used in this study consists of open demographic, territorial and site informations on 1929 agglomerations of the Former Soviet Union between 1840 and 2010 [Cottineau, 2014b].
We analyse and compare the explaining power of generic mechanisms in a simulation of the period before  and the period after  the dislocation of the Soviet Union. The agentbased model framework of this experiment is called MARIUS 3 . In this model, cities interact as collective agents [Bura et al., 1996] and we evaluate every simulations with three evaluation criteria. The first two criteria are boolean indicators of the "realism" of the microdynamics of the simulation : we only analyse simulations in which the number of cities with a nil wealth and the number of cities producing more output in one year than their stock of wealth is 0. Once these controlling criteria are met, we try to minimise the distance δ between the simulated population P s and the observed population P o of each city i at each date t we have census information for, as stated in equation (1).
This criterion ensures that a "perfect" model (achieved for δ = 0) would predict the right amount of urban growth and its exact location over time. In order to compare different time periods, we normalise δ by the number of cities and the number of censuses used in a given period.

III. Modular multimodelling experiment
The incentive to implement competing and complementary theories into different models evaluated against one another is a recurrent plea in the simulation literature [Openshaw, 1983, O'Sullivan, 2004, Batty and Torrens, 2005, Thiele and Grimm, 2015, Batty, 2015. It reveals how tricky its implementation and automatic evaluation might be, besides the epistemological challenge of equifinality and the kind of conclusions one can draw from this confrontation. Indeed, thirty years after the "Automated Modeling System to Explore a Universe of Spatial Interaction Models" by [Openshaw, 1983], there are no standard tools nor formal methodology for theory testing with simulation models. Indeed, Openshaw's automated way to explore model structures, being a pure optimisation way of discovering model structures, is impressive methodologically but it does not suit our goal of theory testing in a virtual laboratory, because it can result in optimal models that are impossible to interpret.
Instead, we think that the first step should be to gather a catalogue of theoretical processes and mechanistic hypotheses working as potential explanations. This usually precedes the mixed-modelling step [Contractor et al., 2000, Auchincloss et al., 2008 and prevents from endlessly building models "from scratch" [Thiele and Grimm, 2015]; it allows to build on previous work and experience.
In a second step, we have built a baseline model of urban interactions and implemented hypotheses as blocks of mechanisms that can be activated or discarded to compose a family of simulation models (section III.1). By automatically calibrating all model structures against a similar evaluation objective (section III.2), we were able to compare the power of several groups of hypotheses in the case of the Soviet and post-Soviet urbanisation (section III.3).

III.1. Implementing mechanisms as building blocks
The multi-model is composed of the baseline model and modules of mechanisms that override the sequence of agents' rules when they are activated. In the following sections, we detail the baseline model and the modular blocks of mechanisms that are added incrementally to the original equations. For further informations, the baseline model and the first two additional mechanisms were described and evaluated in details in [Cottineau et al., 2015].

III.1.1 The Baseline Model
The baseline model relies on the assumption that population and wealth are the basic descriptors of cities and the engine of their coevolution. It therefore models exclusively size effects of population on wealth and spatial interactions.
At initialization, each city i of the Former Soviet Union is setup with its historical population P i at the beginning date of simulation, and located at its empirical coordinates, enabling site, situation and distance to play in the same geometry as in the target system. An estimated value of wealth W i (expressed in a fictive unit) is determined for each city with respect to its size 4 , following equation (2).
Time is modelled as discrete steps, each of which represents a time period of one year, during which interactions occur in a synchronous way. At each step: Each city i updates its economic variables: a global supply S i (4) and a global demand D i (7), according to its population P i and three parame-ters (economicMultiplier, sizeE f f ectOnSupply and sizeE f f ectOnDemand).
Each city interacts with other cities according to the intensity of their potential IP (9). For two distinct cities i and j, the computation of the interaction potential IP ij consists in confronting the supply of i (11) to the demand of j with an equation borrowed to the gravity model (12).
with d ij a measure of distance between i and j.
Interactions of cities i and j based on their potential IP ij result in a transaction T ij from i to j (13).
Each city updates its wealth W i according to the results of its transactions T (unsold supply US i (15) and unsatisfied demand UD i (16)) in which it was committed (14).
A simulation step ends when each city updates its population according to its new resulting wealth (17) :

III.1.2 Mechanism Increments
The mechanism that accounts best for interactions benefits at the intercity level is the one called bonus. It "[...] features a non-zero sum game [...], rewarding cities who effectively interact with others rather than internally. We assume that the exchange of any unit of value is more profitable when it is done with another city, because of the potential spillovers of technology and information." [Cottineau et al., 2015] This bonus B i depends on the volume of transactions and the diversity of partners J i the city i has exchanged with (18). It is added to the wealth at the end of each step (19), following equation (14): n being the total number of cities in the system (i.e. 1145 for a simulation beginning in 1959, 1822 for a simulation beginning in 1989), and J i the number of partners with which i has engaged in the current simulation step.
A mechanism related to situation advantages is called fixed costs. It ensures that the situation of each city in the system is taken into account in its interactions with other cities.
"Every interurban exchange generates a fixed cost (the value of which is described by the free parameter f ixedCost). This implies two features that make the model more realistic: first, no exchange will take place between two cities if the potential transacted value is under a certain threshold ; second, cities will select only profitable partners and not exchange with every other cities. This mechanism plays the role of a condition before the exchange." [Cottineau et al., 2015] The interaction potential between city i and city j will be positive only if the potential value that i is willing to sell to j is superior to the fixed value it costs it to negotiate, prepare and send the transaction (20): Therefore, each transaction of a city i gives way to a fixed cost. Their sum is subtracted from the wealth of city i at the end of each step (21), following equation (14): Site effects are targeted by the resource mechanism: site advantages are particularised in this model by natural resource deposits (more specifically: coal deposits C on the one hand, and oil and gas deposits O on the other hand). The assumption is made that if the city i is located on some coal or oil deposits (C i = 1 or O i = 1), the city benefits from the advantage granted by the extraction activity. The capacity of extraction depends on the capital (wealth) of the city and takes the form of a wealth multiplier for each resource (22) after equation (14): Territorial and political effects are formalised by the redistribution mechanism. It allows for a redistribution of wealth between cities of the same territory R (region or State). To do so, territorial taxes tt k are collected in each city k R , as a proportion territorialTaxes of their wealth. The total amount of taxes collected is TT R (23): From this taxes, the administrative status of the territory R (denoted by CC i,R , set to 1 if i is the capital city of the region and 0 otherwise) allows the capital city to take a share CS for its administration needs (24): The rest of the taxes is redistributed to cities of region. Each city i R receives a share tr i,R that is proportionate to its population (25): The balance of the territorial redistribution is added to the wealth of a city (26) after equation (14): Finally, territorial and situation explanations are mixed in the urban transition mechanism. To account for the different opportunities of cities to attract rural migrants in the different regions, we model the evolution of the urban transition curves over time. As shown empirically [Cottineau, 2014a], 100 out of the 108 regions of the Former Soviet Union have followed the scheme of the urban transition. It means that their urbanisation rate U R (in %) has followed a logistic function over time t (27): We have estimated the parameter urbanisationSpeed on empirical data and normalised the time units in order to position every region in a single urbanisation curve with respect to their time lags based on their urbanisation rate. The consequence of this initialisation process is that each region will move one step further on the urbanisation curve (leading eventually to 100%) at each simulation step, but that the rural potential to migrate in cities will depend on its current position on the urban transition curve (high potential for weakly urbanised regions, small potential for regions already very urban). The migration potential of each city i in territory R is built as a multiplier TM R specific to each region for each time step (28): This extra population growth is added (29) after (17) and the new urbanisation rates of regions are updated for t + 1 (27): Because we want to evaluate the contribution of each theoretical mechanism to the simulation of urbanisation and its itneraction with other mechanisms, we need a model that enables modules to be activated and de-activated. The technical implementation of MARIUS permits the modular functioning of the model, i.e. with or without any of its building blocks.

III.1.3 Technical modular implementation
In order to achieve a modular implementation of the model and its mechanisms, we leveraged the mixin-methods system of the Scala programming language [Schärli et al., 2003]. The mixins were first proposed at the beginning of the 90s in the Jigsaw programming language [Bracha, 1992]. They are now adopted in mainstream languages such as Scala. It has been established as a powerful way of to perform type-safe dependency injection framework [Wampler and Payne, 2009], which is a powerful paradigm to achieve modularity. The mixin pattern allows to achieve type safe dependency injection. Mixins are therefore a suited tool to achieve modularity in model implementations, which we use to implement MARIUS in Scala language 5 .
It makes the definition of class hierarchy more flexible [Steyaert and Meuter, 1995]. Variations of isolated features (such as the different update functions of city wealth and population in our model) are defined in separate modules and mixed-in with each other at the object instantiation point, also called mixin application. The advantage of mixins (or trait) over classical object-oriented programming is the possibility of defining numerous variations of several features without increasing exponentially the number of specialized class.
For instance, let's consider a class C implementing two methods a and b 6 . To define alternative implementations of those methods in the classical object-oriented paradigm, one would implement subclasses of C and override the implementations of a and b in each subclass. For instance in the listing 1 the class C1 specialises C and defines implementations for the methods a and b.
Listing 1: Object oriented specialisation  In this experiment we defined each alternative model mechanism in a particular trait. The exe-cutable model has then be composed by picking the traits we wanted to test. In order to evaluate concurrently all the alternative mechanisms, we generated all the possible models (or combination of mechanisms). It was achieved using a code generation algorithm which produces all the possible models implementations by generating a Scala source code containing all the possible traits combinations, such as the one shown on Listing 3. This generated source code implements a single function encapsulating all the model alternatives. This function takes two arguments: an index of the implementation that shall be executed and a vector of parameters to set for the model (for this work we calibrated only double precision floating points values). Note that the vector has a fixed size which does not depends on the model instantiated. A given model implementation generally does not use all the parameters: instead it will use only some of them and ignore the others.

III.2. Calibrating a multi-model
"Whatever changes occur in the institutional, political and social context of computational models, the question of how to learn from models remains. It is clear that assessment of the accuracy of a model as a representation must rest on argument about how competing theories are represented in its workings, with calibration and fitting procedures acting as a check on reasoning." [O'Sullivan, 2004, p.291] In order to evaluate the capacity of the models to reproduce the historical trajectory of urbanisation in the Former Soviet Union, we rely on an automated calibration. This procedure is part of the model evaluation [Grimm and Railsback, 2012, Thiele et al., 2014, Schmitt et al., 2015. Its aim is to find the values of parameters for which the model results match the fitness criteria (here: δ, the lowest possible distance to the data, the realistic criteria having been met). If the model can be calibrated, then it is shown that the mechanisms included in the model are sufficient to simulate the urban trajectory (not necessary though). If there are no parameter value for which the fitness criteria is met, then the combination of mechanisms is not sufficient to model urbanisation under the current implementation.
In order to calibrate all the models at once, we designed a variant of the NSGAII genetic algorithm that includes a niching mechanism [Mahfoud, 1995]. Niching methods aim at preserving suboptimal solutions to preserve diversity. Our niching algorithm divided the population into sub-populations, each sub-population containing one model alternative. The genome of each individual 7 contains two parts. The first part is an integer value that corresponds to the index of the model alternative on which the genome is evaluated. The second part is a vector of double values containing the values of all the parameters for the model.
In order to evaluate a genome, we designed a fitness function. This function calls the generated function described above, runs the model and evaluates its dynamics using the fitness function described in section II.3 (i.e. the criteria of realism of micro-dynamics and the distance between simulated and observed population data for each city at each census date).
In NGSAII, the elitism operation preserves the best individuals among the whole population. The evaluation algorithm we applied has the exact same elitism strategy for each sub-population. No global elitism strategy was performed, instead we kept the 50 best-performing individuals in each subpopulation (or model combination of mechanisms). In order to speed up the convergence of the algorithm, we also tweaked the mutation operation: it had a 10% chance of mutating the "model index" part of the genome. The new value for the "model index" was drawn uniformly among the possible model indexes. This allowed to periodically test on other models, some parameter values that were performing well on a given model.
We then distributed this algorithm on the European grid EGI using the technique known as the "island model" in the same way as we described in [Schmitt et al., 2015]. We ran 200 000 jobs of 2 hours.

III.3. Hypothesis testing to explain urbanisation in the Former Soviet Union
After the multicalibration procedure, we end up with the best performing simulations of 64 different structures of models 8 . The analysis of calibration results consists in relating the performance of these calibrated models (in terms of distance between simulated and observed growth hierarchically and spatially) to their structure and the values of parameters associated with each activated mechanism. This analysis was made using an interactive application available online: http://shiny.parisgeo.cnrs.fr/VARIUS.
We present the results of this exploration in the form of three questions at the macro, meso and micro scale of the city-system. 1/ Which is the most parsimonious model to simulate the evolution of cities before and after the collapse of the Soviet Union? For simulations starting in 1959For simulations starting in up to 1989 , the best model with only one additional mechanism is composed of the baseline model plus the mechanism of urban transition (cf. Table 1).
It is characterised by significant economies of agglomeration (sizeE f f ectOnSupply = 1.05) but a linear function of demand with size. The rural multiplier is equal to 3.5% and allows to simulate fast urbanising regions of Siberia and Central Asia (cf. figure 1). However, the population of a majority of cities is under-estimated in the simulation, especially in the upper part of the hierarchy, around Moscow and in eastern Ukraine 10 . After the transition, the best performing model with one additional mechanism includes site advantages for cities. This model has two additional parameters and meets the evaluation criteria (per city per census) twice better than the best model for the previous period (0.005 vs. 0.01, cf. tables 1 and 2).
The analysis of the parameters fits the empirical observations of faster growing cities located on oil and gas deposits (oil AndGasE f f ect = 0.02), and declining cities in the Donbass and Kuzbass coal regions (coalE f f ect = −0.01). This model's specifications include very low size effects and a very uneven wealth distribution amongst cities at initialisation (populationToWealth = 1.12). The residuals are distributed roughly symmetrically (figure 1), but without any mechanism of urban transition, the model clearly underestimates the post-1989 growth of all the rapidly growing cities of Central Asia. To summarise, situation effects and territorial effects seem to be the dominant candidates for explaining the specific part of the trajectories of cities in the Soviet era, while site effects seem to have taken over since 1991, the transition to capitalism and the rise of oil prices in the world markets. Moreover, the better fit of the latter model could be an indication of the "normalization" of the urban processes in the post-Soviet space, compared to a more singular pattern of Soviet urbanisation.

2/ Which are the mechanisms (and mechanisms' interactions) that are essential to model the Soviet and post-Soviet urbanisation patterns?
We statistically analyse the results of the multicalibration to evaluate the contribution of each mechanism (everything else being equal in the model structure) to the reduction of distance between simulated and observed demographic data for each city.  Simulation 1959-1989| Resource Extraction model. Simulation 1989-2010Urban Transition model 1959-1989  We find that on average, the bonus, fixed costs and urban transition mechanisms tend to reduce the simulation error significantly (cf. figure 2). The transition and bonus mechanisms are identified to be the most effective ones. By contrast, the resource mechanism is correlated significantly with a change in the evaluation criteria, but tends to increase the error when it is activated (compared with model structures without this mechanism). This counter-intuitive result might be linked to the weak influence of resource extraction for the first period of simulation, when there is a diversity of urban trajectories within resource-rich regions. The redistribution mechanisms does not appear significant on average in this analysis, as it plays a minor role in the reduction of error when activated.
Finally, we confirm the observation that our models work better to simulate the urban evolution following the dismantlement of the Soviet Union. Indeed, as shown in figure 2, the value of normalized δ is greater by 0.01 point on average when the model is specified for the first period. This could be an expression of "normalisation" or simplification of the urban processes in the post-Soviet space after the political and economic transition of the 1990s.

3/ What are the cities that resist modelling? In other words, what are the cities that are too specific to be modelled by any of the mechanisms implemented?
To answer this last question, we statistically analyse the difference between the log of the population observed and the log of the population simulated at the last evaluation Census date for cities included in the simulation, with respect to their locational and functional attributes. The models for which we present the results below contains all the implemented mechanisms, and are applied to the two periods of enquiry.
For the two periods, we find that our models persistently and significantly under-estimate the growth of the largest and most western cities of the (Former) Soviet Union, everything else being equal (cf. figure 3). Moreover, capital cities appear to have grown less historically than what we can predict with a complete model of the period 1959-1989. The other urban attributes included in the regressions (natural resources and mono-functional specialisation) do not seem associated with any systematic pattern of over-or under-estimation.
The difficulty to reproduce the trajectory of the largest cities has been encountered for a comparable model of system of cities (Simpop2, see [Bretagnolle and Pumain, 2010]) and solved by the exogenous introduction of innovations to account for the creative features and higher probability of adoption of new technology and functions by the largest cities.
The under-estimation of growth in the western part of the territory might be due to its integration within a larger area (the Eastern Europe) during the periods under study: our hypothesis here is that the centrality of western (post-)Soviet cities would then be minored in our model because it does not take into account the interactions with east-european cities (Warsaw, Prague, Bratislava, etc.) which formed altogether an economic system (even though the integration was always stronger within the FSU).   Finally, some individual cities appear as clear outlyers of the model, and could correspond to a profile that is too specific to be modelled by any generic mechanism. For the first period, (cf. table 3), the most obvious examples of singular trajectories are the cities which grew much faster than what was expected from their site, situation or interaction attributes. Indeed, Naberezhnye Tchelny, Volgodonsk, Toljatti or Bratsk owe their sudden development to political decisions to implement flagship projects: automobile industry mega-plants in Naberezhnye Tchelny (trucks) and Toljatti (cars), energy production sites in Volgodonsk (atomic power) and Bratsk (hydroelectric power station). These economic policies of the 1950s and 1960s led those cities to be four times as populated 30 years lates than what was expected from their interactions, resource or regional characteristics.
For the second period, (cf. table 4), Astana is a good example of a similar singular trajectory that we would not aim to simulate with generic urbanisation mechanisms, as it owes its booming growth to the decision of the Kazakh newly independent State to locate its headquarters in this city more central to the country (compared to Almaty). On the contrary, Baikonyr, also in Kazakhstan, has suffered from the cuts in the space industry (non-predictable at the urban level of our mechanisms). Other shrinking cities like Aleksandrovsk-Sahalinsk, Krasnozavodsk or Uglegorsk would require more detailed mechanisms of demographics (lack of birth and emigration) and economic cycles to be simulated adequately.
To summarise, there are particular types of urban trajectories that are not simulated well by the model because of its simplicity, and trajectories that are too specific to be modelled. We find that the exploration of our models, their calibration and the analysis of residuals has helped to identify those cities and to suggest some missing mechanisms.

IV. Conclusion
"Despite the fact that the experience of individual cities has become more varied internationally (at least within what might be called the mature economies) there is stronger evidence of a predictable pattern of change, determined by common causal factors, than might be expected given the diversity and variety of cities." [Cheshire, 1999[Cheshire, , p. 1342 Systems of cities have attracted a lot of attention from social modellers because of the regularity of their patterns. Instead of regarding the profusion of competing theories as a source of confusion (or suspicion), we proposed a framework to integrate complementary accounts of urbanisation processes into a modular agent-based model. This work of synthesis and testing within a virtual laboratory has been made possible by its automation and the extensive use of computation resources to calibration sixty-four models structures with empirical data on almost two thousands cities in the Former Soviet Union over the last fifty years. The model provides a basis for comparison of the different theories that can be augmented by new or alternative implementations of mechanisms. It could also be applied to different systems of cities (in space or time).
In the present study, we showed that the multimodelling approach has helped identify and order the mechanisms that most probably generated the urban pattern or Soviet and post-Soviet urbanisation: situation effects of urban transition before 1991 and resource extraction afterwards. The intuitive mechanism of redistribution however has proved unsignificant. This method was finally useful to spot the most singular trajectories of cities that we interpret with empirical and monographic knowledge to refine the model.
To conclude, modelling experiments perform a radical compression of reality and an extreme simplification of individual trajectories, events and persons into synthetic aggregates. The ontological adequacy of the model to real life is therefore necessarily evaluated at an aggregated level that creates the problem of equifinality. No simulation model can elude this question, but everything should be tried to reduce its impact on what can be learned from the modelling experience.