Population Synthesis Handling Three Geographical Resolutions

: In this paper, we develop a synthetic population as the ﬁrst step in implementing an integrated land use/transport model. The model is agent-based, where every household, person, dwelling, and job is treated as an individual object. Therefore, detailed socioeconomic and demographic attributes are required to support the model. The Iterative Proportional Updating (IPU) procedure is selected for the optimization phase. The original IPU algorithm has been improved to handle three geographical resolutions simultaneously with very little computational time. For the allocation phase, we use Monte Carlo sampling. We applied our approach to the greater Munich metropolitan area. Based on the available data in the control totals and microdata, we selected 47 attributes at the municipality level, 13 attributes at the county level, and 14 additional attributes at the borough level for the city of Munich. Attributes are aggregated at the household, dwelling, and person level. The algorithm is able to synthesize 4.5 million persons in 2.1 million households in less than 1.5 h. Directions regarding how to handle multiple geographical resolutions and how to balance the amount and order of attributes to avoid overﬁtting are presented.


Introduction
Synthetic populations are used in transportation modeling when individual records of households and persons are not available due to privacy reasons, insufficient resolution, or missing attributes. Within the context of transportation modelling, population synthesis is the process of creating a representation of a complete, disaggregate population by combining a sample of disaggregate members of a population in a way as to match key distributions for the entire population [1]. Key distributions-also known as control attributes-can be at the household level, such as household size, at the person level, such as gender, age or employment status, or at the dwelling level, such as construction year or living space. Moreover, control attributes can be aggregated at different geographical resolutions, such as boroughs, municipalities, or counties.
Synthesizing a population has two main phases: optimization (fitting) and allocation. The first phase fits a disaggregate sample of agents (microdata) to aggregated constraints (control totals), while the second phase replicates actual agents for the synthetic population using a probabilistic selection [2]. While the procedure on the second stage is usually the same across population synthesizers and relies on Monte Carlo sampling, there is a broad range of procedures for the first stage.

Materials and Methods
The algorithm builds on several of the methods and techniques that have been introduced thus far in the field of population synthesis and expands it to three geographical levels. The method includes three stages: (1) selecting geographical resolution and scales of analysis; (2) optimization; (3) allocation.
The following subsections will describe each stage and the application.

Selecting Geographical Resolution and Scales of Analysis
Two main data items required to synthesize populations are individual household structures (or household microdata) and aggregate distributions at a certain geographical resolution (or control totals).
Household microdata is provided by many statistics bureaus in the form of the microcensus. It includes basic socioeconomic information, current and previous employment, and location. Control totals are usually available at statistics bureaus. The data can be aggregated to several geographical resolutions: borough, municipality, county, state, or nationwide.
After data were collected, we selected attributes as control attributes. Control attributes must be meaningful for the model, included in both databases, and have equal or comparable stratifications in both databases.

Optimization: IPU with Three Geographical Resolutions
The optimization uses Iterative Proportional Updating (IPU). It was proposed by Konduri et al. [21] for two geographical resolutions, and it was expanded for this research to three geographical resolutions. The IPU procedure consists of adjusting the set of weights for each household of the microdata to minimize the error between control totals and calculated distributions of each attribute for each geographic resolution.
Before starting the IPU procedure, it is required to summarize each microdata record according to the categories of the control attributes. The result is stored in the frequency matrix. The frequency matrix shows the household and dwelling type and the frequency of different person types within each household for the sample. The dimension of the matrix is N × M, where N is the number of households in the microdata and M is the number of control attributes (household, person and dwelling type).
The set of weights is provided at the lowest geographical resolution. An initial set of weights is set to one. In the next iterations, weights are updated after considering each control attribute. All attributes, regardless of whether they are household, dwelling, or person type, are treated equally. Weights are only updated in the households where the frequency of the control attribute is different than zero. Attributes at the lowest geographical resolution (i.e., municipality) update only the weight of one record, while attributes at the higher geographical resolution (i.e., county) update a set of weights of all nested areas (i.e., municipalities or boroughs).
After all control attributes are considered, we calculate the relative difference in absolute difference between control total and calculated distribution for each attribute. The average error is compared to the previous iteration. If the absolute difference of average deviation values between two full iterations satisfies a set of tolerance criteria, the algorithm stops updating household weights. The default threshold is set equal to 0.01% and can be modified by the user. Average absolute relative difference across all constraints has been used previously by Ye et al. [20] and Konduri et al. [21] in the original IPU procedure and by others [3,5,23,31,35]. Other indicators for goodness of fit include standardized root mean square error [7,11,24,27,36], difference on counts [1,8,10], or error percentages [9]. Additional stopping criteria include the maximum number of iterations (default value set to 1500) and average error threshold (default value of 1 × 10 − 7). The process converges after several iterations depending on the number of control attributes and number of municipalities within one county. Household microdata is provided by many statistics bureaus in the form of the microcensus. It includes basic socioeconomic information, current and previous employment, and location. Control totals are usually available at statistics bureaus. The data can be aggregated to several geographical resolutions: borough, municipality, county, state, or nationwide.
After data were collected, we selected attributes as control attributes. Control attributes must be meaningful for the model, included in both databases, and have equal or comparable stratifications in both databases.

Optimization: IPU with Three Geographical Resolutions
The optimization uses Iterative Proportional Updating (IPU). It was proposed by Konduri et al. [21] for two geographical resolutions, and it was expanded for this research to three geographical resolutions. The IPU procedure consists of adjusting the set of weights for each household of the microdata to minimize the error between control totals and calculated distributions of each attribute for each geographic resolution.
Before starting the IPU procedure, it is required to summarize each microdata record according to the categories of the control attributes. The result is stored in the frequency matrix. The frequency matrix shows the household and dwelling type and the frequency of different person types within each household for the sample. The dimension of the matrix is N × M, where N is the number of households in the microdata and M is the number of control attributes (household, person and dwelling type).
The set of weights is provided at the lowest geographical resolution. An initial set of weights is set to one. In the next iterations, weights are updated after considering each control attribute. All attributes, regardless of whether they are household, dwelling, or person type, are treated equally. Weights are only updated in the households where the frequency of the control attribute is different than zero. Attributes at the lowest geographical resolution (i.e., municipality) update only the weight of one record, while attributes at the higher geographical resolution (i.e., county) update a set of weights of all nested areas (i.e., municipalities or boroughs).
After all control attributes are considered, we calculate the relative difference in absolute difference between control total and calculated distribution for each attribute. The average error is compared to the previous iteration. If the absolute difference of average deviation values between two full iterations satisfies a set of tolerance criteria, the algorithm stops updating household weights. The default threshold is set equal to 0.01% and can be modified by the user. Average absolute relative difference across all constraints has been used previously by Ye et al. [20] and Konduri et al. [21] in the original IPU procedure and by others [3,5,23,31,35]. Other indicators for goodness of fit include standardized root mean square error [7,11,24,27,36], difference on counts [1,8,10], or error percentages [9]. Additional stopping criteria include the maximum number of iterations (default value set to 1500) and average error threshold (default value of 1 × 10 − 7). The process converges after several iterations depending on the number of control attributes and number of municipalities within one county. Table A3 shows the pseudocode for the IPU with three geographical resolutions and two aggregation levels.

Allocation: Monte Carlo Sampling
To generate the synthetic population, households of the microdata are randomly drawn based on their weights. Once a household is selected for the municipality, it is allocated to a traffic analysis zone (TAZ) within the municipality or borough. The zone system nests TAZs within the municipal regions respecting municipal boundaries. The probability for each TAZ is the ratio between the population in the TAZ and the total population on the municipality.
The value of control attributes is directly copied from the microdata into the synthetic population. Finally, we computed the absolute difference between the generated population and control totals. The goodness of fit of the allocation is evaluated as the average relative error of the control attributes at the highest-ranking geography (i.e., county).
shows the pseudocode for the IPU with three geographical resolutions and two aggregation levels.

Allocation: Monte Carlo Sampling
To generate the synthetic population, households of the microdata are randomly drawn based on their weights. Once a household is selected for the municipality, it is allocated to a traffic analysis zone (TAZ) within the municipality or borough. The zone system nests TAZs within the municipal regions respecting municipal boundaries. The probability for each TAZ is the ratio between the population in the TAZ and the total population on the municipality.
The value of control attributes is directly copied from the microdata into the synthetic population. Finally, we computed the absolute difference between the generated population and control totals. The goodness of fit of the allocation is evaluated as the average relative error of the control attributes at the highest-ranking geography (i.e., county).

Application
We applied our approach to the greater Munich metropolitan area ( Figure 1). The region includes the cities of Munich, Augsburg, Ingolstadt, Landshut, and Rosenheim and their respective suburbs to cover the large commuting shed of the Munich region. The delineation of the study area was defined by the share of commuter flow into these five central cities. Municipalities were included in the area if at least 25% of workers commute to one of the central cities. The area includes a population of 4.5 million living in 2.1 million households. It has 444 municipalities distributed in 28 counties. The number of municipalities by county varied from 1 to 46. Finally, we computed the absolute difference between the generated population and control totals. The goodness of fit of the allocation is evaluated as the average relative error of the control attributes at the highest-ranking geography (i.e., county).

Application
We applied our approach to the greater Munich metropolitan area ( Figure 1). The region includes the cities of Munich, Augsburg, Ingolstadt, Landshut, and Rosenheim and their respective suburbs to cover the large commuting shed of the Munich region. The delineation of the study area was defined by the share of commuter flow into these five central cities. Municipalities were included in the area if at least 25% of workers commute to one of the central cities. The area includes a population of 4.5 million living in 2.1 million households. It has 444 municipalities distributed in 28 counties. The number of municipalities by county varied from 1 to 46.  After delineating the study area, municipalities are divided in Traffic Analysis Zones (TAZs) using a gradual raster-based zone system [37]. The total number of TAZs is 4950 ( Figure 2). Twenty-nine percent of the population lives in the city of Munich itself, while the average population per municipality is below 10,000. Therefore, a higher spatial resolution was created for the city of Munich than for other less populated municipalities.  The German household microdata was purchased from the German State Statistical Office. Due to privacy considerations, individual records are anonymized and not geolocated. In fact, the only geographical resolution of the reference is the state of residence (e.g., Bavaria). Control totals are available online at the German 2011 Household Census [38], the GENESIS Online database [39], the Census Hub [40], and the website of the city of Munich [41].

Rosenheim
Based on the available data in the control totals and microdata, we selected 60 control attributes: 47 attributes at the municipality level and 13 attributes at the county level. For the city of Munich, we selected 14 additional attributes at the borough level. Table 1 summarizes the control attributes by type and geographical resolution. The land use/transport model for which this synthetic population was generated requires other attributes that are not in the microdata, such as person income, person workplace, dwelling monthly cost, dwelling quality, or number of cars on the household. They are obtained from other submodels, such as car ownership model or land price model, and assigned by Monte Carlo sampling.

Results
This section presents the results of the synthetic population for the greater Munich metropolitan area. We analyze the differences between the aggregate distributions and the resulting synthetic population by: (1) geographical resolution and (2) attribute.
The optimization phase is deterministic, but the allocation phase produces slightly different model runs every simulation due to the random seed used while sampling. To provide some impression of the range of the sampling error, we ran the allocation five times. The average result of the five runs and the standard deviation between the five model runs are indicated.

By Geographical Resolution
Firstly, we analyzed the average error of control attributes at the municipality level ( Figure 3). We took the absolute value of each control attribute's error to calculate the average error by municipality. The errors after the optimization are shown in green, and the errors of the allocation are shown in orange. The errors after the optimization phase are generally below 6%. In fact, only 7 out of 444 municipalities had an average error higher than 6%. Those municipalities are located in large counties containing 46 and 23 municipalities, respectively. Additionally, some of the municipalities had a higher concentration of students or retired persons living in large households, which were scarcely observed on the microdata. The microdata seemed to underrepresent selected combinations of attributes, making it difficult to match all control totals very well. municipalities had a higher concentration of students or retired persons living in large households, which were scarcely observed on the microdata. The microdata seemed to underrepresent selected combinations of attributes, making it difficult to match all control totals very well.
The errors of the allocation phase are, as expected, highly dependent on the size of the municipality. In small municipalities, it is likely that many weights are near zero. Microrecords are sampled proportionally to their weights an integer number of times. The "law of large numbers" generally compensated the error on the attributes for a large number of draws, but it led to lower accuracy for small municipalities with a small number of draws, given the high number of possible combinations of attributes. In this sense, the size of the municipality, the number of microrecords, and the number of combinations of attributes should be balanced. In our case study, the balance was produced for municipalities with 5000 inhabitants or more (asymptote in Figure 3). The standard deviation between the five runs is also highly dependent on the size of the municipality (Figure 4). The standard deviation is generally lower than 1% for municipalities higher than 10,000. The optimization phase is deterministic, and therefore the deviation in five runs is equal to zero. As a consequence, only the allocation phase is plotted in Figure 4.
Secondly, we analyzed the errors at the county level for all attributes ( Figure 5). Given the size dependency of the allocation error, the average error at the county level was calculated weighting the municipality error by its population. Weighted average error was used at the optimization phase as well for consistency.
After the optimization phase, the majority of the counties had a weighted average error lower than 1.5%, and only two counties exceeded 3%. Not surprisingly, the weighted average error increased during the allocation phase. Rural counties with smaller municipalities presented the largest weighted average errors, although they were below 6%. Compared to the municipality level, the weighted average error at the county level is lower. The lower error of measurement for the larger zones is only a natural effect of the scale-dependent accuracy of geoinformation, perhaps due to spatial autocorrelation of the data and that at the county level, the number of microrecords is multiplied by the number of municipalities within the county, having a bigger sample to distribute the control totals. The errors of the allocation phase are, as expected, highly dependent on the size of the municipality. In small municipalities, it is likely that many weights are near zero. Microrecords are sampled proportionally to their weights an integer number of times. The "law of large numbers" generally compensated the error on the attributes for a large number of draws, but it led to lower accuracy for small municipalities with a small number of draws, given the high number of possible combinations of attributes. In this sense, the size of the municipality, the number of microrecords, and the number of combinations of attributes should be balanced. In our case study, the balance was produced for municipalities with 5000 inhabitants or more (asymptote in Figure 3).
The standard deviation between the five runs is also highly dependent on the size of the municipality (Figure 4). The standard deviation is generally lower than 1% for municipalities higher than 10,000. The optimization phase is deterministic, and therefore the deviation in five runs is equal to zero. As a consequence, only the allocation phase is plotted in Figure 4.
Secondly, we analyzed the errors at the county level for all attributes ( Figure 5). Given the size dependency of the allocation error, the average error at the county level was calculated weighting the municipality error by its population. Weighted average error was used at the optimization phase as well for consistency.
After the optimization phase, the majority of the counties had a weighted average error lower than 1.5%, and only two counties exceeded 3%. Not surprisingly, the weighted average error increased during the allocation phase. Rural counties with smaller municipalities presented the largest weighted average errors, although they were below 6%. Compared to the municipality level, the weighted average error at the county level is lower. The lower error of measurement for the larger zones is only a natural effect of the scale-dependent accuracy of geoinformation, perhaps due to spatial autocorrelation of the data and that at the county level, the number of microrecords is multiplied by the number of municipalities within the county, having a bigger sample to distribute the control totals.  Finally, we analyzed the results for the city of Munich at three geographical resolutions: county, municipality, and borough. The error for the city of Munich was 2%, slightly higher than other large cities. The attribute with the highest error is single-person households, which provided an average error of 10%, similar across all boroughs. The error is produced due to inconsistencies between  Finally, we analyzed the results for the city of Munich at three geographical resolutions: county, municipality, and borough. The error for the city of Munich was 2%, slightly higher than other large cities. The attribute with the highest error is single-person households, which provided an average error of 10%, similar across all boroughs. The error is produced due to inconsistencies between Finally, we analyzed the results for the city of Munich at three geographical resolutions: county, municipality, and borough. The error for the city of Munich was 2%, slightly higher than other large cities. The attribute with the highest error is single-person households, which provided an average error of 10%, similar across all boroughs. The error is produced due to inconsistencies between borough and municipality control totals, as the total number of single-person households in Munich differs from 368,447 at the municipality level to 410,993 as the sum of the boroughs. The difference is around 10%, which is the same result obtained from the algorithm. Figure 6 shows the share of single-person households across Munich boroughs and the difference between control totals and the results after the optimization. The differences in absolute value are below 0.2%, providing practically the same distribution of the attributes across all boroughs. Therefore, the sole analysis of the error could be misleading when control totals at different resolutions are not consistent.

By Attribute
The weighted average errors (in absolute value) and deviations for household, dwelling, and person attributes are summarized in Tables 2-4, respectively. We took the absolute value per municipality and calculated the weighted average error, as in the previous analysis. In all the cases, the errors at the allocation phase are higher than the errors at the optimization phase. Deviation at the optimization phase is equal to zero because the procedure is deterministic. The error in the total number of households is equal to zero because the method adjusted weights to match that number. During optimization, the total number of households was the last attribute to compute. As a consequence, the final weight provides a perfect match for the total number of households. During allocation, households are drawn N-times, with N being the total number of households at the control total.
The distribution of household sizes provides individual errors below 1% after the optimization phase and below 4% after the allocation phase. Errors increase slightly with household size. Large households have more possible combinations of attributes and they are less frequent in the microdata, which increases the error. The initial categories of household size were reduced from six to five to reduce the allocation errors in small municipalities for very large household sizes.
Similar conclusions are obtained from dwelling attributes. There are very few microdata records with dwellings in small buildings constructed after 2005. After a series of trials and errors, the initial categories of dwelling construction period were reduced from six to four categories.
The errors for person attributes are higher than for household and dwelling attributes. It is caused by the high number of possible cross-classifications by age and gender (equal to 34). Age-by-gender distributions are a key demographic characteristic for the land use model. Therefore, we opted to maintain the relatively large number of stratifications. The errors were below 3.1% after the optimization phase and varied between 3.5% and 7.0% after the allocation phase. If only age is taken into account, the error decreases considerably and is below 5%. Given that census data have a small error term as well (between 5% and 10% for the number of microrecords used in this case study [42]), the results are assumed to represent the population of the study area reasonably well.

Discussion
In this paper, we developed the synthetic population of the greater Munich metropolitan area. The algorithm is written in Java 8, taking advantage of memory efficiency benefits and parallelization of some methods for the optimization procedure. As with some other authors, we used IPU for optimization [17,20,21,25] and, like almost all synthesizers, Monte Carlo sampling for allocation. The runtime of the optimization phase is 17 min, while the allocation phase takes 1 h. Household, person, and dwelling objects are created during the allocation phase sequentially, increasing the total runtime of the program. The population synthesizer outperforms other synthesizers, which usually take between a couple of hours to an overnight run [17] for a similar scale.
We developed, for the first time, one application for three geographical areas. This feature could be beneficial in large municipalities with different characteristics across boroughs and case studies with control totals clustered at different geographical resolutions. We observed that the attributes controlled at the higher geographical resolution (i.e., county) presented lower errors than those controlled at the lower geographical resolution (i.e., municipality). At the county level, the number of microrecords is multiplied by the number of municipalities within the county. Having a bigger sample of microrecords to calculate the weights leads to better distributions of control totals, and therefore, to more accurate allocation results at the county level. Nevertheless, the local distribution at the municipality level is not controlled and could lead to higher errors if there are strong differences across the municipalities. In some cases, the algorithm can produce better results if the attributes are controlled at the higher geographical resolution, and the lower resolution is only used for validation. In the case of inconsistencies between control totals at two geographical levels, it is misleading to solely use the average error. The results should be analyzed with caution to determine whether the share across the lower level resembles the control totals distribution. Another approach could be rescaling the control totals distributions of the less reliable area to have the same total in both resolutions.
Another important discussion is the order of attributes at the optimization phase. The algorithm updates sequentially the set of weights to match the control totals. Therefore, the weights better represent those attributes that are matched at the end. As a consequence, the most important attributes shall come last, while the least reliable (or least relevant) attributes shall come first. During this exercise, all county attributes were considered before starting with municipality attributes due to reliability.
The application presents one of the highest numbers of attributes found in the literature (see Table A2). As far as the authors know, there is no open discussion regarding the number of attributes and their categorization to avoid overfitting. Some categories were merged to reduce the number of possible combinations of household, person, and dwelling types. Having a fast algorithm allowed for testing multiple combinations of attributes and categories that reduced the resulting error without significantly decreasing the fidelity of the results. In this case study, the initial categories for household size were reduced from six to five to control the errors at small municipalities. Similarly, the number of dwelling ages by building size was reduced from 12 to 8.

Conclusions
Our algorithm to create a synthetic population maintained the advantages of the original IPU algorithm and added two important improvements. Foremost, it expanded the geographical scope from two to three geographic levels. This allowed for a more accurate synthesis of the city of Munich, which contains 29% of the population and has differentiated demographic profiles by boroughs. The method can include different attributes at each geographical level.
The algorithm is also able to synthesize 4.5 million persons in 2.1 million households in less than 1.5 h. Having a faster algorithm can improve the accuracy of the synthesized population. A series of trials with different attribute stratifications and a different order in which attributes are controlled are usually required to better replicate the population. The order of control attributes during the optimization alters the results. Control attributes with lower reliability or less relevance for the task at hand should be tested first to reduce their influence on the final results. The most important control attributes, such as total population or total number of households, should be tested last to assure that the final weights accommodate their control totals.
Equally important is to avoid overfitting. An excessive number of attributes could drastically increase the number of possible combinations of household, person, and dwelling types and lead to very scarce combinations of those attributes on the microdata. This issue becomes more relevant for larger counties with a high number of municipalities. In such cases, the user needs to balance the relative importance and reliability of having more categories from the attributes and the error that could be produced after allocation. This consideration is especially important for small municipalities.
The population synthesizer is incorporated with the open-source land use model SILO [43] and is available at the GitHub repository: https://github.com/msmobility/silo. The pseudocode for the optimization phase is in ISPRS Int. J. Geo-Inf. 2018, 7, 0 3 of 21 Household microdata is provided by many statistics bureaus in the form of the microcensus. It includes basic socioeconomic information, current and previous employment, and location. Control totals are usually available at statistics bureaus. The data can be aggregated to several geographical resolutions: borough, municipality, county, state, or nationwide.
After data were collected, we selected attributes as control attributes. Control attributes must be meaningful for the model, included in both databases, and have equal or comparable stratifications in both databases.

Optimization: IPU with Three Geographical Resolutions
The optimization uses Iterative Proportional Updating (IPU). It was proposed by Konduri et al. [21] for two geographical resolutions, and it was expanded for this research to three geographical resolutions. The IPU procedure consists of adjusting the set of weights for each household of the microdata to minimize the error between control totals and calculated distributions of each attribute for each geographic resolution.
Before starting the IPU procedure, it is required to summarize each microdata record according to the categories of the control attributes. The result is stored in the frequency matrix. The frequency matrix shows the household and dwelling type and the frequency of different person types within each household for the sample. The dimension of the matrix is N × M, where N is the number of households in the microdata and M is the number of control attributes (household, person and dwelling type).
The set of weights is provided at the lowest geographical resolution. An initial set of weights is set to one. In the next iterations, weights are updated after considering each control attribute. All attributes, regardless of whether they are household, dwelling, or person type, are treated equally. Weights are only updated in the households where the frequency of the control attribute is different than zero. Attributes at the lowest geographical resolution (i.e., municipality) update only the weight of one record, while attributes at the higher geographical resolution (i.e., county) update a set of weights of all nested areas (i.e., municipalities or boroughs).
After all control attributes are considered, we calculate the relative difference in absolute difference between control total and calculated distribution for each attribute. The average error is compared to the previous iteration. If the absolute difference of average deviation values between two full iterations satisfies a set of tolerance criteria, the algorithm stops updating household weights. The default threshold is set equal to 0.01% and can be modified by the user. Average absolute relative difference across all constraints has been used previously by Ye et al. [20] and Konduri et al. [21] in the original IPU procedure and by others [3,5,23,31,35]. Other indicators for goodness of fit include standardized root mean square error [7,11,24,27,36], difference on counts [1,8,10], or error percentages [9]. Additional stopping criteria include the maximum number of iterations (default value set to 1500) and average error threshold (default value of 1 × 10 − 7). The process converges after several iterations depending on the number of control attributes and number of municipalities within one county. Table A3 shows the pseudocode for the IPU with three geographical resolutions and two aggregation levels.

Allocation: Monte Carlo Sampling
To generate the synthetic population, households of the microdata are randomly drawn based on their weights. Once a household is selected for the municipality, it is allocated to a traffic analysis zone (TAZ) within the municipality or borough. The zone system nests TAZs within the municipal regions respecting municipal boundaries. The probability for each TAZ is the ratio between the population in the TAZ and the total population on the municipality.
The value of control attributes is directly copied from the microdata into the synthetic population. Finally, we computed the absolute difference between the generated population and control totals. The goodness of fit of the allocation is evaluated as the average relative error of the control attributes at the highest-ranking geography (i.e., county).
. Implementations of this work in Cape Town and Kagawa (Japan) demonstrated the adaptability of the algorithm to other study areas. Further plans to implement this work in Sydney and Teheran, and to compare the results in Cape Town with an alternative approach will appraise the adaptability of the algorithm to other study areas and modeling scales. More empirical work still needs to be done to evaluate the model performance of the algorithm in land use models or other applications, such as travel demand models [44,45] or public health models [46], and to improve allocation based on dasymetric mapping [47]. Including agents' microlocation (or explicit coordinates) is also under investigation.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.