Open Access Article

Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia

1,2,3,†,* , 4,†
Flowminder Foundation, SE-11355 Stockholm, Sweden
WorldPop, Department of Geography and Environment, University of Southampton, Southampton SO17 1BJ, UK
Department of Social Statistics, University of Southampton, Southampton SO17 1BJ, UK
Department of Economics, Leiden University, 2311 EZ Leiden, The Netherlands
These authors contributed equally to this work.
Author to whom correspondence should be addressed.
Received: 18 June 2018 / Revised: 20 July 2018 / Accepted: 7 August 2018 / Published: 9 August 2018


Whether evaluating gridded population dataset estimates (e.g., WorldPop, LandScan) or household survey sample designs, a population census linked to residential locations is needed. Geolocated census microdata, however, are almost never available and are thus best simulated. In this paper, we simulate a close-to-reality population of individuals nested in households geolocated to realistic building locations. Using the R simPop package and ArcGIS, multiple realizations of a geolocated synthetic population are derived from the Namibia 2011 census 20% microdata sample, Namibia census enumeration area boundaries, the Namibia 2013 Demographic and Health Survey (DHS), and dozens of spatial covariates derived from publicly available datasets. Realistic household latitude-longitude coordinates are manually generated based on public satellite imagery. Simulated households are linked to latitude-longitude coordinates by identifying distinct household types with multivariate k-means analysis and modelling a probability surface for each household type using Random Forest machine learning methods. We simulate five realizations of a synthetic population in Namibia’s Oshikoto region, including demographic, socioeconomic, and outcome characteristics at the level of household, woman, and child. Variables in the synthetic population were compared with the 2011 census 20% sample and 2013 DHS data by primary sampling unit/enumeration area. We found that synthetic population variable distributions matched the observed data and followed expected spatial patterns. We outline a novel process to simulate a close-to-reality microdata census geolocated to realistic building locations in a low- or middle-income country setting to support spatial demographic research and survey methodological development while avoiding disclosure risk of individuals.
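The linking step described in the abstract (group households into types, then place each household using a probability surface for its type) can be illustrated with a minimal sketch. This is not the authors' implementation, which uses the R simPop package, ArcGIS, and Random Forest models. Here a plain k-means routine stands in for the multivariate k-means analysis, and the per-type weights in `surfaces` are made-up numbers standing in for a modelled probability surface. The function names, the two household features, and the choice to give each household a distinct building are all assumptions made for illustration.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((x - m) ** 2 for x, m in zip(point, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def link_households(household_types, surfaces, n_locations, seed=0):
    """Give each household a distinct candidate location.

    surfaces[t][j] is the weight that a household of type t sits at
    location j (hypothetical stand-in for a modelled probability surface).
    """
    if len(household_types) > n_locations:
        raise ValueError("need at least one candidate location per household")
    rng = random.Random(seed)
    free = list(range(n_locations))
    assignment = {}
    for household, htype in enumerate(household_types):
        weights = [surfaces[htype][j] for j in free]
        if sum(weights) == 0:  # fall back to uniform if all weights are zero
            weights = [1] * len(free)
        chosen = rng.choices(free, weights=weights, k=1)[0]
        assignment[household] = chosen
        free.remove(chosen)  # one household per building (an assumption)
    return assignment


# Hypothetical households described by (household size, wealth index).
households = [[2, 0.9], [3, 0.8], [2, 1.0], [7, 0.1], [8, 0.2], [6, 0.0]]
labels, _ = kmeans(households, k=2)

# Hypothetical surfaces: one weight per candidate building for each type.
surfaces = {
    labels[0]: [5, 5, 5, 5, 1, 1, 1, 1],
    labels[3]: [1, 1, 1, 1, 5, 5, 5, 5],
}
assignment = link_households(labels, surfaces, n_locations=8)
print(labels, assignment)
```

Drawing locations without replacement keeps two households from sharing one building, which is a simplification; repeating the draw with different seeds would give multiple realizations in the spirit of the five described above.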
Keywords: simulation; census; simPop; LMIC


This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Thomson, D.R.; Kools, L.; Jochem, W.C. Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia. Data 2018, 3, 30.

Data EISSN 2306-5729 Published by MDPI AG, Basel, Switzerland