The Missing Millions in Maps: Exploring Causes of Uncertainties in Global Gridded Population Datasets

Gridded population datasets model the population at a relatively high spatial and temporal granularity by reallocating official population data from irregular administrative units to regular grids (e.g., 1 km grid cells). Such population data are vital for understanding human–environmental relationships and for responding to many socioeconomic and environmental problems. We analyzed one very broadly used gridded population layer (GHS-POP) to assess its capacity to capture the distribution of population counts in several urban areas spread across the major world regions, and thus its suitability for global population modelling. We acquired the most detailed local population data available for each city and compared them with the GHS-POP layer. Error rates and magnitudes varied with the geographic context. In general, cities in High-Income (HIC) and Upper-Middle-Income Countries (UMIC) had fewer model errors than cities in Low- and Middle-Income Countries (LMIC). On a global average, 75% of all urban grid cells were wrongly estimated. In central mixed or non-residential areas, the population was generally overestimated, while in high-density residential areas (e.g., informal areas and high-rise areas) it was underestimated. Moreover, high model uncertainties were found in the low-density or sparsely populated outskirts of cities. These geographic patterns of errors should be well understood when using population models as an input for urban growth models, as they introduce geographic biases.


Introduction
The global human population is presently estimated at 7.9 billion and is projected to increase to 9.7 billion by 2050 [1]. However, there are high uncertainties in both the total number and the geographic patterns of the population at country, regional and city scales [2][3][4]. The majority of the global population presently lives in cities, and many local and global policy goals (e.g., the Sustainable Development Goals (SDGs)) require reliable geographic information about the population distribution [5,6]. In general, population datasets are critical components for measuring and understanding human–environmental interrelationships. They are widely used in economic models, public health research, human settlement planning, election preparation, risk assessment and disaster preparedness and response [7][8][9][10], and all these applications require reliable population data. Such datasets are increasingly available, but it is difficult for users to understand their strengths and weaknesses for a specific application context [11]. In this paper, we take the example of one commonly used population dataset, the GHS-POP [12], to assess the causes of uncertainties when using it as an input for urban models.
Urban growth models are commonly used to monitor and plan for sustainable urban development [13][14][15]. Such models require accurate global population data. One of the most commonly used population datasets is the multi-temporal global population layer of the European Commission, the GHS-POP dataset [12]. The overall goal of this study was therefore to gain insight into the causes of the erroneous allocation of population within cities by the GHS-POP data, by analyzing the relations between the GHS-POP data and local population data and relating the uncertainties to different land-use types. First, we compared GHS-POP with local population data to identify over- and underestimations at an intracity scale. Second, we compared the estimation errors with land uses to better understand the causes of overestimation or underestimation. The study addresses the following questions:
a. What is the relatedness of GHS-POP and local population data at the lowest available administrative level?
b. What is the relationship between the spatial pattern of over- and underestimated areas and the types of land use?
c. What are the implications for the use of presently available population data in urban growth models?

Gridded Population Models: Their Strengths and Limitations
Traditionally, population data are obtained through a census, i.e., an official count of all persons in a country. Census data are collected at long intervals, commonly every ten years. However, in many Low- and Middle-Income Countries (LMICs), censuses have been interrupted, postponed or are not scheduled [16], commonly because of conflicts and, recently, also the COVID-19 pandemic. For instance, the Democratic Republic of Congo has not had a census for 30 years, and Brazil recently postponed its 2020 census. Moreover, the census data collection frame may have biases, such as the exclusion of marginalized groups (e.g., in slums or temporary settlements) [17,18]. Finally, administrative units change over time, and large aggregation units can hide heterogeneity within an area (known as the modifiable areal unit problem [19]).
Global gridded population mapping approaches started in the 1990s, when population data from irregular vector formats were converted to standardized grid cells [10,20]. Global gridded population datasets use a consistent model framework (e.g., WorldPop) and provide high-resolution population counts (e.g., in 100 m grid cells). Most gridded datasets use dasymetric models to estimate the spatial distribution of the population, combining available census data with other spatial data (e.g., land cover) to disaggregate population counts across grid cells. These models fall into top-down and bottom-up approaches. Most global population datasets are derived from top-down approaches (see Appendix A), which disaggregate census counts into small grid cells (e.g., GHS-POP). Simple top-down approaches assume a uniform distribution of the population within administrative units (e.g., GPWv4 [21]), while more complex approaches incorporate ancillary data (e.g., land cover, night-time lights) to generate weights for allocating the population [8]. Bottom-up approaches are typically based on micro-census samples and build geo-statistical relationships between population density (from the micro-census) and the built environment to predict population counts across the grid cells of unsampled areas (Wardrop et al., 2018) (e.g., LandScan-HD or GRID3).
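To make the top-down logic concrete, the following minimal Python sketch allocates one administrative unit's census count to its grid cells in proportion to each cell's built-up share, i.e., a binary dasymetric weighting in which only built-up surface is treated as inhabited. It is an illustration under simplifying assumptions, not the GHS-POP implementation: the function name, the input layout (a list of built-up fractions per cell) and the uniform fallback for units without any detected built-up area are hypothetical choices made here.

```python
def dasymetric_allocate(unit_population, built_up_fraction):
    """Distribute one census unit's population over its grid cells.

    unit_population   -- census count for the administrative unit
    built_up_fraction -- built-up share (0..1) of each grid cell in the unit

    Only built-up surface is treated as inhabited, so each cell receives
    population in proportion to its built-up share.
    """
    total_built = sum(built_up_fraction)
    if total_built == 0:
        # Hypothetical fallback: no built-up detected, so spread the count
        # uniformly (simple areal weighting, as in the simplest top-down models).
        n = len(built_up_fraction)
        return [unit_population / n for _ in built_up_fraction]
    return [unit_population * f / total_built for f in built_up_fraction]


# A unit of 1,000 people over four cells; the second cell could be a harbour.
print(dasymetric_allocate(1000, [0.0, 0.5, 0.25, 0.25]))
# [0.0, 500.0, 250.0, 250.0]
```

Because the weights only distinguish built-up from non-built-up surface, a harbour or industrial cell receives people exactly like a residential cell with the same built-up share, which is one mechanism behind the overestimation of non-residential areas reported in this study.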
Many gridded population datasets have emerged with the data revolution, and many are open access. However, this great diversity of models often leaves users uncertain about the advantages and limitations of individual datasets. The input data of these models are diverse, and the underlying assumptions and modelling approaches affect the resulting gridded population dataset. For example, grid sizes range from 30 m to 10 km. Large grid cells better reflect the low granularity of the census data used as input but are limited in reflecting the spatial variability of populations [11]. In general, gridded datasets, except for LandScan, represent the night-time population. For large cities, the daytime and night-time populations can differ by several million, as people commute to cities for work, education, etc.
Commonly used global gridded population models are summarized in Appendix A. Most of these models use the Gridded Population of the World (GPW) as their input, which is now in its fourth version (GPWv4). GPWv4 is based on census data at the most detailed spatial resolution available, collected between 2005 and 2014. For example, the GHS-POP combines the Global Human Settlement Layer (GHSL) with GPWv4 [22]. WorldPop uses several covariates (e.g., night-time lights) to model the population distribution with a random forest-based machine learning approach [8]. GRID3 is an emerging bottom-up dataset that presently provides population data for several African countries. The historic HYDE model, based on the United Nations' World Population Prospects and historical estimations from the literature [23,24], provides a time series of the human population at a spatial resolution of 10 km. These population models use a variety of ancillary data (e.g., built-up masks, land cover, land use, roads, infrastructure, services, night-time lights, topography and points of interest [7,9,25]) to assist in spatially distributing the population [8,25,26]. In general, the resolution of the ancillary data influences the predictive ability of the model: in most cases, high-resolution data provide more reliable estimates than coarse-resolution data. However, only a few high-resolution datasets have global coverage, and these layers might have gaps (e.g., in rural areas) [25,27]. Commonly used ancillary data include land-cover/land-use maps derived from satellite images, e.g., the Global Human Settlement Layer or the Global Urban Footprint [26]. Typically, integrating several ancillary datasets improves the accuracy of the population model [28].
Understanding the modelling approaches is an essential step in understanding the strengths and weaknesses of each dataset. As top-down approaches disaggregate census data into grids, the uncertainties of the input data propagate into the model. In addition, the employed modelling methods come with caveats. For example, several population models (e.g., LandScan, GHS-POP) employ regression-based models [29]. These models assume a stable relationship between population density and the covariates; however, this assumption often does not hold, and non-linear relationships are not captured [30]. Machine learning models are increasingly used for population modelling (e.g., WorldPop). Random Forest (RF) [31], for example, can deal with high-dimensional datasets and can model complex non-linear relationships. Thus, besides the census, the input data and modelling approaches also affect the accuracy of the model. We have summarized their strengths, weaknesses and dependencies based on the literature (Table 1). Besides the modelling approach, the main factor influencing the accuracy of top-down approaches is the census data (e.g., its aggregation scale): the more spatially disaggregated these input data are (i.e., the higher their resolution), the more precise the allocation to grid cells [32]. The modelling approach determines how the population is disaggregated and allocated to grid cells; many population models (e.g., GHS-POP, GRUMP) do not differentiate residential from other land uses, such as commercial and industrial, and therefore allocate population to non-residential areas. Most datasets (e.g., WorldPop, GHS-POP) model the night-time (census) population, while LandScan provides the ambient population [33]. Generally, models perform poorly in high-density urban areas (e.g., informal areas) [34,35].
This is a consequence of most spatialization methods distributing the census population over all built-up areas, including non-residential areas [11]. To deal with these limitations, bottom-up approaches have recently been developed that directly predict the population within unsampled grid cells, or that integrate both modelling approaches (e.g., GRID3) [2].
Given the increasing availability of population datasets, it is important to know how accurate these datasets are. The common validation approach is to compare model estimates with authoritative population data. However, fine-resolution census datasets are not readily available at a global scale [39], nor is there an accepted method for measuring errors in population estimates [4,33]. Commonly used measures include the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) [9,39]. Table 2 summarizes five common causes of errors: the spatial heterogeneity of the environment (e.g., variations in population densities) [23,40]; the quality of census data (e.g., temporal and spatial resolution) [32,41]; the quality of ancillary data (e.g., reliance on coarse-resolution night-time light data) [42]; scale effects and temporal mismatches (e.g., differences in data availability) [8]; and differences in regional and local characteristics (e.g., differences in the urbanization rate). Most HICs have a relatively slow rate of population growth and a more stable settlement pattern than LMICs, which are affected by uncontrolled urbanization (e.g., slums). All global gridded population datasets underestimate the slum population [34].
Table 1. Strengths and weaknesses of population models reported in the literature [8,23,33,34,[36][37][38] (GPWv4 is based on the 2010 round of censuses and includes over 12,500,000 input units; GPWv3 was based on the 2000 round of censuses and includes over 375,000 input units).
Table 2. Common causes of errors in modelling approaches and suggested solutions from the literature.

Causes of Errors | Probable Causes | Recommendations | Key References
Spatial heterogeneity affects population counts | Uniform distribution assumption | Complex spatialization models are likely to perform better in complex urban distribution settings | [9,25,34]
Quality of census data | Gaps, and low spatial and temporal granularities | Combine with or replace by bottom-up models | [32][33][34][41]
Quality of ancillary data | Poor ancillary data quality (errors propagate into the model) | Inspect ancillary data before using it as an input; check data redundancy and the contribution of covariates | [7,16]
Scale effect and temporal mismatch | Data inconsistencies at spatial and temporal scales | When available, use high-resolution remote sensing (RS) input; when not, be conscious of probable uncertainties and allocate time for data corrections | [8]
Differences in regional and local characteristics | Top-down modelling approaches are less sensitive to regional characteristics; lack of reliable high-resolution ancillary data at the global level | Experiment with different ancillary datasets on environmental and morphological variables to better characterize local specificities and increase model sensitivity | [7,32]

Materials and Methods
To assess model uncertainties in the population allocation of the GHS-POP layer, seven cities were selected for two main reasons. First, they represent a generalized world sample covering the seven World Bank regions. Second, spatially fine-grained local population and land-use datasets were available for them. Figure 1 shows a general overview of the approach used to analyze the city-level datasets. First, the collected datasets were inspected, cleaned and prepared for analysis. The city-level datasets were brought into a common geo-referencing system and masked to the extent of the administrative area (as most local population datasets are only available for this extent and not for a morphologically defined built-up region). Next, the local population data and GHS-POP datasets were aggregated to 1 km grid cells, a common resolution for global urban growth models (e.g., [43]). For both the local population data and the GHS-POP data, cells with zero population values were also included in the analysis to avoid biased results.
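The aggregation step can be sketched as a block sum: each 1 km cell receives the total of the finer cells it contains, and empty cells are kept as zeros rather than dropped. This minimal Python sketch assumes both inputs are already rasters in the same reference system, aligned to the GHS-POP 1 km grid, with dimensions that are exact multiples of the aggregation factor; the function name is hypothetical, and vector (polygon) population data would first need an areal interpolation step that is not shown.

```python
def aggregate_to_coarse(grid, factor):
    """Sum a fine-resolution population grid into coarser blocks.

    grid   -- list of equally long rows of population counts
    factor -- fine cells per coarse cell along each axis
              (e.g., 10 when going from 100 m to 1 km)

    Zero-population blocks stay in the output, so empty 1 km cells
    remain part of the comparison.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            block = (grid[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor))
            row.append(sum(block))
        coarse.append(row)
    return coarse


fine = [[1, 2, 0, 0],
        [3, 4, 0, 0],
        [0, 0, 5, 5],
        [0, 0, 5, 5]]
print(aggregate_to_coarse(fine, 2))
# [[10, 0], [0, 20]]
```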

For the error estimation, we used different error metrics (for details, see Section 3.1) and analyzed the spatial patterns of model uncertainties. A complete evaluation of each case study was performed by relating the population estimates and error metrics to land use. The available land-use data for each city were reclassified into three classes, i.e., non-residential built-up, residential built-up and non-built-up. This was necessary because the available land-use data for the different cities had various levels of detail. In general, a built-up area is defined by the presence of elevated structures and buildings, while non-built-up areas mostly lack such structures (e.g., agricultural, park and forest areas) (Pesaresi et al., 2013). Residential built-up areas are dominated by residential land uses, while non-residential built-up areas are dominated by non-residential uses (e.g., industries, large infrastructures). For cities where such data were accessible and relevant, the class 'residential' was further split into informal and formal residential.
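The reclassification can be expressed as a lookup table from each city's own legend to the three common classes. In this minimal Python sketch the local class names are hypothetical, since every city's land-use legend differs and would need its own mapping; unmapped classes raise an error so that nothing is silently dropped.

```python
# Hypothetical legend of one city's land-use map, mapped to the three common classes.
RECLASS = {
    "residential": "residential built-up",
    "commercial": "non-residential built-up",
    "industrial": "non-residential built-up",
    "transport": "non-residential built-up",
    "park": "non-built-up",
    "agriculture": "non-built-up",
    "forest": "non-built-up",
}


def reclassify(local_classes, mapping=RECLASS):
    """Return the common class for each detailed local land-use label."""
    unknown = sorted(set(local_classes) - set(mapping))
    if unknown:
        # Fail loudly instead of silently dropping classes missing from the mapping.
        raise KeyError(f"no mapping for: {unknown}")
    return [mapping[c] for c in local_classes]


print(reclassify(["industrial", "park", "residential"]))
# ['non-residential built-up', 'non-built-up', 'residential built-up']
```

For cities with a formal/informal split, the same approach would simply map onto two residential subclasses instead of one.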

Accuracy Assessment Approach
Four commonly used error metrics were used to gain insights into the absolute and the relative accuracy. The root mean square error (RMSE), the mean absolute percentage error (MAPE) and the coefficient of determination (R²) were used to assess the overall accuracy of the GHS-POP data compared with the local population data (Bai, Wang, Wang, Gao, & Sun, 2018; Xu, Ho, Knudby, & He, 2020; Calka & Bielecka, 2020). In addition, the relative estimation error (REE) per grid cell was used to assess the spatial distribution of errors, using the mean population values per fishnet cell (Table 3).
Table 3. Error metrics used to assess the spatial distribution of the modelled population data.

Error Metric | Equation | Comments
Root mean square error (RMSE) | $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{pred},i} - y_{\mathrm{ref},i}\right)^2}$ | $y_{\mathrm{pred}}$ is the predicted/estimated value, $y_{\mathrm{ref}}$ the local reference value, $(y_{\mathrm{pred}} - y_{\mathrm{ref}})^2$ the squared differences and $n$ the number of observations
Mean absolute percentage error (MAPE) | $\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\mathrm{AEE}_i}{y_{\mathrm{ref},i}}\right|$ | $\mathrm{AEE}$ is the absolute estimation error, $y_{\mathrm{pred}} - y_{\mathrm{ref}}$
Coefficient of determination ($R^2$) | $R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$ | RSS is the residual sum of squares and TSS the total sum of squares

For a detailed comparison of the two population grids, the REE was categorized into seven error classes, as shown in Table 4. A difference of ±10% per cell was used as the threshold for an accurate estimation. The REE was used to show the spatial patterns of errors, as it allows the GHS-POP and local population data to be compared at the grid-cell level, while the other metrics in Table 3 provide mean statistics at the city scale.
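The city-level metrics of Table 3 and the per-cell accuracy threshold can be computed with a few lines of Python. This is a minimal sketch, not the authors' code, and it rests on stated assumptions: R² is computed directly from the GHS-POP values against the local reference (1 − RSS/TSS) rather than from a fitted regression line; the REE is taken as (y_pred − y_ref)/y_ref × 100; and cells with a zero reference count, for which MAPE and REE are undefined, are skipped in MAPE. Only the ±10% threshold comes from the text; the seven classes of Table 4 are collapsed here into accurate, over- and underestimated, and the toy values are hypothetical.

```python
import math


def rmse(pred, ref):
    """Root mean square error between modelled and reference cell counts."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def mape(pred, ref):
    """Mean absolute percentage error (%), skipping zero-reference cells."""
    terms = [abs(p - r) / r for p, r in zip(pred, ref) if r != 0]
    return 100.0 * sum(terms) / len(terms)


def r_squared(pred, ref):
    """Coefficient of determination 1 - RSS/TSS, with ref as the observed values."""
    mean_ref = sum(ref) / len(ref)
    rss = sum((r - p) ** 2 for p, r in zip(pred, ref))
    tss = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - rss / tss


def ree_class(pred, ref, threshold=10.0):
    """Label one cell by its relative estimation error (assumed definition)."""
    ree = 100.0 * (pred - ref) / ref
    if abs(ree) <= threshold:
        return "accurate"
    return "overestimated" if ree > 0 else "underestimated"


ghs_pop = [105, 150, 360, 400]  # modelled counts per 1 km cell (toy values)
local = [100, 200, 300, 400]    # local reference counts (toy values)
print(round(rmse(ghs_pop, local), 2))       # 39.13
print(round(mape(ghs_pop, local), 2))       # 12.5
print(round(r_squared(ghs_pop, local), 4))  # 0.8775
print([ree_class(p, r) for p, r in zip(ghs_pop, local)])
# ['accurate', 'underestimated', 'overestimated', 'accurate']
```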

Selection of Case Studies
To support a global assessment, one example city was used for each of the World Bank global regions (Table 5). The selection was driven by the availability of relatively disaggregated local population data close to one of the reference years of the GHS layer (i.e., 1975, 1990, 2000 and 2015). The selected cities include different types and urban morphologies, i.e., coastal (e.g., Jakarta) and inland (e.g., Enschede) cities, megacities (e.g., Sao Paulo) and secondary cities (e.g., Kumasi), economic hubs (e.g., New York) and cities with informal developments (e.g., Kabul). Where the local population data did not match the exact GHS-POP reference year, they were projected to that year to make them comparable with GHS-POP, using a growth rate (Equation (1)) and a population projection (Equation (2)), where $P_t$ is the population at the GHS-POP reference year, $P_0$ the population at the start, $r$ the rate of growth and $t$ the number of years. The local population data aggregated to the 1 km grid of the GHS-POP layer were finally compared with the modelled population using the error metrics described in Section 3.1.
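Because Equations (1) and (2) are not reproduced in this text, the following Python sketch assumes the common geometric (compound) growth model, $P_t = P_0(1 + r)^t$, with $r$ derived from two census counts; an exponential model, $P_t = P_0 e^{rt}$, would be an equally plausible reading. The function names and the census figures in the example are hypothetical.

```python
def annual_growth_rate(p_start, p_end, years):
    """Average annual compound growth rate between two population counts."""
    return (p_end / p_start) ** (1.0 / years) - 1.0


def project_population(p0, r, t):
    """Project p0 by t years at annual rate r; a negative t projects backwards."""
    return p0 * (1.0 + r) ** t


# Hypothetical city: 1,000,000 people in 2010 and 1,200,000 in 2020.
r = annual_growth_rate(1_000_000, 1_200_000, 10)
print(f"r = {r:.4%}")                              # r = 1.8399%
print(round(project_population(1_000_000, r, 5)))  # 1095445 (estimate for 2015)
```

For the 2015 GHS-POP epoch, a 2010 local count would be projected forward with t = 5, while a 2020 count would be projected backward with t = −5.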

Comparison of Case Studies: Spatial Patterns of Uncertainties
To analyze the spatial patterns of uncertainties for all cities, local land-use maps were acquired. For consistency, the land-use data were reclassified into residential built-up, non-residential built-up and non-built-up. The REE was calculated for each of these classes, which allowed the relationship between land-use types and uncertainties to be assessed. In four cities (Kumasi, Jakarta, Kabul and Cairo), the residential class was further split into formal and informal to investigate their different uncertainties.
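Relating the REE to land use amounts to grouping the per-cell errors by class. The minimal Python sketch below summarizes each class by its mean REE. It assumes REE = (GHS-POP − local)/local × 100, that every 1 km cell has already been assigned a single dominant land-use class (how mixed cells were assigned is not stated here), and it skips cells with a zero local count, for which the REE is undefined. The function name and the toy values are hypothetical.

```python
from collections import defaultdict


def mean_ree_by_class(cells):
    """Mean relative estimation error (%) per land-use class.

    cells -- iterable of (land_use_class, ghs_pop_count, local_count) tuples,
             one per 1 km grid cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for land_use, modelled, local in cells:
        if local == 0:
            continue  # REE undefined without a reference population
        sums[land_use] += 100.0 * (modelled - local) / local
        counts[land_use] += 1
    return {cls: sums[cls] / counts[cls] for cls in sums}


cells = [
    ("non-residential built-up", 300, 100),
    ("residential built-up", 800, 1000),
    ("residential built-up", 900, 1000),
    ("non-built-up", 50, 0),
]
print(mean_ree_by_class(cells))
# {'non-residential built-up': 200.0, 'residential built-up': -15.0}
```

A mean can hide skewed error distributions, so a median or a per-class box plot is a natural complement.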

Results
The results are presented comparatively, to analyze both the overall patterns of errors and the particularities of individual cities.

Overall Error Estimation per Case Study
The overall error estimation in Table 6 shows two distinct patterns. First, cities in HICs have a better model fit, i.e., their R² values tend to be higher than those of cities in Low- and Middle-Income Countries (LMICs). The only exception outside the HICs was Sao Paulo, with an R² value of 0.86; Brazil has a very well-developed census, which is conducted on a regular basis. For the four cities in LMICs, the model exhibited a moderate to weak fit. Second, in most cities, GHS-POP underestimated the local population, i.e., the modelled population tends to be lower than the actual population. However, two cities defied this trend: in Enschede and Kabul, the modelled population tended to be too high, in Enschede mainly in the outskirts of the urban area and in Kabul more in central locations.

Spatial Patterns of Errors
To compare the local population data with the GHS-POP model, Figure 2a-c presents the spatial patterns of the population values (at 1 × 1 km grid cells) and the relative estimation errors (REE). It shows the population distribution at the grid level across the cities, and it reveals similar patterns between estimated and measured values. In general, the GHS-POP population distribution follows major city features. Across cities, outskirts were overestimated, while more central parts were underestimated. In several cities, the commercial and industrial areas also showed a strong overestimation, most visible in Kabul and Jakarta. For example, in Jakarta, the harbor area was highly overestimated by GHS-POP.

Land Use and Population Distribution Relationships
To further analyze the relationship between the errors and land uses, the REE was calculated for the different land-use types at the grid-cell level. The results (Figure 3) show that non-residential built-up areas tended to be overestimated, an expected consequence of not using a land-use layer in the modelling approach. Generally, areas of commercial, industrial and infrastructure/transport activities (e.g., harbors) were overestimated. Non-built-up areas were also often overestimated, either because the GHSL detects small structures that in many cases are not residential, or because small scattered settlements have a much lower population density than modelled by GHS-POP. Two cities showed different results: in Kabul and in Sao Paulo, the non-built-up areas tended to be underestimated. In both cities, scattered developments on the outskirts are not well captured by the GHSL layer. In Sao Paulo, these errors were caused by dense vegetation cover, and in Kabul by informal developments on steep slopes (rocky terrain with little contrast between buildings and rocks). In the cities with large informal settlements (we excluded Sao Paulo, as most of its informal areas are much smaller than 1 km²), informal areas tended to be underestimated, even at this coarse scale of analysis. Overall, because population is overestimated in non-residential areas, residential and in particular high-density residential areas (such as informal areas) were underestimated.

What Are the Common Issues across All Case Studies?
For the general error patterns (Table 7), we observed that cities in Upper-Middle-Income Countries (UMICs) and LMICs with frequent, open and well-established census data collection systems were better modelled by the GHS-POP data than cities with low census frequencies. Complex cities in LMICs dominated by large-scale informal (slum) developments had large estimation errors. The common issues observed in all case studies relate to estimation errors in large non-residential built-up areas that were incorrectly assigned population by the GHS-POP layer. Furthermore, high-density areas (e.g., informal settlements) were often underestimated. The absence of a basic land-use map in the modelling approach of the GHS-POP causes a major problem: much of the population is allocated to non-residential built-up areas, while the population of high-density residential areas is underestimated. On a global average, only around 25% of urban grid cells had a correctly estimated population (within ±10%), while 75% of all urban grid cells were wrongly estimated. These numbers indicate profound uncertainties when such data are used as an input for urban models.
• Overestimation occurs in non-residential and non-built-up areas (e.g., industries, warehouses, stadiums and parks).
• Underestimation occurs in informal areas; high densities are not well captured.
The poor performance of GHS-POP can be attributed to the coarse resolution of the input population data used for Jakarta. Built-up densities vary starkly in Jakarta between different land-use types, meaning the inclusion of land-use data that also differentiate between formal and informal areas would be beneficial. Correct area: 16.2%.

New York: The GHS-POP layer provides a good depiction of the population distribution.

In general, scattered development on the outskirts of cities shows both overestimation and underestimation (Figure 4). The sparse low-density areas fall within large census units, which assume homogeneous densities and do not consider the settlement locations, whereas the GHS-POP can better capture these density variations. However, the resolution of the GHSL layer is still too coarse to capture small-scale development, and it tends to over-predict built-up areas [44].
Another major problem observed with the GHS-POP data (Figure 5) relates to an overestimation of non-residential areas (e.g., large transport infrastructure, industrial areas); this overestimation contributes to an underestimation of moderate- to high-density residential areas. Many moderate- to high-density residential areas have high-rise structures, whose population is not captured by the GHS-POP model.

Figure 5. Examples of greatly over- and underestimated areas in Enschede (overestimated: harbour and industrial area; airport area. Underestimated: multi-story housing; mixed high-density housing).

What Are the Recommendations for Built-Up Modellers When Using the Data?
Global urban growth and built-up models require population data as an input. However, we have shown that the errors within such data have a geographic dimension. Based on these case studies, we observed that cities in HICs and cities with good census data generally have far fewer model errors than LMIC cities. Yet, particularly for LMIC cities, built-up models are very relevant for understanding the often-unplanned urban developments and for predicting future developments (Figure 6). Due to the nature of the simple binary dasymetric model used for GHS-POP, these data come with caveats for built-up models. GHS-POP tends to underestimate high-density areas and overestimate sparsely populated areas [34]. A similar result has been observed for GHS-POP data in Poland and Portugal [39]. Thus, the urban population density surface is biased. A built-up model might wrongly predict densification of central areas where the actual density is already saturated, while assuming population in the outskirts, where in reality there are no or very few scattered rural developments. It might also predict new development in surrounding areas where nobody presently lives.

Figure 6. Examples of greatly over- and underestimated areas in Kabul (overestimated: formal residential area; underestimated: informal settlement).

What Are the Recommendations for Population Modellers to Improve Their Models?
Optimally, the inclusion of land-use data would solve many of the problems observed in the GHS-POP data, as it would reduce errors in non-residential built-up areas. However, land-use datasets are not (yet) readily available for many parts of the world, and, where available (e.g., https://wri-datalab.earthengine.app/view/urbanlanduse, accessed on 15 June 2022), they might not be easily comparable, may lack validation and might exclude or insufficiently capture informal developments. Global land-use datasets that allow non-residential built-up areas (e.g., industrial areas) to be masked are, however, under development, as are new layers that estimate building heights (e.g., [45,46]).
To deal with the global variation in data availability, a combination of top-down and bottom-up approaches will be essential. However, despite the continuous increase in population modelling, most models are top-down. In particular, the lightly modelled topdown models (e.g., GWP, GHS-POP and GRUMP) are assumed to be more suitable at the global scale because they rely less on ancillary data and, therefore, do not have a strong dependency on input data quality (as compared to heavily modelled approaches, e.g., WorldPop). However, we have shown that even for lightly modelled population data (i.e., GSH-POP), variations in input data led to large error variations.
Bottom-up models are seen as a solution for areas with an absent or infrequent census, as such models can be built using increasingly available global spatial covariates (e.g., based on open Geospatial and Earth Observation data). However, any model assumptions should be made with care. For example, the inclusion of increasingly available building footprints (e.g., the Google Open Buildings) is promising, but large-scale omissions are observed in these datasets, particularly for high-density informal areas. Furthermore, the assumption that night-time lights show the presence of human settlement can also be misleading in areas that are not connected to the formal electricity grid. Thus, the quality of covariates varies across the globe and will determine the generalizability of any popula-

What Are the Recommendations for Population Modellers to Improve Their Models?
Optimally, the inclusion of land-use data would help solve many of the observed problems in the GHS-POP data, as it would reduce errors in built-up non-residential areas. However, land-use datasets are not (yet) readily available for many parts of the world, and, where available (e.g., https://wri-datalab.earthengine.app/view/urbanlanduse (accessed on 15 June 2022)), they might not be easily comparable, may lack validation and might exclude or insufficiently capture informal developments. Global land-use datasets that allow non-residential built-up areas (e.g., industrial areas) to be masked are under development, as are new layers that estimate building heights (e.g., [45,46]).
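As an illustration of how such layers could enter a dasymetric model, the sketch below zeroes the weight of non-residential cells and scales the remaining weights by building height and a settlement-class factor. This is a hypothetical design, not a method proposed in the studies cited: the land-use classes, heights and the `CLASS_FACTOR` occupancy values are assumed placeholders standing in for the emerging land-use, informal-area and building-height products.

```python
# Sketch: dasymetric weights restricted to residential land and scaled by
# building height and a settlement-class factor. The land-use classes,
# heights and CLASS_FACTOR values are hypothetical placeholders.

CLASS_FACTOR = {"residential": 1.0, "informal": 3.0}  # assumed occupancy factors


def residential_volume_weights(built_up_fraction, land_use, mean_height_m):
    """Weight = built-up fraction x height x class factor; zero for non-residential land."""
    return [
        frac * height * CLASS_FACTOR.get(lu, 0.0)
        for frac, lu, height in zip(built_up_fraction, land_use, mean_height_m)
    ]


def allocate(unit_population, weights):
    """Distribute a census count in proportion to the given weights."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no residential built-up volume in this unit")
    return [unit_population * w / total for w in weights]


built_up = [0.9, 0.3, 1.0, 0.6]
land_use = ["informal", "residential", "commercial", "residential"]
height_m = [4.0, 3.0, 20.0, 30.0]  # single-storey shacks ... high-rise block

weights = residential_volume_weights(built_up, land_use, height_m)
estimate = allocate(10000, weights)
print([round(e) for e in estimate])
```

The commercial cell now receives no residents, and the high-rise residential cell receives the largest share despite its moderate built-up fraction. The quality of the result naturally depends on the accuracy of the masking and height layers themselves.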
To deal with the global variation in data availability, a combination of top-down and bottom-up approaches will be essential. However, despite the continuous growth of population modelling, most models are top-down. In particular, lightly modelled top-down products (e.g., GWP, GHS-POP and GRUMP) are assumed to be more suitable at the global scale because they rely less on ancillary data and therefore depend less on input data quality than heavily modelled approaches (e.g., WorldPop). However, we have shown that even for lightly modelled population data (i.e., GHS-POP), variations in input data led to large variations in error.
Bottom-up models are seen as a solution for areas with an absent or infrequent census, as such models can be built using increasingly available global spatial covariates (e.g., based on open geospatial and Earth Observation data). However, any model assumptions should be made with care. For example, the inclusion of increasingly available building footprints (e.g., the Google Open Buildings dataset) is promising, but large-scale omissions are observed in these datasets, particularly in high-density informal areas. Furthermore, the assumption that night-time lights indicate the presence of human settlement can be misleading in areas that are not connected to the formal electricity grid. Thus, the quality of covariates varies across the globe and will determine the generalizability of any population modelling effort. The availability and consistency of covariates are improving, which provides new opportunities for population modelling. However, higher-quality data are often available in HICs than in LMICs, where such data are frequently lacking. Further studies could assess the overall model uncertainty in a way that is not constrained by census data (e.g., using micro-censuses). Efforts are also needed to understand how errors inherent in different ancillary data influence the modelling process. Finally, newly available population data (e.g., the GHS-POP July 2022 release [47]) should be compared with even more fine-grained population data (e.g., the HRS).
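A simple form of bottom-up estimation can be sketched as follows, assuming that mean persons per building are derived per settlement type from micro-census clusters and then multiplied by building counts taken from a footprint layer. All survey values, settlement types and the 40% omission rate are hypothetical; the sketch only shows how footprint omissions in informal areas propagate directly into the population estimate.

```python
from statistics import mean

# Hypothetical micro-census clusters: (settlement type, buildings surveyed, people counted).
MICRO_CENSUS = [
    ("informal", 120, 1080),
    ("informal", 80, 760),
    ("formal", 50, 210),
    ("formal", 70, 280),
]


def occupancy_by_type(samples):
    """Mean persons per building for each settlement type."""
    rates = {}
    for stype, buildings, people in samples:
        rates.setdefault(stype, []).append(people / buildings)
    return {stype: mean(values) for stype, values in rates.items()}


def bottom_up_estimate(cells, rates):
    """cells: (settlement type, building count from a footprint layer) per grid cell."""
    return [count * rates[stype] for stype, count in cells]


rates = occupancy_by_type(MICRO_CENSUS)
complete = bottom_up_estimate([("informal", 400), ("formal", 300)], rates)
# Same cells if the footprint layer misses 40% of informal buildings (assumed).
omitted = bottom_up_estimate([("informal", 240), ("formal", 300)], rates)
print(complete, omitted)
```

With complete footprints the informal cell is estimated at 3,700 people; with the assumed omissions it drops to 2,220, while the formal cell is unaffected. This is the same spatial pattern of underestimation in dense informal areas described above.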

Conclusions
Our analysis of the GHS-POP data for several cities representing the major world regions shows considerable differences in model fit and estimation errors. In general, the population in HIC and UMIC cities was better estimated (around 35% of the urban grid cells at 1 km were correctly estimated), with a better model fit (R² above 0.7), than in LMIC cities (around 15% of the urban grid cells at 1 km were correctly estimated). In most LMIC cities, the population in high-density (often informal) areas was not well captured. In addition, across all cities, the model tended to incorrectly allocate population to non-residential built-up areas. For all cities, the population in high-rise built-up areas was also not well captured, as the model uses no building height information (building volumes). Moreover, the population estimation of GHS-POP was limited by the built-up mapping accuracy of the GHSL layer (e.g., large problems were observed in the case of Kabul). Thus, to improve the GHS-POP layer, it would be important to use a base layer that restricts the allocation to residential areas. Furthermore, to improve the allocation in high-density built-up areas, a layer that provides information on slums/informal areas and basic information on building heights would be of great advantage (several such products have recently been developed).
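The two accuracy measures summarised above can be sketched as follows. This section does not restate the exact criterion for a "correctly estimated" cell, so the ±20% relative tolerance below is an assumption, and R² is computed here as the squared Pearson correlation between gridded estimates and reference counts, which is one common choice rather than necessarily the study's definition. The cell values are hypothetical.

```python
from statistics import mean


def share_correct(estimated, reference, tolerance=0.2):
    """Fraction of cells whose estimate is within +/- tolerance (relative) of the reference.

    The 20% default is an assumed threshold for illustration. Cells with a zero
    reference count are treated as correct only if the estimate is also zero.
    """
    hits = 0
    for est, ref in zip(estimated, reference):
        if ref == 0:
            if est == 0:
                hits += 1
        elif abs(est - ref) / ref <= tolerance:
            hits += 1
    return hits / len(reference)


def r_squared(estimated, reference):
    """Squared Pearson correlation between estimated and reference cell counts."""
    mx, my = mean(estimated), mean(reference)
    sxy = sum((x - mx) * (y - my) for x, y in zip(estimated, reference))
    sxx = sum((x - mx) ** 2 for x in estimated)
    syy = sum((y - my) ** 2 for y in reference)
    return sxy * sxy / (sxx * syy)


# Hypothetical 1 km cells: gridded estimate vs. detailed local reference counts.
estimated = [100, 500, 900, 0, 3000]
reference = [110, 400, 1000, 0, 6000]
print(share_correct(estimated, reference), round(r_squared(estimated, reference), 3))
```

Here three of the five cells fall within the assumed tolerance, so the share of correctly estimated cells is 0.6, even though a single strongly underestimated dense cell dominates the error. Note that `r_squared` is undefined when either input is constant.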