Exploring the Impact of Multi-Source Gridded Population Datasets on Flood-Exposed Population Estimates in Gangnam, Seoul

Bersabe, Julieber T.; Jun, Byong-Woon

doi:10.3390/ijgi14070262



by Julieber T. Bersabe and Byong-Woon Jun *

Department of Geography, Kyungpook National University, Daegu 41566, Republic of Korea

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2025, 14(7), 262; https://doi.org/10.3390/ijgi14070262

Submission received: 14 May 2025 / Revised: 27 June 2025 / Accepted: 3 July 2025 / Published: 4 July 2025


Abstract

Accurate demographic data are essential for evaluating flood exposure in urban areas, where heterogeneous environments and localized risks complicate modeling efforts. Gridded population datasets serve as valuable resources for such assessments; however, differences in spatial resolution and methodology can significantly affect flood-exposed population estimates. This study evaluates how various gridded population datasets influence the sensitivity and accuracy of flood exposure estimates in Gangnam District, Seoul. Seven datasets from the Statistical Geographic Information Service (SGIS), the National Geographic Information Institute (NGII), and Intelligent Dasymetric Mapping (IDM), ranging from 30 m to 1 km in resolution, were evaluated against census data to assess their accuracy and variability in flood exposure estimates. The results indicate that multi-source gridded population datasets with different spatial resolutions and modeling approaches strongly affect both the accuracy and variability of flood-exposed population estimates. IDM 30 m outperformed other datasets, showing the lowest variability (CV = 0.310) and the highest agreement with census data (RMSE = 193.51; R² = 0.9998). Coarser datasets showed greater estimation errors and variability. These findings demonstrate that the fine-resolution IDM population dataset yields reliable results for flood exposure estimation in Gangnam, Seoul. They also highlight the need for further comparative evaluations across different hazard and spatial contexts.

Keywords: gridded population datasets; intelligent dasymetric mapping; flood exposure assessment; population estimation

1. Introduction

Rising flood risk presents a significant challenge for urban areas worldwide, driven by climate variability and rapid urbanization. A fundamental aspect of flood risk assessment is the availability of gridded population data, which provide spatially explicit population estimates at varying spatial resolutions, ranging from tens of meters to several kilometers. However, the suitability of these datasets for analyzing flood exposure, particularly their ability to yield accurate and reliable estimates of at-risk populations, remains insufficiently explored.

Gridded population datasets have emerged, in part, to overcome the limitations associated with aggregated census data, notably the Modifiable Areal Unit Problem (MAUP) [1]. Traditional census data are typically reported within arbitrarily defined administrative units, such as districts or neighborhoods, that vary in size, shape, and internal population heterogeneity [2,3]. This zonal aggregation introduces statistical artifacts and spatial uncertainties when applied to fine-scale environmental or hazard modeling [4,5,6,7,8]. Gridded population datasets aim to mitigate these issues by redistributing population counts onto uniform and continuous grid cells, thereby enhancing integration with environmental data, such as flood hazard maps. However, since most grids are still derived from aggregated census totals, disaggregation techniques may reintroduce spatial biases if not rigorously validated.

In South Korea, two nationally available gridded population datasets, produced by the Statistical Geographic Information Service (SGIS) and the National Geographic Information Institute (NGII), are frequently employed in urban planning and hazard research. These datasets differ significantly in terms of input data sources, aggregation methodologies, and privacy-masking techniques [9]. Despite their increasing utilization, neither the SGIS nor the NGII dataset has been systematically validated for flood exposure assessments. Recently, top-down approaches such as Intelligent Dasymetric Mapping (IDM), which incorporates ancillary spatial data to improve population redistribution accuracy for urban applications, have been proposed [10]. However, the comparative effectiveness of these newer approaches in estimating flood-exposed populations remains inadequately investigated.

This study aims to address these gaps by examining the influence of different gridded population datasets on estimating flood-exposed populations in Gangnam District, Seoul. Specifically, it evaluates how various gridded population products affect these estimates and highlights key considerations for selecting appropriate datasets for local hazard assessments.

By focusing on a high-density district recently impacted by severe flooding, this research offers practical insights for case studies and emphasizes the need for context-specific validation of spatial data in disaster risk management. Section 2 presents the literature review and research gap. Section 3 describes the data sources and methodology. Section 4 and Section 5 present the results and discussion, respectively, and Section 6 concludes the study.

2. Literature Review and Research Gap

2.1. Population Modeling Techniques for Flood Exposure Assessment

Accurate representation of population distribution is essential for evaluating disaster exposure, particularly in urban flood contexts where inconsistencies between hazard extents and administrative boundaries can result in substantial risk miscalculations [11]. Traditional exposure assessments typically rely on areal interpolation techniques, which distribute population uniformly across intersecting spatial units [2,12]. While these methods are relatively straightforward, they fail to account for variations in population density and land use within administrative units, particularly in densely populated urban areas with mixed land use or vertical residential development.

To address these limitations, dasymetric mapping techniques have been introduced. These approaches redistribute population based on ancillary spatial data, such as land cover or impervious surfaces [13,14,15]. More advanced variants include multi-class dasymetric methods [16] and IDM [10]. These methods integrate multiple ancillary datasets and sampling-based weighting schemes to refine population estimates. Although these techniques offer enhanced spatial accuracy, they remain constrained by the quality of the underlying census data. The aggregation biases introduced by administrative units, termed the MAUP, continue to present significant challenges [1,17].

In flood exposure estimation, integrating population datasets with flood hazard data requires the use of spatial overlay techniques. These techniques include polygon containment, centroid containment, and proportional areal intersection. Polygon containment aggregates entire census units that intersect with hazard zones, but this approach may lead to overestimation of exposure [18,19]. Centroid containment classifies a census unit as exposed if its centroid lies within a flood zone. However, this method can result in misclassifications in areas where flood boundaries closely align with built-up environments [18,19,20,21]. Proportional areal intersection, which allocates populations based on the proportion of census units that overlap with hazard zones, has been advocated as a more precise method for exposure modeling [3,5,7].

Despite these methodological advancements, few studies have systematically examined how variations in population modeling approaches and the corresponding datasets influence flood-exposed population estimates across different spatial scales [5]. This study aims to address this gap by investigating the impact of various gridded population datasets—each developed through distinct modeling strategies—on the estimation of populations exposed to flooding.

2.2. Gridded Population Datasets

Gridded population datasets have become essential tools in flood exposure analysis, providing spatially continuous population surfaces that enable seamless integration with raster-based hazard data [22,23,24]. Prominent global datasets include the Gridded Population of the World (GPW), WorldPop, the LandScan Global Population Database (LandScan), and the Global Human Settlement-Population Grid (GHS-POP). These datasets typically employ top-down disaggregation techniques, which merge census data with remote sensing and ancillary datasets to model population distributions [22,25,26].

While these datasets are valuable for conducting risk assessments at national and regional scales, their relatively coarse resolutions (often exceeding 100 m) and generalized modeling assumptions limit their applicability for fine-scale urban hazard assessments. Accurate population representation at the building or block level is crucial for flood risk analysis in densely populated urban areas [3,27]. In such scenarios, even minor spatial misallocations can lead to significant inaccuracies in exposure estimates.

Emerging disaggregation techniques, such as IDM, aim to enhance the accuracy of gridded datasets by incorporating high-resolution ancillary data, including detailed land use and building footprint information [10,28,29,30]. Nevertheless, even improved top-down methods remain vulnerable to uncertainties associated with the MAUP and necessitate rigorous validation against context-specific observations [5,31].

In contrast, bottom-up methods aggregate population counts directly from address points or building-level data, achieving high spatial accuracy in densely populated urban settings [32,33,34]. These approaches provide a more precise representation of population distributions but often rely on detailed microdata that may not be publicly accessible, thus limiting their application compared to gridded datasets produced through top-down disaggregation.

In South Korea, gridded population datasets from SGIS and NGII are vital resources for research across various disciplines. The SGIS dataset employs building-based centroids and a bounded small cell adjustment (BSCA) method to anonymize low-population cells [9,35,36], while the NGII dataset utilizes address-matched points and a multi-tiered geocoding strategy to construct its grids [37]. Both datasets incorporate privacy-preserving techniques, such as rounding or suppression, which can obscure population data in sensitive or low-density areas.

Despite their extensive utilization, the SGIS and NGII gridded population datasets have not been independently validated for accuracy or sensitivity in flood exposure assessments. Recently developed IDM-based grids, which incorporate ancillary land use data, offer a promising alternative; however, these products have yet to be systematically evaluated in hazard-specific contexts.

Given the limitations and methodological variations among existing gridded population datasets in South Korea, a systematic evaluation of their performance in flood-exposed population estimation is urgently needed. This study aims to address this gap by comparing SGIS, NGII, and IDM-generated datasets within the context of flood exposure assessment in Gangnam District, Seoul.

2.3. Research Gaps and Objectives

Although gridded population datasets are widely used in hazard exposure assessments, few studies have systematically compared how dataset selection and modeling assumptions influence flood-exposed population estimates across different spatial resolutions [3,5,32]. While official gridded datasets such as SGIS and NGII are extensively utilized in South Korea [9,38,39], their effectiveness for fine-scale urban flood risk assessments remains largely unexplored. Emerging top-down methods, including IDM, have demonstrated potential for enhancing the spatial accuracy of population redistribution [10,40]. However, there is limited empirical evidence regarding whether these enhancements result in more accurate flood exposure assessments in densely populated urban areas than the bottom-up gridded population datasets from SGIS and NGII.

This study aims to address these gaps by conducting a comparative analysis of seven gridded population datasets derived from three distinct sources—SGIS, NGII, and IDM—across four spatial resolutions (30 m, 100 m, 500 m, and 1 km). The analysis focuses on two key objectives: (1) evaluating how the choice of gridded population dataset influences flood-exposed population estimates under uniform hazard conditions, and (2) assessing the accuracy of each dataset by comparing aggregated grid estimates with official census data at the administrative neighborhood (dong) level.

Given the absence of ground-truth data for flood-exposed populations, census-based cross-validation is used as a proxy to assess the reliability of the population datasets. Specifically, the study aims to (1) analyze the sensitivity of flood-exposed population estimates to various gridded population datasets with different spatial resolutions; (2) assess the overall accuracy of gridded population estimates in relation to official census counts; and (3) provide practical suggestions on how to select appropriate gridded population datasets for urban flood exposure assessment.

By systematically comparing multiple gridded population datasets and their impact on flood-exposed population estimates, this research underscores the critical role that dataset selection plays in influencing risk assessment outcomes, particularly in high-density urban settings.

3. Data and Methods

3.1. Study Area

The Gangnam District, located south of the Han River in Seoul, South Korea, serves as the study area for this analysis. Recognized as one of the country’s most densely populated and economically dynamic districts, Gangnam exemplifies a highly urbanized environment characterized by complex land use patterns, vertical residential development, and mixed-use zoning. According to the 2020 Population and Housing Census conducted by Statistics Korea, the 22 administrative divisions (dong) within Gangnam vary significantly in both area (ranging from 1.12 to 10.13 km²) and population density, reflecting the spatial heterogeneity typical of metropolitan districts in South Korea (Figure 1).

Residential buildings occupy approximately 9.6% of the total land area within the district. Among these structures, 68.56% are classified as multi-family housing, while 31.44% consist of detached houses. This distribution highlights the diverse urban landscape of Gangnam and provides a basis for analyzing population distribution patterns in relation to land use.

In August 2022, Gangnam experienced severe flooding caused by extreme rainfall, resulting in considerable property damage and heightened public awareness of urban flood risks [41]. The district’s high urban density, mixed land use, and recent exposure to major flood events make it an exemplary case for evaluating the suitability of gridded population datasets in flood exposure assessment.

3.2. Datasets

This study utilizes three types of datasets: flood hazard maps, official census data, and three gridded population products (SGIS, NGII, and IDM-derived datasets). Table 1 presents the characteristics of the gridded population datasets used in the study.

3.2.1. Flood Hazard Map

Although government-issued flood hazard datasets are accessible to the public, many of these maps are not consistently updated to account for alterations in the built environment. Consequently, a modeled flood hazard map that incorporates urban variables, such as drainage systems, was selected for this analysis. The flood hazard data were sourced from Bersabe and Jun [42], who applied a random forest model to estimate flood susceptibility across Seoul at a 30 m resolution. This model demonstrated an accuracy of 83.70%, which is deemed sufficiently reliable for the purposes of this study.

The original model classified flood hazard into five levels: very low, low, moderate, high, and very high. For this study, the hazard classes were reclassified into a binary system, designating areas classified as “high” or “very high” as significant flood hazard zones, while excluding all other categories from the analysis.
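The binary reclassification described above can be sketched in a few lines. This is only an illustration: the string labels and the tiny nested-list grid are assumptions made for readability, since the source does not state how the five classes are encoded in the 30 m raster.

```python
# Hypothetical string labels; the source names the five levels
# but not the codes used to store them in the raster.
SIGNIFICANT = {"high", "very high"}


def to_binary_hazard(grid):
    """Map a 2-D grid of hazard labels to 1 (significant hazard) or 0."""
    return [[1 if label in SIGNIFICANT else 0 for label in row] for row in grid]


# Illustrative 2 x 2 grid, not real data.
example_grid = [
    ["very low", "high"],
    ["moderate", "very high"],
]
binary_grid = to_binary_hazard(example_grid)
```

Cells flagged 1 form the flood hazard zone used in the overlay; all other cells drop out of the exposure calculation.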

3.2.2. Census Data

This study utilizes data from the 2020 Population and Housing Census obtained from the Statistics Korea website (https://kosis.kr/, accessed on 10 November 2024). Since 2015, the Korean census has adopted a register-based methodology, integrating monthly resident registration records with a 20% sample survey to enhance data coverage [9]. In 2020, Gangnam District reported a population of 508,135 residents across 22 administrative dongs, accounting for approximately 5.3% of Seoul’s total population.

Although these census data provide the most reliable reference available, minor spatial uncertainties may exist. These include errors due to boundary changes, address misclassification, and high rates of residential mobility in urban areas [4,8,9]. Nevertheless, the 2020 census serves as the official reference data for validating gridded population datasets in this study.

3.2.3. SGIS Gridded Population Datasets

The SGIS gridded population datasets, produced by Statistics Korea, are based on official household-based census data. These datasets allocate population counts to building centroids using the Unique Feature Identifier (UFID) system [9,36]. While this approach improves spatial representation, it may introduce errors where building footprints extend beyond single grid cells or where centroids are inaccurately positioned [43,44].

To protect privacy, SGIS employs the BSCA method, masking cells with low population counts by either suppressing values or applying random noise adjustments [35]. Although effective for confidentiality, these adjustments may cause underestimation in low-density areas and spatial inconsistencies at finer resolutions. The SGIS gridded population datasets for 2020 were obtained from the SGIS Plus platform (https://sgis.kostat.go.kr/, accessed on 10 November 2024). For this study, the 100 m, 500 m, and 1 km population grids were selected from the available SGIS products because their cells are smaller than the smallest administrative neighborhood in the study area, which has an area of 1.12 km².

3.2.4. NGII Gridded Population Datasets

The NGII population grids rely on resident registration data and apply a three-stage geocoding method: direct address matching, adjusted address conversion, and default address assignment for unmatched cases [37]. As with SGIS, privacy-preserving techniques such as suppression and aggregation are implemented for cells with low population counts [37]. However, differences in geocoding protocols and aggregation strategies between NGII and SGIS may affect their spatial distributions and introduce dataset-specific biases. The NGII gridded population datasets for 2020 were obtained from the NGII Geospatial Information Platform. Although NGII provides population grids at 100 m, 250 m, 500 m, and 1 km resolutions, only the 100 m, 500 m, and 1 km datasets were utilized to enable direct comparison with the corresponding SGIS products.

3.2.5. IDM Population Grid

The IDM dataset was generated by applying intelligent dasymetric mapping to the 2020 Level 3 Land Use and Land Cover (LULC) map, with a resolution of 30 m. The original LULC map was obtained from the Environmental Geospatial Information System (https://egis.me.go.kr/, accessed on 23 May 2023) and comprises 41 land cover categories derived from orthophotos and KOMPSAT-3 satellite imagery. For the IDM application, these categories were reclassified into three residential types (low-density, high-density, and mixed residential/commercial), with all other land uses grouped as non-residential (Figure 2).

3.3. Methods

3.3.1. Data Pre-Processing

All datasets underwent systematic preprocessing to ensure spatial compatibility. The 2020 Population and Housing Census data, obtained in tabular format, were merged with the corresponding administrative boundaries sourced from the SGIS Plus platform. For SGIS datasets, population values were linked to grid boundary shapefiles using the UFID. In contrast, NGII datasets already contained population attributes embedded within the shapefile, eliminating the need for additional merging.

The 2020 LULC dataset, originally in vector format at a scale of 1:5000, was rasterized to a 30 m resolution using ArcGIS Pro 3.4. The flood hazard map was already available at a 30 m resolution and thus required no further resampling. All spatial layers were projected to the WGS84 coordinate system to ensure alignment in spatial extent and cell resolution during overlay operations.

3.3.2. Intelligent Dasymetric Mapping

The IDM was used to redistribute census-based population counts into 30 m grid cells, incorporating land use as ancillary information. Modeling procedures were conducted in ArcGIS Pro 3.4 using the United States Environmental Protection Agency (USEPA) IDM Toolbox obtained from the USEPA website (https://www.epa.gov/enviroatlas/dasymetric-toolbox, accessed on 19 December 2024). For the IDM population grid, a 30 m resolution was selected to match the resolution of the flood hazard layer, ensuring consistency during spatial overlay operations. This resolution choice is further supported by previous studies demonstrating that higher-resolution LULC data enhance the accuracy of dasymetric mapping outputs [45].

Following the approach proposed by Mennis and Hultgren [10], IDM assigns a representative density, denoted as $\hat{D}_{c}$, to each land use category $c$. If a land use type constitutes at least 70% of three or more source units, its density is estimated through sampling. Otherwise, a refined areal weighting (RAW) approach is applied. The resultant population for the target cell $t$, represented as $\hat{y}_{t}$, can be expressed as

$$\hat{y}_{t} = \sum_{s} \sum_{c = 1}^{k} \left[ y_{s} \times \frac{\hat{D}_{c} \times A_{t_{c}}}{\sum_{j = 1}^{k} (\hat{D}_{j} \times A_{t_{j}})} \right] \quad (1)$$

where $k$ represents the number of land use categories within source unit $s$, $y_{s}$ denotes the total population in each source unit $s$, $A_{t_{c}}$ indicates the area of cell $t$ classified under category $c$, and $\sum_{j = 1}^{k} (\hat{D}_{j} \times A_{t_{j}})$ serves to normalize the population redistribution. This normalization ensures that the total population of each administrative unit is preserved while allocating population proportionally based on land use categories.
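A minimal sketch of the redistribution in Equation (1) for a single source unit is given below. It is not the USEPA toolbox implementation. It assumes the normalizing sum runs over every cell-by-category area inside the source unit, which is the reading under which the source total is preserved. The function name, the dictionary layout, and the relative densities are hypothetical and are not the Table 2 values.

```python
def idm_redistribute(source_pop, cell_areas, density):
    """Allocate one source unit's population y_s to its target cells.

    source_pop: total population y_s of the source unit.
    cell_areas: {cell_id: {category: area of the cell in that category}}.
    density:    {category: representative density D_c}.
    """
    # Numerator of Equation (1) per cell: sum over categories of D_c * A_tc.
    weights = {
        cell: sum(density[cat] * area for cat, area in areas.items())
        for cell, areas in cell_areas.items()
    }
    total_weight = sum(weights.values())
    if total_weight == 0:
        # No residential land anywhere in the unit: nothing can be allocated.
        return {cell: 0.0 for cell in cell_areas}
    return {cell: source_pop * w / total_weight for cell, w in weights.items()}


# Hypothetical relative densities and three 30 m cells (900 m^2 each).
density = {"high": 3.0, "low": 1.0, "non_residential": 0.0}
cell_areas = {
    "t1": {"high": 900.0},
    "t2": {"low": 900.0},
    "t3": {"non_residential": 900.0},
}
allocated = idm_redistribute(1000, cell_areas, density)
```

With these illustrative inputs the high-density cell receives three quarters of the population, the low-density cell one quarter, and the non-residential cell none, while the allocated values still sum to the source total.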

Table 2 summarizes the representative population densities assigned during the IDM process. High-density residential areas met the sampling criteria, while low-density residential and mixed residential/commercial categories were assigned densities through RAW. Non-residential land uses were assigned a default density of zero, excluding them from population allocation.

3.3.3. Estimating Population Exposed to Flood Hazard

Flood-exposed population estimates were derived by overlaying various gridded population datasets (IDM, SGIS, and NGII) at their respective selected resolutions (30 m for IDM; 100 m, 500 m, and 1 km for SGIS and NGII) with designated flood hazard zones. A proportional areal intersection method was employed, wherein the population within each grid cell was multiplied by the fraction of its area intersecting with flood hazard zones. Aggregated estimates were then calculated at the dong level. Although the proportional intersection method offers improved spatial accuracy over centroid-based approaches, it assumes uniform population distribution within each grid cell, which may introduce minor inaccuracies at very fine scales.
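The proportional areal intersection step can be illustrated as follows. The sketch assumes the intersection areas have already been computed in a GIS; the tuple layout, dong labels, and numbers are hypothetical.

```python
def exposed_population(cells):
    """Sum population x flooded-area fraction for each dong.

    cells: iterable of (dong, population, cell_area, flooded_area) tuples,
    with both areas expressed in the same unit.
    """
    totals = {}
    for dong, population, cell_area, flooded_area in cells:
        fraction = flooded_area / cell_area if cell_area > 0 else 0.0
        totals[dong] = totals.get(dong, 0.0) + population * fraction
    return totals


# Illustrative 100 m cells (10,000 m^2 each), not real data.
cells = [
    ("A", 120, 10000.0, 2500.0),   # one quarter of the cell is flooded
    ("A", 80, 10000.0, 10000.0),   # the whole cell is flooded
    ("B", 200, 10000.0, 0.0),      # no overlap with the hazard zone
]
exposure = exposed_population(cells)
```

Because the population is scaled by area alone, a coarse cell that is only partly flooded contributes residents who may in fact live in its dry part, which is the uniform-distribution assumption noted above.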

3.3.4. Evaluating the Sensitivity of Flood-Exposed Population Estimates

To assess the relative variability of flood-exposed population estimates, two sensitivity metrics were computed: the coefficient of variation (CV) and the median absolute deviation (MAD). The CV was calculated as the ratio of the standard deviation to the mean exposure estimate across dongs, providing a unitless measure of relative dispersion [46,47]. Higher CV values indicate greater relative dispersion of exposure estimates across neighborhoods. The MAD is defined as the median of the absolute deviations of the observations from their median value. This metric provides a robust measure of variability that is less sensitive to extreme values compared to standard deviation [47,48].
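Both sensitivity metrics are short enough to write out. The sketch below uses the sample standard deviation for the CV, which is an assumption because the source does not say whether the sample or population form was used. The input values are illustrative only.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


# Illustrative dong-level exposure estimates, not real data.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(sample)
mad = median_absolute_deviation(sample)
```

The MAD is reported in the same unit as the estimates (people), whereas the CV is a ratio, so the two are not directly comparable in magnitude.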

Additionally, cumulative distribution functions (CDFs) of exposed populations were constructed at the dong level. For each dataset, these plots show the cumulative probability that a dong-level exposed population estimate is less than or equal to a given value, and they highlight potential outliers or dataset-specific clustering effects [5].
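An empirical CDF of this kind can be built by sorting the dong-level estimates and assigning each a cumulative proportion. The sketch below shows one simple construction; the plotting step is omitted and the input values are illustrative.

```python
def empirical_cdf(values):
    """Return (value, cumulative proportion) pairs in ascending order.

    Tied values appear once per observation, which is sufficient for
    drawing a step plot.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


# Illustrative exposed-population estimates for four dongs.
cdf_points = empirical_cdf([3000, 1000, 2000, 4000])
```

A curve that reaches 1.0 at a low value indicates that all dongs have small exposure estimates, while a long upper tail indicates a few dongs with much larger estimates.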

3.3.5. Cross-Validating Gridded Population Estimates

To evaluate the accuracy of the gridded population datasets, population estimates aggregated at the dong level were compared with the official 2020 census totals. Aggregation was performed using the Zonal ExactExtract plugin [49] in QGIS 3.38, which supports partial overlaps between grid cells and administrative boundaries.

The following metrics were used to quantify estimation performance: coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).

The formulas for these metrics are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{j = 1}^{n} (\hat{p}_{j} - p_{j})^{2}}{n}} \quad (2)$$

$$\mathrm{MAE} = \frac{\sum_{j = 1}^{n} |\hat{p}_{j} - p_{j}|}{n} \quad (3)$$

$$\mathrm{MAPE} = \frac{\sum_{j = 1}^{n} \left| \frac{\hat{p}_{j} - p_{j}}{p_{j}} \right| \times 100}{n} \quad (4)$$

where $\hat{p}_{j}$ represents the estimated population for dong $j$, $p_{j}$ indicates the census (actual) population for dong $j$, and $n$ denotes the total number of dongs.
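Equations (2) to (4) translate directly into code. The sketch below covers only these three metrics, since the text gives no formula for R². The function name and the three-dong input are hypothetical.

```python
import math


def error_metrics(estimated, census):
    """Compute RMSE, MAE, and MAPE (%) for paired dong-level populations."""
    n = len(census)
    diffs = [e - p for e, p in zip(estimated, census)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    mape = sum(abs(d / p) for d, p in zip(diffs, census)) * 100 / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}


# Illustrative values: every estimate is off by exactly 1% of the census count.
census = [10000, 20000, 40000]
estimated = [10100, 19800, 40400]
metrics = error_metrics(estimated, census)
```

In this example the MAPE is 1% for every dong, yet the RMSE is dominated by the largest dong, which shows why the absolute and percentage metrics are reported together.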

Relative estimation error (REE) was also computed for each dong to assess directional bias:

$$\mathrm{REE} = \frac{\hat{p}_{j} - p_{j}}{p_{j}} \times 100 \quad (5)$$

Following the approach of Bai et al. [27], the REE values were categorized into five groups: greatly underestimated, slightly underestimated, accurately estimated, slightly overestimated, and greatly overestimated (Table 3). The classification thresholds were set based on the observed distribution of errors, with ±5% defined as the acceptable error range.
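The REE calculation and its five-way classification can be sketched as below. Only the ±5% accurate range is stated in the text. The boundary between "slightly" and "greatly" is defined in Table 3, so it is left as a required argument here, and the value 20.0 used in the example is a placeholder rather than the study's threshold. Treating exactly ±5% as accurate is likewise an assumption.

```python
def relative_estimation_error(estimated, census):
    """REE in percent; positive values indicate overestimation."""
    return (estimated - census) / census * 100


def classify_ree(ree, great_limit, accurate_limit=5.0):
    """Assign an REE value to one of the five categories.

    great_limit is the absolute REE (%) beyond which an error counts as
    'greatly' over- or underestimated; it must be taken from Table 3.
    """
    if abs(ree) <= accurate_limit:
        return "accurately estimated"
    size = "greatly" if abs(ree) > great_limit else "slightly"
    direction = "overestimated" if ree > 0 else "underestimated"
    return f"{size} {direction}"


# Illustrative dong: census 10,000, estimate 10,300.
ree_example = relative_estimation_error(10300, 10000)
label_example = classify_ree(ree_example, great_limit=20.0)  # placeholder limit
```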

To statistically evaluate differences between SGIS and NGII population estimates at each spatial resolution, the Wilcoxon signed-rank test was applied. This non-parametric test assesses whether paired samples differ significantly in their medians without assuming that the differences follow a normal distribution [47]. All tests were performed at a significance level of 0.05.
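For readers unfamiliar with the test, the sketch below implements a two-sided Wilcoxon signed-rank test with a normal approximation, using only the standard library. It is an illustration rather than the procedure used in the study: it drops zero differences, gives tied absolute differences average ranks, and applies no tie or continuity correction. Library routines such as SciPy's `scipy.stats.wilcoxon` offer exact small-sample p-values and are preferable in practice.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, p) for paired samples x and y.

    W is the smaller of the positive and negative signed-rank sums;
    p is a two-sided p-value from the normal approximation.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, p


# Illustrative paired estimates: the first series is always larger.
series_a = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
series_b = [a - (i + 1) for i, a in enumerate(series_a)]
w_stat, p_value = wilcoxon_signed_rank(series_a, series_b)
```

Here every difference has the same sign, so W is 0 and the p-value falls below the 0.05 level used in the study.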

4. Results

4.1. Sensitivity of Flood-Exposed Population Estimates Across Datasets

Flood-exposed population estimates varied considerably across the seven gridded population datasets with their respective spatial resolutions. Using a proportional areal intersection method, exposed population values were estimated for each dataset and aggregated at the dong level. Both the total and spatial distributions of the estimates differed depending on the dataset and resolution.

Figure 3 presents the spatial distribution of flood-exposed population estimates in Gangnam District derived from the seven gridded population datasets at different spatial resolutions. The finer-resolution datasets—IDM 30 m (Figure 3a) and SGIS 100 m (Figure 3b)—reveal more detailed patterns, with exposed populations spread across many small grid cells in different neighborhoods. The NGII 100 m (Figure 3e) shows a distribution pattern relatively similar to that of SGIS 100 m, with only slight differences in exposed population counts across certain areas.

In contrast, the coarser-resolution datasets—SGIS 500 m (Figure 3c), SGIS 1 km (Figure 3d), NGII 500 m (Figure 3f), and NGII 1 km (Figure 3g)—show less spatial detail. In these datasets, exposed populations are grouped into fewer, larger grid cells that tend to cover broader areas. This reduction in spatial resolution produces a more generalized representation of population exposure, making it harder to identify variations within smaller neighborhoods.

The CDFs of dong-level flood-exposed population estimates (Figure 4a) further illustrate the variation across datasets. The IDM 30 m dataset exhibited a steep increase in cumulative probability, reaching approximately 5000 people. This indicates that most flood-exposed population estimates in the IDM dataset were concentrated at lower values. In contrast, the SGIS and NGII datasets at 100 m, 500 m, and 1 km resolutions displayed more gradual increases in cumulative probability, with population estimates extending to higher values, ranging from approximately 10,000 to 25,000 people. Across the SGIS and NGII datasets, the shape of the CDF curves appeared similar at different spatial resolutions, although minor differences were observed in the steepness of the upper distribution tails, particularly for the 500 m and 1 km grids.

Figure 4b presents the total number of individuals estimated to be exposed to flooding for each dataset. IDM 30 m reported the lowest total flood-exposed population estimate (99,743), and all other datasets showed substantial overestimation relative to it. Specifically, the SGIS datasets exhibited overestimations ranging from 87% (SGIS 1 km) to over 98% (98.07% for SGIS 100 m and 99.15% for SGIS 500 m). The NGII datasets demonstrated even greater overestimations relative to IDM 30 m at the 100 m and 500 m resolutions, with increases of 109.70% and 108.49%, respectively, while the NGII 1 km grid showed an overestimation of 81.49%.

Table 4 presents the CV and MAD values for each dataset. IDM 30 m exhibited the lowest CV (0.310) and MAD (2256.00), followed by SGIS 100 m (CV = 0.337, MAD = 3981.50). NGII 100 m recorded a slightly higher CV (0.360) and the highest MAD (4842.50). Coarser datasets at 500 m and 1 km resolutions showed higher CV values, ranging from 0.358 to 0.413. Among these, SGIS 1 km and NGII 1 km recorded the highest CVs and maintained relatively high MAD values, indicating greater relative variability in flood exposure estimates at coarser resolutions.

4.2. Accuracy of Gridded Population Estimates

The accuracy of the seven gridded population datasets was assessed by comparing aggregated dong-level population estimates with the official 2020 census data. Estimates were aggregated using weighted zonal statistics to account for partial overlaps between grid cells and administrative boundaries.

Figure 5 illustrates the comparison between estimated and census-based populations. The IDM 30 m and SGIS 100 m datasets showed strong correlations with the census data, with estimates tightly converging along the best-fit line. In contrast, the NGII datasets, particularly at coarser resolutions, exhibited wider dispersions, indicating larger estimation errors across neighborhoods.

Table 5 summarizes the error metrics (RMSE, MAE, MAPE, and R²) for each dataset. The IDM 30 m dataset demonstrated the best performance, exhibiting the lowest RMSE (193.51) and MAPE (0.70%) values, and an almost perfect R² value (0.9998). SGIS 100 m also showed strong accuracy, with low MAE (625.32) and high R² (0.9964) values. In contrast, the SGIS and NGII datasets at 1 km resolution exhibited high error rates and low explanatory power (R² < 0.4), reflecting the degradation of spatial fidelity at coarser scales.

Wilcoxon signed-rank tests were performed to assess median differences between the SGIS and NGII datasets. At the 100 m resolution, the SGIS dataset significantly outperformed the NGII dataset (p < 0.001), whereas the difference was not statistically significant at the 500 m resolution (p = 0.485) and was only marginal at the 1 km resolution (p = 0.073). These results suggest that SGIS datasets maintain an advantage primarily at finer spatial scales.
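
As an illustration of how such a paired test works, the sketch below implements a two-sided Wilcoxon signed-rank test with a plain normal approximation (no tie or continuity correction). This is a simplification that statistical packages refine, especially for small samples. The paired values are hypothetical, and what exactly was paired in the study (for example, dong-level errors of the two datasets) is an assumption here.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, approximate two-sided p-value) for paired samples x and y."""
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)

    # Rank absolute differences, assigning average ranks to ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Normal approximation of the null distribution of W.
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w, p_value


# Hypothetical absolute errors of two datasets over the same six zones.
errors_a = [5, 7, 9, 11, 13, 15]
errors_b = [4, 5, 6, 7, 8, 9]

w_stat, p_approx = wilcoxon_signed_rank(errors_a, errors_b)
print(w_stat, round(p_approx, 4))
```

Because the test uses ranks of paired differences rather than their magnitudes, it does not require the errors to be normally distributed.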

Figure 6 shows that the IDM 30 m dataset achieved near-perfect agreement with census population counts across administrative units, with the majority of dongs classified within the ±5% accurate estimation range. SGIS 100 m also showed a high proportion of dongs within the accurate estimation range, although slight overestimations were visible in certain neighborhoods. As resolution coarsened to 500 m and 1 km for both SGIS and NGII datasets, greater deviations from census values were observed, particularly in the form of overestimation across larger areas.

Overall, comparing the spatial distribution of REE across administrative dongs with the population density map in Figure 1b reveals that most low-population-density dongs tend to be overestimated, with the degree of overestimation increasing at coarser resolutions. In contrast, high-population-density dongs exhibit a mix of overestimation and underestimation, with no consistent pattern. To better understand this variation, we also examined the land use characteristics shown in Figure 2a. In many high-population-density dongs, residential and commercial areas are closely mixed, and significant portions of the land are occupied by open spaces such as urban forests and parks.

Table 6 supports these spatial patterns by quantifying the estimated population, and its percentage share, falling within each REE classification. The IDM 30 m dataset achieved the best result, with 100% of its estimated population in dongs falling within the ±5% accurate estimation range. SGIS 100 m followed closely, with 95.82% of its estimated population in accurately estimated dongs. However, at coarser resolutions (500 m and 1 km), both SGIS and NGII datasets showed declines in estimation accuracy, with increasing shares falling into the slight or great overestimation categories.
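
The classification behind Figure 6 and Table 6 can be sketched as below, using the thresholds from Table 3. Two points are assumptions: REE is taken to be the signed percentage difference between the estimate and the census count, and values exactly on a threshold (±5%, ±25%) are assigned to the class nearer zero, since Table 3 lists closed intervals on both sides. The dong values are hypothetical.

```python
def relative_estimation_error(estimated: float, census: float) -> float:
    """Signed percentage error of an estimate relative to the census count."""
    return (estimated - census) / census * 100.0


def classify_ree(ree_pct: float) -> str:
    """Map an REE percentage to the Table 3 category label."""
    if ree_pct < -25.0:
        return "GUE"  # greatly underestimated
    if ree_pct < -5.0:
        return "SUE"  # slightly underestimated
    if ree_pct <= 5.0:
        return "AE"   # accurately estimated
    if ree_pct <= 25.0:
        return "SOE"  # slightly overestimated
    return "GOE"      # greatly overestimated


def class_population_shares(records):
    """Percentage of total estimated population falling in each REE class."""
    totals = {label: 0.0 for label in ("GUE", "SUE", "AE", "SOE", "GOE")}
    for estimated, census in records:
        label = classify_ree(relative_estimation_error(estimated, census))
        totals[label] += estimated
    grand_total = sum(totals.values())
    return {label: 100.0 * value / grand_total for label, value in totals.items()}


# Hypothetical (estimated, census) population pairs for three dongs.
dongs = [(10200, 10000), (23000, 20000), (5000, 5000)]

shares = class_population_shares(dongs)
for label, share in shares.items():
    print(f"{label}: {share:.2f}%")
```

Weighting by estimated population, as the totals in Table 6 suggest, means that a single populous dong can shift a class share far more than several small ones.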

5. Discussion

5.1. Key Findings and Implications

This study evaluated the influence of seven gridded population datasets with different population modeling approaches and spatial resolutions on flood-exposed population estimates in a dense urban context. The findings revealed substantial variation in both the count and distribution of estimated flood-exposed populations, depending on the dataset used [5,50].

First, the choice of population dataset and spatial resolution had a significant impact on flood-exposed population estimates. Finer-resolution datasets, such as IDM 30 m and SGIS 100 m, produced population distributions that aligned more closely with census data and revealed smaller variability in flood exposure across administrative neighborhoods. Notably, the IDM 30 m dataset consistently yielded lower total estimates of exposed population while maintaining 100% agreement within a ±5% error margin across all administrative units. This outcome indicates that the IDM dataset offers strong reliability and spatial consistency for local-scale analyses. The superior performance of IDM can be partly attributed to the incorporation of high-resolution ancillary land use data, which supports more refined population allocation. Such refinement is especially beneficial in heterogeneous urban environments where exposure levels can vary significantly across small spatial units [10,14,40]. These results reinforce earlier findings by Smith et al. [11], who demonstrated that combining high-resolution population and hazard data leads to more accurate flood-exposed population estimates.

On the other hand, datasets at coarser resolutions—especially 1 km grids—exhibited increased relative variability and a tendency toward overestimation, with greater proportions of administrative units falling into categories of high relative error. Although NGII and SGIS datasets produced similar exposure distributions at intermediate resolutions (e.g., 500 m), their accuracy declined as spatial resolution coarsened. These results align with findings from comparative studies indicating that dataset accuracy varies across scales and locations, potentially leading to divergent conclusions depending on the dataset selected for analysis [5,51].

Second, the sensitivity of flood-exposed population estimates, as captured by the CV and MAD, indicated that finer-resolution datasets generated more consistent estimates across neighborhoods. In contrast, coarser-resolution datasets exhibited higher relative variability, suggesting reduced reliability when applied to localized flood risk assessments. Additionally, the CDFs highlighted differences in the spread and skewness of flood-exposed population estimates across datasets. The IDM 30 m dataset displayed a steep curve, indicating tightly clustered exposure values, whereas the SGIS and NGII datasets showed broader distributions, reflecting greater variability across neighborhoods.

These findings suggest that datasets such as IDM 30 m, which integrate high-resolution ancillary data and disaggregation methods, offer significant advantages for localized flood exposure assessments. These datasets yield more spatially consistent estimates that align closely with census data, making them particularly valuable for intra-urban-scale disaster risk planning and micro-targeted interventions. However, careful selection of gridded population datasets, with attention to both methodological foundations and spatial resolution, remains crucial [52], particularly in hazard analyses where both underestimation and overestimation may have critical consequences [11].

As cities increasingly rely on spatial data to inform climate adaptation and hazard mitigation strategies, greater scrutiny of the suitability of population datasets is essential to ensure that policy interventions are appropriately targeted and scientifically grounded. To better support disaster risk management, future studies should systematically evaluate the operational suitability of gridded population datasets across a range of hazard contexts and urban environments.

5.2. Limitations of the Study

Despite its contributions, several limitations of this study warrant attention. First, the accuracy of the IDM-derived population datasets is inherently dependent on the quality of ancillary land use data. The 2020 LULC map, which served as the ancillary data, lacks publicly available accuracy documentation [53] and does not include vertical building information. The absence of height or floor area data may result in an underestimation of populations in high-rise urban environments—a recognized limitation in dasymetric modeling approaches [32,54,55]. To improve vertical population representation and enhance modeling accuracy in dense cityscapes, future applications of IDM should integrate three-dimensional building datasets or cadastral floor area ratios. Additionally, the study did not account for actual flood depth values; integrating this variable alongside vertical population distribution could significantly supplement future research endeavors.

Second, while population estimates were cross-validated against the official dong-level census data, there remains a lack of empirical ground-truth data for flood-exposed populations. Although comparative assessments against census totals provide valuable insights into dataset robustness, direct validation of exposed population estimates—such as through post-disaster demographic surveys—would strengthen conclusions regarding the accuracy of flood exposure assessment.

Third, a key limitation stems from the inherent uncertainties within gridded population products. Spatial datasets generated through disaggregation techniques introduce varying degrees of positional, thematic, and temporal uncertainty [23,52], depending on the methods and ancillary data employed. To better characterize these uncertainties, future studies should incorporate multiple gridded population datasets in exposure analyses rather than relying solely on a single dataset [5]. This approach facilitates comparison across datasets and supports the estimation of uncertainty ranges in population exposure outcomes. Li et al. [56] demonstrated this method by combining four global digital elevation models and five gridded population datasets to estimate potential population exposure in low-elevation coastal zones, along with its associated uncertainty.

Despite these limitations, the methodological framework employed in this study provides a reproducible and adaptable approach for evaluating gridded population datasets in urban hazard contexts. By prioritizing both statistical accuracy and spatial variability, this research contributes to the development of more context-sensitive strategies for data selection and enhances the transparency and robustness of risk assessments.

6. Conclusions

This study examined the impact of multi-source gridded population datasets with different methodologies and spatial resolutions on flood-exposed population estimates in a dense urban environment. A comparative analysis of seven datasets derived from SGIS, NGII, and IDM revealed that differences in methodological approaches and spatial resolutions significantly influence the accuracy and variability of flood exposure estimates.

Among the datasets, IDM 30 m demonstrated strong internal consistency and alignment with census-based benchmarks. It produced the lowest total estimates of flood-exposed population and achieved the highest agreement with census data (RMSE = 193.51; R² = 0.9998) and lowest variability (CV = 0.310) at the neighborhood level. In contrast, coarser-resolution datasets, particularly 1 km grids, showed reduced accuracy and greater relative variability, often overestimating flood-exposed population estimates. Sensitivity analysis using CV, MAD, and CDFs further confirmed that finer-resolution datasets produced more stable and consistent flood-exposed population estimates, while coarser population grids introduced greater variability across neighborhoods.

These findings highlight the value of high-resolution gridded population datasets developed using high-quality spatial data and robust disaggregation techniques. Although the IDM 30 m dataset exhibited relatively strong suitability for flood exposure assessment in Gangnam District, Seoul, its apparent superiority should be interpreted with caution. Furthermore, while spatial resolution is a critical factor, dataset performance also depends significantly on population allocation methods and the incorporation of spatially detailed ancillary data.

However, certain limitations persist. These include the absence of vertical building information in the ancillary dataset and the lack of empirical validation for flood-exposed populations. Addressing these challenges by incorporating three-dimensional building data and post-disaster demographic surveys would strengthen future flood exposure assessments.

Ultimately, careful selection of gridded population datasets is essential in hazard analysis, as both underestimation and overestimation can significantly impact disaster preparedness and planning outcomes. The study demonstrates that IDM 30 m can be instrumental in estimating populations at risk, especially in the context of flood hazards. However, it also emphasizes the need for broader comparative evaluations across diverse spatial and hazard contexts to strengthen data-driven decision-making.

Author Contributions

Conceptualization, Julieber T. Bersabe and Byong-Woon Jun; methodology, Julieber T. Bersabe and Byong-Woon Jun; software, Julieber T. Bersabe; validation, Julieber T. Bersabe and Byong-Woon Jun; formal analysis, Julieber T. Bersabe; investigation, Julieber T. Bersabe and Byong-Woon Jun; resources, Julieber T. Bersabe; data curation, Julieber T. Bersabe; writing—original draft preparation, Julieber T. Bersabe; writing—review and editing, Julieber T. Bersabe and Byong-Woon Jun; visualization, Julieber T. Bersabe; supervision, Byong-Woon Jun; project administration, Byong-Woon Jun; funding acquisition, Byong-Woon Jun. All authors have read and agreed to the published version of the manuscript.

Funding

This research has received partial support from the Global Korea Scholarship (GKS) program through the National Institute for International Education (NIIED).

Data Availability Statement

All original datasets used in this study are publicly accessible and have been fully documented in the manuscript. Derived data, including those that have been cleaned, integrated, or analyzed, are available from the corresponding author upon request for reasonable academic purposes.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Openshaw, S.; Taylor, P.J. The Modifiable Areal Unit Problem. In Quantitative Geography: A British View; Wrigley, N., Bennett, R.J., Eds.; Routledge: London, UK, 1981; pp. 60–69.
2. Fisher, P.F.; Langford, M. Modeling sensitivity to accuracy in classified imagery: A study of areal interpolation by dasymetric mapping. Prof. Geogr. 1996, 48, 299–309.
3. Mohanty, M.P.; Simonovic, S.P. Understanding dynamics of population flood exposure in Canada with multiple high-resolution population datasets. Sci. Total. Environ. 2021, 759, 143559.
4. Gregory, L.N.; Ell, P.S. Breaking the boundaries: Geographical approaches to integrating 200 years of the census. J. R. Stat. Soc. A Stat. Soc. 2005, 168, 419–437.
5. Karagiorgos, K.; Georganos, S.; Fuchs, S.; Nika, G.; Kavallaris, N.; Grahn, T.; Haas, J.; Nyberg, L. Global population datasets overestimate flood exposure in Sweden. Sci. Rep. 2024, 14, 20410.
6. Lei, Z.; Xie, Y.; Cheng, P.; Yang, H. From auxiliary data to research prospects, a review of gridded population mapping. Trans. GIS 2023, 27, 3–39.
7. Swanwick, R.H.; Read, Q.D.; Guinn, S.M.; Williamson, M.A.; Hondula, K.L.; Elmore, A.J. Dasymetric population mapping based on US census data and 30-m gridded estimates of impervious surface. Sci. Data 2022, 9, 523.
8. Wan, H.; Yoon, J.; Srikrishnan, V.; Daniel, B.; Judi, D. Landscape metrics regularly outperform other traditionally-used ancillary datasets in dasymetric mapping of population. Comput. Environ. Urban Syst. 2023, 99, 101899.
9. Byeon, S.; Kim, K. Estimating actual population of Sejong City using open data and grid system. J. Korean Data Anal. Soc. 2020, 22, 1793–1801.
10. Mennis, J.; Hultgren, T. Intelligent dasymetric mapping and its application to areal interpolation. Cartogr. Geogr. Inf. Sci. 2006, 33, 179–194.
11. Smith, A.; Bates, P.D.; Wing, O.; Sampson, C.; Quinn, N.; Neal, J. New estimates of flood exposure in developing countries using high-resolution population data. Nat. Commun. 2019, 10, 1–7.
12. Lloyd, C.T.; Sorichetta, A.; Tatem, A.J. High resolution global gridded data for use in population studies. Sci. Data 2017, 4, 170001.
13. Eicher, C.L.; Brewer, C.A. Dasymetric mapping and areal interpolation: Implementation and evaluation. Cartogr. Geogr. Inf. Sci. 2001, 28, 125–138.
14. Langford, M. An evaluation of small area population estimation techniques using open access ancillary data. Geogr. Anal. 2013, 45, 324–344.
15. Mennis, J. Generating surface models of population using dasymetric mapping. Prof. Geogr. 2003, 55, 31–42.
16. Reibel, M.; Agrawal, A. Areal interpolation of population counts using pre-classified land cover data. Popul. Res. Policy Rev. 2007, 266, 619–633.
17. Fotheringham, A.S.; Wong, D.W.S. The modifiable areal unit problem in multivariate statistical analysis. Environ. Plan. A Econ. Space 1991, 23, 1025–1044.
18. Jun, B.-W. Effects of areal interpolation methods on environmental equity analysis. J. Korean Assoc. Reg. Geogr. 2008, 14, 736–751.
19. Mennis, J. Using geographic information systems to create and analyze statistical surfaces of population and risk for environmental justice analysis. Soc. Sci. Q. 2002, 83, 281–297.
20. Chakraborty, J.; Armstrong, M.P. Exploring the use of buffer analysis for the identification of impacted areas in environmental equity assessment. Cartogr. Geogr. Inf. Syst. 1997, 24, 145–157.
21. Montgomery, M.C.; Chakraborty, J. Social vulnerability to coastal and inland flood hazards: A comparison of GIS-based spatial interpolation methods. Int. J. Appl. Geospat. Res. 2013, 4, 58–79.
22. Balk, D.L.; Deichmann, U.; Yetman, G.; Pozzi, F.; Hay, S.I.; Nelson, A. Determining global population distribution: Methods, applications and data. Adv. Parasitol. 2006, 62, 119–156.
23. Leyk, S.; Gaughan, A.E.; Adamo, S.B.; De Sherbinin, A.; Balk, D.; Freire, S.; Rose, A.; Stevens, F.R.; Blankespoor, B.; Frye, C.; et al. The spatial allocation of population: A review of large-scale gridded population data products and their fitness for use. Earth Syst. Sci. Data 2019, 11, 1385–1409.
24. Jayapadma, J.M.M.U.; Souma, K.; Wickramaarachchi, T.N.; Magome, J.; Ishidaira, H. Impact of high-resolution settlement data on flood exposure: A comparative analysis of flood hazard and exposure in the Gin River Basin, Sri Lanka. Geomat. Nat. Hazards Risk 2024, 15, 2435719.
25. Dobson, J.E.; Bright, E.A.; Coleman, P.R.; Durfee, R.C.; Worley, B.A. LandScan: A global population database for estimating populations at risk. Photogramm. Eng. Remote Sens. 2000, 66, 849–857.
26. Freire, S.; Schiavina, M.; Florczyk, A.J.; MacManus, K.; Pesaresi, M.; Corbane, C.; Borkovska, O.; Mills, J.; Pistolesi, L.; Squires, J.; et al. Enhanced data and methods for improving open and free global population grids: Putting ‘leaving no one behind’ into practice. Int. J. Digit. Earth 2020, 13, 61–77.
27. Bai, Z.; Wang, J.; Wang, M.; Gao, M.; Sun, J. Accuracy assessment of multi-source gridded population distribution datasets in China. Sustainability 2018, 10, 1363.
28. de Castro, K.B.; Roig, H.L.; Neumann, M.R.B. Comparison between different methods of zonal interpolation for population estimate: Case study of the federal district urban areas. Rev. Bras. Cartogr. 2019, 71, 207–232.
29. Maantay, J.; Maroko, A. Mapping urban risk: Flood hazards, race, & environmental justice in New York. Appl. Geogr. 2009, 29, 111–124.
30. Ortakavak, Z.; Çabuk, S.N.; Cetin, M.; Senyel Kurkcuoglu, M.A.; Cabuk, A. Determination of the nighttime light imagery for urban city population using DMSP-OLS methods in Istanbul. Environ. Monit. Assess. 2020, 192, 790.
31. Jun, B.-W. An evaluation of a dasymetric surface model for spatial disaggregation of zonal population data. J. Korean Assoc. Reg. Geogr. 2006, 12, 614–630.
32. Calka, B.; Nowak Da Costa, J.; Bielecka, E. Fine scale population density data and its application in risk assessment. Geomat. Nat. Hazards Risk 2017, 8, 1440–1455.
33. Sapena, M.; Kühnl, M.; Wurm, M.; Patino, J.E.; Duque, J.C.; Taubenböck, H. Empiric recommendations for population disaggregation under different data scenarios. PLoS ONE 2022, 17, e0274504.
34. Zandbergen, P.A. Dasymetric mapping using high resolution address point datasets. Trans. GIS 2011, 15, 5–27.
35. Hong, Y.; Park, M. Disclosure Risks and Confidentiality Protection Methods in Frequency Tables Provided by the Statistical Geographic Information Service (SGIS), SRI Data Research Brief Vol. 3; Statistics Korea, Statistics Korea Institute: Daejeon, Republic of Korea, 2021; Available online: https://kostat.go.kr/board.es?mid=a90102010200&bid=12047&tag=&act=view&list_no=389998&ref_bid= (accessed on 1 July 2025).
36. SGIS Small-Area Statistics User Manual; Statistics Korea, 2025. Available online: https://sgis.kostat.go.kr/view/board/expAndNoticeView?post_no=181 (accessed on 1 July 2025).
37. User Guide for the National Land Statistics Map on the National Land Information Platform; National Geographic Information Institute, n.d. Available online: https://www.ngii.go.kr/eng/main.do (accessed on 1 July 2025).
38. Lee, J. Analysis of population depending on spatial unit for setting suitable spatial unit to rural planning. J. Korean Soc. Rural. Plan. 2019, 25, 1–9.
39. Lee, M.-H.; Choi, W.; Kim, Y.; Oh, J.; Park, J.; Shin, W.; Hong, S.-Y. Effects of spatial units on the detection of vulnerable areas for elderly population: A comparison of the grid system and administrative units. Geogr. J. Korea 2021, 55, 393–403.
40. Baynes, J.; Neale, A.; Hultgren, T. Improving intelligent dasymetric mapping population density estimates at 30 m resolution for the conterminous United States by excluding uninhabited areas. Earth Syst. Sci. Data 2022, 14, 2833–2849.
41. Park, S.; Kim, J.; Kang, J. Exploring optimal deep tunnel sewer systems to enhance urban pluvial flood resilience in the Gangnam region, South Korea. J. Environ. Manag. 2024, 357, 120762.
42. Bersabe, J.T.; Jun, B.-W. The machine learning-based mapping of urban pluvial flood susceptibility in Seoul integrating flood conditioning factors and drainage-related data. ISPRS Int. J. Geo-Inf. 2025, 14, 57.
43. Boscoe, F.P. The science and art of geocoding: Tips for improving match rates and handling unmatched cases in analysis. In Geocoding Health Data: The Use of Geographic Codes in Cancer Prevention and Control, Research and Practice, 1st ed.; Rushton, G., Armstrong, M.P., Gittler, J., Greene, B.R., Pavlik, C.E., West, M.M., Zimmerman, D.L., Eds.; CRC Press: Boca Raton, FL, USA, 2007; pp. 95–110.
44. Zandbergen, P.A. A comparison of address point, parcel and street geocoding techniques. Comput. Environ. Urban Syst. 2008, 32, 214–232.
45. Jun, B.-W. Effect of grid cell size on the accuracy of dasymetric population estimation. J. Korean Assoc. Geogr. Inf. Stud. 2016, 19, 127–143.
46. Aerts, S.; Haesbroeck, G.; Ruwet, C. Multivariate coefficients of variation: Comparison and influence functions. J. Multivar. Anal. 2015, 142, 183–198.
47. Dodge, Y. The Concise Encyclopedia of Statistics; Springer: New York, NY, USA, 2008; ISBN 978-0-387-31742-7.
48. Leys, C.; Ley, C.; Klein, O.; Bernard, P.; Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 2013, 49, 764–766.
49. Charyton, J. QGIS Zonal ExactExtract. Available online: https://github.com/JakubCha/exactextract_qgis (accessed on 24 March 2025).
50. Tuholske, C.; Gaughan, A.E.; Sorichetta, A.; de Sherbinin, A.; Bucherie, A.; Hultquist, C.; Stevens, F.; Kruczkiewicz, A.; Huyck, C.; Yetman, G. Implications for tracking SDG indicator metrics with gridded population data. Sustainability 2021, 13, 7329.
51. Thomson, D.R.; Leasure, D.R.; Bird, T.; Tzavidis, N.; Tatem, A.J. How accurate are WorldPop-Global-Unconstrained gridded population data at the cell-level?: A simulation analysis in urban Namibia. PLoS ONE 2022, 17, e0271504.
52. Zhang, J.; Xing, Y.; Mondal, S.K.; Liu, T.; Lin, Q. Evaluating the impact of gridded population datasets variability on flood exposure estimates across South Asia. Geomat. Nat. Hazards Risk 2025, 16, 2481996.
53. Jo, W.; Lim, Y.; Park, K.-H. Deep learning based land cover classification using convolutional neural network—A case study of Korea. J. Korean Geogr. Soc. 2019, 54, 1–16.
54. Lee, S.; Lee, S.W.; Hong, B.Y.; Eom, H.; Shin, H.-S.; Kim, K.-M. Representation of population distribution based on residential building types by using the dasymetric mapping in Seoul. J. Korea Spat. Inf. Soc. 2014, 22, 89–99.
55. Maroko, A.; Maantay, J.; Pérez Machado, R.P.; Barrozo, L.V. Improving population mapping and exposure assessment: Three-dimensional dasymetric disaggregation in New York City and São Paulo, Brazil. Pap. Appl. Geogr. 2019, 5, 45–57.
56. Li, F.; Yao, C.; Fu, J.; Yang, X. Uncertainty analysis of potential population exposure within the coastal lowlands of mainland China. Environ. Res. Lett. 2023, 18, 124003.

Figure 1. (a) Geographic location of the study area and (b) population density distribution across the 22 administrative dong units based on 2020 census data.

Figure 2. (a) Original 2020 LULC map of Gangnam District and (b) reclassified LULC map distinguishing three residential categories used for IDM population redistribution.

Figure 3. Spatial distribution of flood-exposed population estimates across Gangnam District, Seoul, using different gridded population datasets at various spatial resolutions: (a) IDM at 30 m resolution, (b–d) SGIS at 100 m, 500 m, and 1 km resolutions, and (e–g) NGII at 100 m, 500 m, and 1 km resolutions.

Figure 4. (a) CDFs illustrating variations in flood-exposed population estimates across datasets, and (b) total estimated exposed populations by gridded population dataset and spatial resolution.

Figure 5. Scatterplots comparing gridded population estimates with 2020 census data at the administrative dong level for each dataset and resolution: (a) IDM at 30 m resolution, (b–d) SGIS at 100 m, 500 m, and 1 km resolutions, and (e–g) NGII at 100 m, 500 m, and 1 km resolutions. The red dashed lines represent the best-fit line, with the blue line representing the regression line and shaded areas indicating the confidence intervals.

Figure 6. Spatial distribution of REE across administrative dongs for each gridded population dataset and spatial resolution, categorized into underestimation and overestimation classes: (a) IDM at 30 m resolution, (b–d) SGIS at 100 m, 500 m, and 1 km resolutions, and (e–g) NGII at 100 m, 500 m, and 1 km resolutions.

Table 1. List of gridded population datasets used in the study.

Datasets	IDM	SGIS	NGII
Publication year	2025	2020	2020
Demographic data source	Census data by dong (2020)	Household-based census data (2020)	Resident registration data (2020)
Ancillary data	Land use	None	None
Redistribution approach	Top-down approach	Bottom-up approach	Bottom-up approach
Population allocation method	Intelligent dasymetric method	Building centroids aggregated by grid	Three-stage geocoding and aggregation by grid
Spatial resolution used	30 m	100 m, 500 m, 1 km	100 m, 500 m, 1 km

Table 2. Representative population densities assigned to residential land use categories for IDM, based on either direct sampling or RAW.

Ancillary Class	Frequency	${\bar{x}}_{s}$	$σ_{s}$	Estimation	Density
Low-Density Residential	-	-	-	RAW	69.749
Mixed Residential/Commercial	-	-	-	RAW	44.143
High-Density Residential	14	113.196	53.001	Sampling	96.553
Non-Residential	0	0	0	Preset	0

Note: In this table, ${\bar{x}}_{s}$ represents the mean of sampled source units $s$, and $σ_{s}$ denotes the standard deviation of these units.

Table 3. Classification thresholds for REE used to assess biases in gridded population estimates.

Value Range	Classification Category
[<−25%]	Greatly underestimated (GUE)
[−25%, −5%]	Slightly underestimated (SUE)
[−5%, 5%]	Accurately estimated (AE)
[5%, 25%]	Slightly overestimated (SOE)
[>25%]	Greatly overestimated (GOE)

Table 4. CV and MAD of flood-exposed population estimates across gridded datasets and spatial resolutions.

Dataset	CV	MAD
IDM 30 m	0.310	2256.00
SGIS 100 m	0.337	3981.50
SGIS 500 m	0.361	3999.50
SGIS 1 km	0.408	3959.00
NGII 100 m	0.360	4842.50
NGII 500 m	0.358	4176.00
NGII 1 km	0.413	4236.50

Table 5. Error metrics (RMSE, MAE, MAPE, and R²) for gridded population estimates compared to the 2020 census data.

Dataset	RMSE	MAE	MAPE (%)	$R^{2}$
IDM 30 m	193.51	152.68	0.70	0.9998
SGIS 100 m	721.56	625.32	2.70	0.9964
SGIS 500 m	5180.97	4401.73	20.14	0.7393
SGIS 1 km	15,484.29	13,264.05	64.42	0.2177
NGII 100 m	1518.74	1417.41	6.27	0.9983
NGII 500 m	3862.20	3006.41	13.26	0.8619
NGII 1 km	11,003.67	7546.23	37.62	0.3159

Table 6. Distribution of REE classes across gridded population datasets and spatial resolutions.

Dataset		GUE	SUE	AE	SOE	GOE
IDM 30 m	Total Population	0	0	504,776	0	0
IDM 30 m	Error Percentage (%)	0.00	0.00	100.00	0.00	0.00
SGIS 100 m	Total Population	0	0	477,124	20,824	0
SGIS 100 m	Error Percentage (%)	0.00	0.00	95.82	4.18	0.00
SGIS 500 m	Total Population	12,941	79,058	32,307	281,947	152,912
SGIS 500 m	Error Percentage (%)	2.31	14.14	5.78	50.42	27.35
SGIS 1 km	Total Population	34,264	55,145	21,785	23,854	536,440
SGIS 1 km	Error Percentage (%)	5.10	8.21	3.24	3.55	79.89
NGII 100 m	Total Population	0	0	134,616	404,702	0
NGII 100 m	Error Percentage (%)	0.00	0.00	24.96	75.04	0.00
NGII 500 m	Total Population	14,200	109,273	109,708	273,419	33,046
NGII 500 m	Error Percentage (%)	2.63	20.25	20.33	50.67	6.12
NGII 1 km	Total Population	45,810	13,210	160,373	130,565	189,694
NGII 1 km	Error Percentage (%)	8.49	2.45	29.72	24.19	35.15

Note: Greatly underestimated (GUE), slightly underestimated (SUE), accurately estimated (AE), slightly overestimated (SOE), and greatly overestimated (GOE).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

