Assessing Policy Sensitivity in Grid-Level Depopulation Projections: A Machine Learning-Based Scenario Analysis for South Korea

Jo, Hyeryeon; Ahn, Miyeon; Kang, Youngeun

doi:10.3390/ijgi15050181



by Hyeryeon Jo, Miyeon Ahn and Youngeun Kang *

Department of Landscape Architecture, Gyeongsang National University, Jinju 52725, Republic of Korea

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2026, 15(5), 181; https://doi.org/10.3390/ijgi15050181

Submission received: 9 January 2026 / Revised: 4 April 2026 / Accepted: 20 April 2026 / Published: 23 April 2026

(This article belongs to the Special Issue Spatial Data Science and Knowledge Discovery)


Abstract

Grid-level population projection is essential for spatial planning under demographic decline, particularly for ensuring that population allocation accounts for grid extinction risk. This study develops a two-stage machine learning framework to predict residential grid transitions across South Korea’s 1 km grid system and assess how spatial policies shape depopulation outcomes through 2050. Stage 1 employs Random Forest classification to predict grid state transitions (macro-averaged F1 score = 0.694), while Stage 2 applies LightGBM regression for population prediction (coefficient of determination = 0.950). The extinction probability map from Stage 1 is incorporated into scenario simulations to adjust population allocation based on predicted residential viability. Feature importance analysis reveals that baseline population, household count, and demographic composition are key determinants of grid-level residential transitions. Five spatial development scenarios simulated through 2050 reveal substantial policy sensitivity. Cumulative extinction rates range from 3.1% under extreme dispersion to 24.5% under extreme concentration, representing a 21.4 percentage point divergence attributable to spatial allocation policy. Provincial heterogeneity is pronounced, with rural provinces facing extinction rates up to 39.9% while metropolitan areas remain largely unaffected. Comparing scenario outcomes enables pre-identification of policy-sensitive grids (19.5%) where allocation choices determine residential survival. These grids are predominantly located in areas with high forest cover and greater spatial isolation compared to stable grids, but differ in demographic profiles. Aging-Vulnerable grids (14.0%) exhibit high aging ratios with limited economic base, while Moderate-Vulnerability grids (5.5%) show younger demographics with relatively higher economic activity. These differential characteristics provide a spatially explicit basis for differentiated policy responses.
Beyond depopulation planning, the spatial outputs of this framework can inform related planning domains such as land use transition planning, carbon management, and infrastructure prioritization under demographic decline.

Keywords:

depopulation; machine learning; grid-level projection; policy scenarios; spatial planning; Random Forest; LightGBM

1. Introduction

Urban shrinkage has emerged as a defining challenge for spatial planning in the twenty-first century, particularly in East Asian societies experiencing rapid demographic transition [1,2]. While early discourse on shrinking cities focused primarily on deindustrialization in Western contexts, recent scholarship has increasingly recognized the distinctive dynamics of population-driven shrinkage in Japan, South Korea, and China, where low fertility, rapid aging, and selective youth outmigration produce spatially uneven patterns of decline [3,4,5]. Unlike the diffuse spatial decline associated with industrial restructuring in Western cities, East Asian urban shrinkage often occurs alongside continued growth in metropolitan cores, generating fragmented patterns of decline across urban and rural peripheries [6]. South Korea exemplifies this phenomenon with exceptional severity. Recording a total fertility rate of 0.72 in 2023 and achieving official designation as a super-aged society in December 2024, the nation has become the fastest-aging country globally [7,8]. Beyond aggregate demographic decline, the spatial distribution of population loss exhibits pronounced polarization. The Seoul Metropolitan Area concentrates 50.2% of the national population on approximately 12% of the total land area, while 89 of 226 municipalities have been designated as depopulation risk areas under the 2021 Special Act on Support for Depopulation Areas [7]. The concept of local extinction, initially introduced through Japan’s influential Masuda Report, has become central to Korean policy discourse, framing population decline not merely as a demographic shift but as an existential threat to peripheral communities [3,7]. A critical limitation constrains current planning responses to this challenge. Municipal-level assessments, while administratively convenient, obscure substantial within-municipality heterogeneity and remain vulnerable to the Modifiable Areal Unit Problem [9]. 
A municipality classified as demographically stable may contain neighborhoods experiencing rapid depopulation, while declining municipalities may include resilient urban cores, warranting different policy responses. Recent studies have demonstrated the value of grid-based approaches for identifying declining urban spaces at finer spatial resolution, revealing intra-urban heterogeneity that administrative boundaries cannot capture [6]. Grid systems provide temporally consistent boundaries, enabling longitudinal comparison, flexibility for aggregation to various policy-relevant scales, and reduced sensitivity to arbitrary administrative delineations [10,11].

This study develops a two-stage machine learning framework to predict grid-level population dynamics and assess policy sensitivity of depopulation outcomes across South Korea’s 1 km grid system. The analysis aims to address four objectives. First, develop a classification model predicting grid-level residential state transitions and a regression model predicting population magnitude. Second, simulate five spatial development scenarios representing alternative policy futures through 2050. Third, quantify how spatial allocation policies influence cumulative grid extinction rates across provinces. Fourth, identify policy-sensitive grids where spatial development choices most significantly affect depopulation outcomes. By integrating predictive modeling with scenario analysis, this framework provides spatially explicit evidence on where policy choices matter most under conditions of demographic decline.

The paper is structured as follows. Section 2 reviews literature on urban shrinkage, gridded population modeling, and scenario-based projection frameworks. Section 3 describes the study area, data sources, and two-stage machine learning framework, including scenario design. Section 4 presents results including model performance, feature importance, scenario simulations, and policy sensitivity classification. Section 5 discusses methodological and policy implications. Section 6 concludes with key findings and future directions.

2. Literature Review

Urban shrinkage encompasses demographic, economic, and spatial dimensions that produce distinctive planning challenges [1,12]. While early conceptualizations emphasized deindustrialization and economic restructuring in Western cities, contemporary scholarship recognizes multiple pathways to decline, including suburbanization, demographic aging, and selective outmigration of young populations [13,14]. Policy responses have evolved accordingly, from growth-oriented regeneration toward adaptive strategies such as right-sizing, smart decline, and compact development that acknowledge shrinkage as a structural condition requiring managed transition rather than reversal [15,16]. Japanese municipalities have pioneered compact city strategies to consolidate urban services under population decline, concentrating resources in designated centers while managing peripheral depopulation [3]. Korean scholarship has documented similar dynamics, with studies examining industrial restructuring in Daegu [5] and multidimensional policy interventions in municipalities designated as depopulation risk areas [7]. However, systematic assessment of how spatial allocation policies influence depopulation outcomes at fine spatial resolution remains limited. Grid-based spatial units offer methodological advantages for analyzing population dynamics under shrinkage. Unlike administrative boundaries subject to periodic revision, grid systems maintain fixed geometry, enabling longitudinal comparison across census periods [10]. Global initiatives, including WorldPop and GHS-POP, have produced freely available gridded population datasets supporting applications from disaster response to sustainable development monitoring [17,18]. These products typically employ dasymetric mapping or machine learning to disaggregate census counts using ancillary data, including land cover, nighttime lights, and building footprints as allocation weights [19,20]. 
Recent Korean studies have applied grid-based approaches to identify declining urban spaces, demonstrating that administrative boundaries obscure substantial intra-municipal heterogeneity in depopulation patterns [6]. Such findings underscore the value of fine-resolution analysis for targeting policy interventions.

Machine learning methods have achieved strong predictive performance in population modeling by capturing nonlinear relationships between spatial covariates and demographic outcomes. Random Forest models effectively disaggregate census populations using remotely sensed covariates [19], while gradient boosting approaches have been applied to population change prediction at various scales [21]. For projection applications specifically, studies have demonstrated that sociodemographic characteristics, accessibility, and land use significantly influence small-area forecast accuracy [22]. Wilson et al. [23] reviewed state-of-the-art methods for small-area forecasting, noting that machine learning shows promise for capturing complex covariate relationships but requires careful validation against demographic fundamentals. Hybrid approaches combining machine learning predictions with cohort-component demographic accounting represent an emerging direction [24]. Scenario-based frameworks enable assessment of how alternative policy assumptions influence population outcomes. Shared Socioeconomic Pathways provide standardized narratives with spatial implementations projecting population distribution under sustainability, regional rivalry, and intermediate scenarios [25,26]. At subnational scales, compact versus dispersed development scenarios have been examined in contexts of demographic decline. Studies from Japan demonstrate substantial divergence between concentration and dispersion policies, with spatial allocation choices significantly affecting which areas experience depopulation independent of overall decline trajectories [27]. In the Korean context, modified cohort-component approaches have been applied to a 500 m grid projection, revealing that approximately two-thirds of populated grids may experience decline by 2038 [28]. 
However, most scenario frameworks rely on theoretical allocation weights rather than empirically derived transition probabilities, limiting their capacity to reflect observed spatial dynamics.

This study addresses these gaps by integrating machine learning-based transition prediction with scenario simulation. The framework employs machine learning to predict grid-level state transitions from observed 2015–2020 patterns, then applies predicted probabilities within a scenario allocation framework to simulate alternative spatial development futures through 2050. Additionally, comparing scenario outcomes enables pre-identification of policy-sensitive grids where allocation choices determine residential survival, with further classification by demographic characteristics to inform differentiated intervention strategies.

3. Materials and Methods

3.1. Study Area and Data

3.1.1. Study Area

The study area encompasses the Republic of Korea, covering approximately 100,210 km² across 17 metropolitan cities and provinces (Figure 1). Following Statistics Korea’s census grid system, a 1 km × 1 km grid comprising 106,906 cells is employed nationwide. Of these, 58,148 grids were classified as residential (population > 10) in 2020. South Korea’s topography is characterized by mountainous terrain covering approximately 70% of the land area (Figure 1a). This topographic structure has historically constrained settlement patterns, concentrating population in coastal plains, river valleys, and western lowlands. Population distribution exhibits pronounced spatial polarization (Figure 1b). The Seoul Metropolitan Area, comprising Seoul, Incheon, and Gyeonggi-do, contains approximately 26 million residents (50.2% of the national population) on less than 12% of the total land area. In contrast, non-metropolitan provinces have experienced sustained population decline since the 1990s, driven by rural-to-urban migration and accelerating natural decrease. Demographic vulnerability extends across diverse rural settings (Figure 1c). Aging ratios exceeding 40% are observed not only in mountainous provinces such as Gangwon-do and Gyeongsangbuk-do, but also in agricultural plains, including Chungcheongnam-do and Jeollanam-do. This pattern reflects nationwide youth outmigration from non-metropolitan areas regardless of topographic characteristics, producing spatially extensive demographic fragility across rural Korea.

3.1.2. Data Sources and Preprocessing

Input data were compiled from multiple administrative and geospatial sources (Table 1). A complete list of 27 explanatory variables is provided in Table A1.

Population and socioeconomic data were obtained from the Statistical Geographic Information Service (SGIS) of Statistics Korea, which provides census-based grid statistics at 1 km resolution annually since 2000. Data from 2015 and 2020 were used to establish a 5-year transition period for identifying grid-level population dynamics. The year 2020 was selected as the baseline year to ensure temporal consistency with environmental datasets, including land cover and road networks. A key assumption of this study is that 2020 environmental conditions remain constant throughout the projection horizon, following common practice in spatial population projection studies to reduce uncertainty from environmental change predictions. Population variables include total population and six age cohorts. Sociodemographic characteristics and land use have been reported as important determinants in small-area population forecasting [22]. Following this literature, household composition, housing characteristics, and business establishment distribution were incorporated as explanatory variables using SGIS grid-level census data. Household variables include total households and single-person households, reflecting residential structure and demographic composition at the grid level. High-resolution population estimation studies have demonstrated the utility of household survey data for capturing settlement patterns not fully reflected in population counts alone [20]. Housing variables include total housing units and apartment units, serving as proxies for residential infrastructure capacity. Economic variables include total business establishments and total employees, capturing local economic activity that may influence residential attractiveness and population retention. 
Land cover data were obtained from the Environmental Geographic Information System (EGIS) of the Ministry of Environment, providing land cover classification at approximately 100 m resolution based on satellite imagery interpretation. These data were aggregated to 1 km grids to calculate proportions of seven land cover types: urban/built-up, cultivated, forest, grassland, wetland, barren, and water. Forest and cultivated cover serve as proxies for rural character, while urban cover indicates built infrastructure presence. Topographic variables were derived from a 90 m Digital Elevation Model (DEM) obtained from the Ministry of Land, Infrastructure and Transport’s V-World platform. Mean elevation and mean slope were calculated for each 1 km grid. These variables capture physical constraints on settlement. Prior studies have demonstrated significant relationships between topographic characteristics and population distribution, with steeper slopes and higher elevations associated with lower population density and higher depopulation risk [33,34]. Accessibility was measured as Euclidean distance from grid centroids to the nearest road, derived from the 2020 road network shapefile obtained from the Korea Transport Database (KTDB). Road accessibility serves as a proxy for connectivity to services and employment opportunities, factors known to influence residential location decisions and rural population retention [35].

All spatial visualizations in this study were projected in the Korea Central Belt coordinate system (EPSG:5186). Population and socioeconomic grid data were obtained from SGIS (Statistics Korea) [29], and satellite imagery from V-World basemap via QGIS (version 3.40.10). Municipal-level units are based on 250 district-level SGIS administrative codes as of 2020.

3.1.3. Target Variable Definition

The target variable for Stage 1 classification represents grid-level residential status transitions between 2015 and 2020. A grid is classified as “residential” if its population exceeds zero, capturing any population presence. This inclusive definition ensures that all grids with recorded population, including those with minimal residents, are considered for subsequent population projection in Stage 2. Grid state transitions are classified into four mutually exclusive categories (Table 2). The distribution exhibits substantial class imbalance, with two dominant classes (Persistence and Non-residential) accounting for 96.1% of observations, while policy-relevant minority classes (Extinction and Emergence) together comprise only 3.8%.
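The four transition classes can be illustrated with a short sketch. It assumes the natural reading of the class names (residential in both years is Persistence, residential only in 2015 is Extinction, residential only in 2020 is Emergence, residential in neither year is Non-residential); the function name and the example populations are hypothetical and are not taken from Table 2.

```python
def classify_transition(pop_2015: float, pop_2020: float) -> str:
    """Return the Stage 1 transition class for one grid.

    A grid counts as residential when its population exceeds zero,
    following the definition given in the text.
    """
    was_residential = pop_2015 > 0
    is_residential = pop_2020 > 0
    if was_residential and is_residential:
        return "Persistence"
    if was_residential and not is_residential:
        return "Extinction"
    if not was_residential and is_residential:
        return "Emergence"
    return "Non-residential"


# Hypothetical (pop_2015, pop_2020) pairs for four grids.
examples = [(120, 95), (4, 0), (0, 7), (0, 0)]
labels = [classify_transition(a, b) for a, b in examples]
print(labels)
```

Because the rule depends only on whether each year's population is zero, a grid with a single recorded resident is still treated as residential at this stage.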

3.2. Two-Stage Machine Learning and Scenario Projection Framework

3.2.1. Framework Rationale

A two-stage prediction framework is employed, separating the classification of grid residential states from the regression of population quantities (Figure 2). This design reflects the conceptually distinct nature of the two prediction tasks. Stage 1 addresses a categorical question, namely, whether a grid will experience complete depopulation. Stage 2 addresses a continuous question, namely, what population a residential grid will contain. Two-stage approaches have been successfully applied in related spatial prediction contexts, including land use change modeling [36] and species distribution modeling [37]. The separation prevents conflation of presence/absence patterns with intensity patterns, which can degrade predictive performance when modeled jointly [38]. In this context, factors influencing whether a grid becomes completely depopulated (e.g., extreme terrain, remoteness) may differ from factors influencing population density conditional on residential status (e.g., housing stock, urban services). A key methodological consideration involves threshold definitions. Stage 1 classification uses complete depopulation (pop = 0) as the extinction criterion, ensuring all grids with any recorded population receive allocation in Stage 2. Scenario evaluation subsequently applies a functional threshold (pop > 10), recognizing that grids with minimal population (1–10 persons per km²) lack the capacity to sustain basic residential infrastructure. This two-threshold approach prevents premature exclusion of marginal grids from population projection, enabling accurate estimation across the full population range, including the 1–10 person interval that determines functional extinction outcomes. Sensitivity analysis with alternative threshold values (5, 10, and 20 persons) confirmed that the 10-person threshold best discriminates policy-induced extinction from baseline population dynamics (Table A3).
This threshold is more conservative than the zero-population criterion adopted in prior grid-level studies [27], where grids were classified as extinct only when the population reached zero.

3.2.2. Stage 1: Residential State Classification

For the four-class classification task, three tree-based ensemble algorithms demonstrating strong performance in spatial prediction applications were evaluated: Random Forest [39], XGBoost [40], and LightGBM [41]. Tree-based ensembles are well-suited for spatial prediction tasks due to their ability to capture nonlinear relationships and variable interactions without requiring explicit specification, their robustness to outliers and mixed variable types, and their provision of feature importance measures supporting interpretation [42]. Random Forest constructs an ensemble of decision trees via bootstrap aggregation (bagging), with each tree trained on a bootstrap sample and considering a random subset of features at each split. This randomization reduces overfitting and improves generalization [39]. XGBoost and LightGBM are gradient boosting frameworks that build trees sequentially, with each tree fitted to the residuals of the ensemble [43]. XGBoost incorporates L1 and L2 regularization to prevent overfitting [40], while LightGBM employs histogram-based splitting and leaf-wise tree growth for computational efficiency [41]. The severe class imbalance (Extinction: 1.9%, Emergence: 1.9%) poses a challenge for classification algorithms, which may favor majority classes to minimize overall error at the expense of minority class detection [44]. Poor detection of Extinction and Emergence cases would undermine the policy relevance of predictions, as these transitions are precisely the outcomes of planning interest. Class imbalance is addressed using the Synthetic Minority Over-sampling Technique (SMOTE), which generates synthetic examples of minority classes by interpolating existing minority samples in feature space [45]. SMOTE has been shown to improve minority class recall without the information loss associated with majority class undersampling. 
SMOTE is applied to the training data only, generating synthetic samples until minority classes match majority class sizes, while preserving the original class distribution in validation folds.
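The interpolation idea behind SMOTE can be sketched in a few lines of plain Python. This is a simplified illustration, not the imbalanced-learn implementation used in the study: the function name `smote_like`, the neighbour count, and the sample points are all assumptions, and the input is meant to stand for minority-class rows from a training fold only.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class feature vectors taken from a training fold.
train_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(train_minority, n_new=6)
print(len(new_points))
```

Every synthetic point lies on a line segment between two real minority samples, which is why applying the procedure to validation folds would leak information and distort the class distribution used for evaluation.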

Model performance was assessed using 5-fold stratified cross-validation, which preserves class proportions across folds. The primary evaluation metric is macro-averaged F1 score (F1-macro), computed as the unweighted mean of class-specific F1 scores. F1-macro weights all classes equally regardless of prevalence, making it appropriate for imbalanced classification where minority class performance is important [46]. Overall accuracy is reported as a secondary metric, but it provides limited diagnostic value given that a classifier predicting only majority classes would achieve 96.1% accuracy while providing no useful predictions for Extinction or Emergence. Based on preliminary experiments and established guidelines, Random Forest was configured with 200 trees (n_estimators), unlimited depth (max_depth = None), minimum samples per split of 2, and balanced class weights. XGBoost and LightGBM were configured with 200 boosting rounds, a learning rate of 0.1, and default regularization parameters.
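The contrast between accuracy and F1-macro can be made concrete with a small sketch. The labels below are hypothetical (100 grids with a 96/4 majority/minority split) and the helper `macro_f1` is an illustrative re-implementation in which an undefined per-class F1 is counted as zero; it is not the evaluation code used in the study.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores (F1 is 0 when undefined)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


classes = ["Persistence", "Non-residential", "Extinction", "Emergence"]
# Hypothetical labels: 96 majority-class grids and 4 minority-class grids.
y_true = (["Persistence"] * 50 + ["Non-residential"] * 46
          + ["Extinction"] * 2 + ["Emergence"] * 2)
# A classifier that never predicts a minority class.
y_pred = ["Persistence"] * 50 + ["Non-residential"] * 46 + ["Persistence"] * 4

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, classes)
print(accuracy, round(score, 3))
```

In this toy case the majority-only classifier keeps a high accuracy while its F1-macro falls below 0.5, because the two minority classes each contribute a score of zero to the unweighted mean.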

3.2.3. Stage 2: Population Density Regression

For grids classified as residential (Persistence or Emergence) in Stage 1, population is predicted using a regression model. The target variable is log-transformed population, log(pop + 1), where the addition of 1 prevents undefined values for zero-population grids that may occur during projection iterations. Log transformation addresses the right-skewed distribution of grid populations and stabilizes variance across the prediction range [47]. Three gradient boosting implementations were evaluated: LightGBM [41], XGBoost [40], and scikit-learn’s Gradient Boosting Regressor [48]. Performance was assessed via 5-fold cross-validation using the coefficient of determination (R²) as the primary metric, with root mean squared error (RMSE) reported for interpretability. LightGBM was configured with 200 boosting rounds, a learning rate of 0.1, 31 leaves per tree, and unlimited depth. The same 27 explanatory variables used in Stage 1 serve as predictors for Stage 2. Predictions are back-transformed via exponentiation, exp(ŷ) − 1, to obtain population estimates on the original scale. Following cross-validation for model selection, final models for both stages were trained on the complete dataset for use in scenario projection. The functional threshold for residential viability is determined through sensitivity analysis (Table A3) and applied in the scenario evaluation.
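The target transformation and its inverse can be checked with a brief sketch. The function names are hypothetical; `math.log1p` and `math.expm1` are used here as numerically stable equivalents of log(pop + 1) and exp(ŷ) − 1.

```python
import math


def to_log_target(pop: float) -> float:
    """Forward transform used as the regression target: log(pop + 1)."""
    return math.log1p(pop)


def from_log_prediction(y_hat: float) -> float:
    """Back-transform a prediction to the original scale: exp(y_hat) - 1."""
    return math.expm1(y_hat)


# Round trip for a zero-population grid, a marginal grid, and a dense grid.
for pop in (0, 10, 2500):
    y = to_log_target(pop)
    print(pop, round(y, 4), round(from_log_prediction(y), 6))
```

A zero-population grid maps to a target of exactly zero, so the transform stays defined if a grid empties during projection iterations.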

3.2.4. SHAP-Based Interpretation

To interpret model predictions and identify key drivers of grid-level population dynamics, SHAP (SHapley Additive exPlanations) analysis is employed [49]. SHAP values quantify each feature’s marginal contribution to individual predictions based on cooperative game theory, providing consistent and locally accurate feature attributions. SHAP values are computed for both Stage 1 (classification) and Stage 2 (regression) models to identify variables most influential for extinction risk and population magnitude, respectively. This interpretability analysis enables identification of policy-relevant predictors and supports understanding of the mechanisms driving grid-level population dynamics.

All analyses were conducted in Python 3.10.16 using scikit-learn 1.6.1 for Random Forest and model evaluation, XGBoost 2.1.4 and LightGBM 4.6.0 for the gradient boosting candidates in both stages, imbalanced-learn 0.14.1 for SMOTE oversampling, and SHAP 0.49.1 for model interpretation. Spatial data processing used GeoPandas 1.0.1.
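The game-theoretic attribution that underlies SHAP can be illustrated with an exact, brute-force Shapley computation on a toy value function. This is only a sketch of the principle: the SHAP library relies on much faster tree-specific algorithms, and the function `toy_value`, its feature names, and its numbers are hypothetical.

```python
from itertools import permutations


def shapley_values(value, features):
    """Exact Shapley values: the average marginal contribution of each
    feature over all orderings in which features can be added."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        included = frozenset()
        for f in order:
            with_f = included | {f}
            totals[f] += value(with_f) - value(included)
            included = with_f
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_value(subset):
    """Hypothetical model output given the set of known features."""
    v = 0.0
    if "population" in subset:
        v += 3.0
    if "aging_ratio" in subset:
        v += 1.0
    if "population" in subset and "households" in subset:
        v += 2.0  # interaction term shared between the two features
    return v


phi = shapley_values(toy_value, ["population", "households", "aging_ratio"])
print(phi)
```

The attributions sum to the difference between the value with all features and the value with none, which is the local accuracy property mentioned above; the interaction term is split evenly between the two features involved.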

3.2.5. Scenario-Based Projection

To assess how alternative spatial policies influence grid-level depopulation outcomes, ML predictions are integrated with a scenario simulation framework adapted from Hori et al. [27]. Their methodology demonstrated that spatial policy choices produce substantially different depopulation outcomes even when the total national population follows the same trajectory. For each projection interval, regional population totals [50] are allocated to grids based on computed weights as shown in Equation (1):

$$W_{i} = P_{i}^{\delta} \times \left(1 - \hat{p}_{ext,i}\right) \times M_{i}, \tag{1}$$

where $P_{i}$ is current grid population, $\delta$ is the amplification parameter governing concentration intensity, $\hat{p}_{ext,i}$ is the extinction probability from Stage 1 Random Forest (the predicted probability of transition to non-residential status, Figure A1), and $M_{i}$ is a binary emergence mask indicating settlement eligibility based on 2020 infrastructure presence. The emergence mask reflects the physical and infrastructural constraints on new residential development identified through grid-type characterization (Section 4.3). Non-residential grids occupy challenging terrain with the highest mean elevation (367 m), steepest slopes (19.1°), and greatest distance from roads (1750 m), conditions that effectively preclude residential development regardless of policy intervention. Accordingly, the emergence mask permits residential transition only for grids demonstrating pre-existing settlement potential, defined as satisfying at least one of the following 2020 baseline conditions: population greater than zero, housing units present, households present, or urban land cover proportion greater than zero. This mask identifies 78,020 grids (73.0% of total) as eligible for emergence, excluding grids dominated by forest, water, or extreme topography where new residential formation is physically infeasible. Grids with higher extinction probability receive reduced population allocation, reflecting their lower expected residential viability.
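Equation (1) and the proportional distribution of a regional total can be sketched as follows. The grid values are hypothetical, the function names are illustrative, and the normalization by the sum of weights is an assumption about how "allocated to grids based on computed weights" is carried out.

```python
def allocation_weights(pop, p_ext, mask, delta=1.0):
    """W_i = P_i**delta * (1 - p_ext_i) * M_i for each grid i."""
    return [p ** delta * (1.0 - q) * m for p, q, m in zip(pop, p_ext, mask)]


def allocate(regional_total, weights):
    """Distribute a regional total proportionally to the weights."""
    total_w = sum(weights)
    if total_w == 0:
        return [0.0 for _ in weights]
    return [regional_total * w / total_w for w in weights]


# Hypothetical values for four grids in one region.
pop = [400.0, 100.0, 20.0, 0.0]     # current population P_i
p_ext = [0.01, 0.10, 0.60, 0.90]    # Stage 1 extinction probability
mask = [1, 1, 1, 0]                 # emergence mask M_i (last grid ineligible)
weights = allocation_weights(pop, p_ext, mask, delta=1.0)
shares = allocate(500.0, weights)
print([round(s, 1) for s in shares])
```

The third grid keeps only 40% of its population-based weight because of its high extinction probability, while the masked grid receives nothing; raising `delta` above 1 would shift allocation further toward the largest grid.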

Under Compact scenarios (γ > 0), a proportion γ of the total regional population is allocated preferentially to the top-ranked grids by population before the remaining (1 − γ) is allocated proportionally, representing policies that concentrate population into higher-density areas. Dispersed scenarios apply the inverse logic, allocating preferentially to bottom-ranked grids, representing policies that support population retention in lower-density areas. Across scenarios, γ ranges from 0.2 to 0.4, determining whether the top or bottom 20% or 40% of grids receive preferential allocation, while δ ranges from 1.0 to 2.0, controlling the degree of weight amplification (Table 3).
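One possible reading of the preferential allocation step is sketched below. The exact rule is defined in Table 3 and in Hori et al. [27], so the details here are assumptions: a share γ of the regional total goes to the top (Compact) or bottom (Dispersed) γ fraction of grids ranked by population, split by weight, and the remaining (1 − γ) is spread over all grids by weight. The function name and all numbers are hypothetical.

```python
def scenario_allocate(total, pop, weights, gamma, compact=True):
    """Two-step allocation: a share gamma to a preferred subset of grids,
    then the remaining (1 - gamma) across all grids by weight.
    Assumes the weights sum to a positive value."""
    n = len(pop)
    n_pref = max(1, round(gamma * n)) if gamma > 0 else 0
    # Rank by population: descending for Compact, ascending for Dispersed.
    order = sorted(range(n), key=lambda i: pop[i], reverse=compact)
    preferred = order[:n_pref]
    pref_w = sum(weights[i] for i in preferred)
    pref_share = gamma * total if pref_w > 0 else 0.0
    all_w = sum(weights)
    alloc = [(total - pref_share) * w / all_w for w in weights]
    for i in preferred:
        alloc[i] += pref_share * weights[i] / pref_w
    return alloc


pop = [400.0, 100.0, 20.0, 80.0, 10.0]
weights = pop[:]  # delta = 1 and zero extinction probability, for simplicity
total = 610.0
alloc_compact = scenario_allocate(total, pop, weights, gamma=0.2, compact=True)
alloc_dispersed = scenario_allocate(total, pop, weights, gamma=0.2, compact=False)
alloc_bau = scenario_allocate(total, pop, weights, gamma=0.0)
print([round(a, 1) for a in alloc_compact])
print([round(a, 1) for a in alloc_dispersed])
```

With γ = 0 the function reduces to purely proportional allocation, which matches the BAU setting; under this reading the same regional total produces very different outcomes for the smallest grid depending on the direction of preference.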

The BAU scenario applies ML predictions iteratively in 5-year steps from 2020 to 2050, representing continuation of observed transition patterns without preferential allocation (γ = 0, δ = 1). Each iteration applies Stage 1 classification to predict transition probabilities, computes allocation weights, distributes regional population proportionally to weights, and reclassifies grids based on the functional threshold (pop > 10). In the Compact and Dispersed scenarios, an established allocation method is applied as a single-period transformation from 2020 to 2050, enabling direct comparison with prior studies [27]. This methodological distinction reflects different conceptual purposes. BAU projects natural demographic trends while policy scenarios assess intervention effects.

Grid-level features (demographic composition, household, housing, economic variables) are held constant at 2020 baseline values throughout the projection horizon. Only population totals are redistributed via scenario-specific weights. Although grid-level features remain fixed at 2020 values, the allocation weights from Stage 1 are derived from a model trained on grid-level age structure, household composition, and land use characteristics, implicitly capturing the statistical relationships between these features and residential transitions. This design choice preserves consistent conditions for relative policy comparison while acknowledging that dynamic feature updating would require integration with cohort-based demographic models. The iterative BAU process captures path-dependent dynamics, as grids losing population in early periods have reduced weights in subsequent periods, potentially triggering decline cascades.
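The iterative BAU procedure and its path dependence can be sketched with a stand-in for the Stage 1 model. Everything specific here is an assumption for illustration: `toy_p_ext` replaces the trained Random Forest with a simple rule in which smaller grids are riskier, the regional totals are invented, and the emergence mask is omitted.

```python
def run_bau(pop, regional_totals, predict_p_ext, threshold=10.0):
    """Iterate 5-year steps: predict extinction probability, compute
    weights, allocate the regional total, apply the functional threshold."""
    history = []
    for total in regional_totals:
        p_ext = predict_p_ext(pop)
        # BAU setting: gamma = 0 and delta = 1, so weights are P_i * (1 - p_ext_i).
        weights = [p * (1.0 - q) for p, q in zip(pop, p_ext)]
        total_w = sum(weights)
        pop = [total * w / total_w if total_w else 0.0 for w in weights]
        residential = [p > threshold for p in pop]
        history.append((pop, residential))
    return history


def toy_p_ext(pop):
    """Stand-in for the Stage 1 classifier (an assumption, not the model)."""
    return [1.0 / (1.0 + p / 20.0) for p in pop]


# Hypothetical declining regional totals for six 5-year steps (2025 to 2050).
totals = [560.0, 530.0, 500.0, 470.0, 440.0, 410.0]
history = run_bau([400.0, 120.0, 40.0, 15.0], totals, toy_p_ext)
final_pop, final_status = history[-1]
print([round(p, 1) for p in final_pop], final_status)
```

Because each step's weights depend on the population produced by the previous step, a grid that loses population early receives a smaller share later, which is the cascade mechanism described above.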

3.3. Policy Sensitivity Classification

To support policy interpretation, grid-level policy sensitivity is assessed through scenario comparison and demographic characterization. First, the four transition types from Stage 1 classification (Persistence, Extinction, Emergence, Non-residential) are characterized by their spatial-demographic attributes, including population scale, demographic composition, physical environment, and accessibility. Results are reported descriptively rather than through inferential tests, as the large sample size would render even trivial differences statistically significant [51]. Second, a four-type classification compares scenario outcomes to identify policy-sensitive grids. Grids are classified by cross-tabulating their 2050 residential status under trend continuation (BAU) against their status under extreme concentration. Grids surviving under both scenarios are classified as Stable, grids surviving under BAU but extinct under extreme concentration are classified as policy-sensitive, and grids extinct under BAU are classified as Already-Extinct. Policy-sensitive grids are further subdivided at an aging ratio of 40% into Moderate-Vulnerability (aging ratio < 40%) and Aging-Vulnerable (aging ratio ≥ 40%). This classification identifies where allocation choices determine residential outcomes and whether baseline demographic structure permits natural recovery.
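The classification rule described in this subsection can be written as a small decision function. The function name and the example inputs are hypothetical, and the aging ratio is assumed to be expressed as a fraction between 0 and 1.

```python
def sensitivity_type(bau_residential: bool,
                     concentration_residential: bool,
                     aging_ratio: float) -> str:
    """Assign one of the four policy sensitivity types to a grid."""
    if not bau_residential:
        return "Already-Extinct"  # extinct under BAU by 2050
    if concentration_residential:
        return "Stable"  # survives under both scenarios
    # Survives under BAU but not under extreme concentration: policy-sensitive.
    if aging_ratio >= 0.40:
        return "Aging-Vulnerable"
    return "Moderate-Vulnerability"


# Hypothetical grids: (BAU status, extreme concentration status, aging ratio).
grids = [(True, True, 0.22), (True, False, 0.47),
         (True, False, 0.31), (False, False, 0.55)]
types = [sensitivity_type(*g) for g in grids]
print(types)
```

The aging ratio only matters for grids whose survival differs between the two scenarios, so Stable and Already-Extinct grids are assigned without reference to demographic structure.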

4. Results

4.1. Model Performance

Table 4 presents cross-validation results for candidate algorithms across both stages. For Stage 1 classification, Random Forest achieved the highest F1-macro score (0.694), outperforming XGBoost (0.649) and LightGBM (0.648). This performance gap reflects a fundamental trade-off in imbalanced classification, where gradient boosting algorithms favor majority class prediction, while Random Forest with balanced class weights better detects minority classes. Temporal stability of the Stage 1 Random Forest model was further assessed through backtesting with 2010 SGIS grid data applied to 2010–2015 transitions, which yielded comparable F1-macro (0.696); detailed results are provided in Table A2. For Stage 2 regression, all algorithms achieved high performance (R² > 0.948). LightGBM was selected for its marginally higher R² (0.950) and faster training speed, important for iterative scenario simulations.

Table 5 presents class-specific metrics for the selected Random Forest model. Persistence and Non-residential classes achieved near-perfect performance (F1 = 0.979 and 0.968), reflecting the stability of established residential areas and uninhabited zones. Minority classes showed moderate but meaningful performance. Extinction achieved F1 = 0.441 with recall of 0.519, indicating the model identifies approximately half of the grids experiencing actual extinction. Although modest, these values substantially exceed random baseline expectations (1.9% for each class), demonstrating the model’s utility as a risk screening tool.

The relatively lower precision for minority classes (0.383 for Extinction, 0.369 for Emergence) indicates a tendency toward false positives, as shown in the full confusion matrix (Table A4). From a planning perspective, this conservative behavior is preferable to missed detections, as flagging stable grids as potentially vulnerable incurs a lower cost than failing to identify actual extinction cases. Model performance varied across provinces (Table A5), with higher extinction counts associated with better detection performance.

4.2. Feature Importance Analysis

SHAP analysis quantified variable contributions to model predictions across both stages (Table 6, Figure 3). Complete importance values for all 27 variables are provided in Table A6. Population and household variables dominated both prediction stages. Baseline population ranked first for extinction prediction (|SHAP| = 0.130) and population magnitude (|SHAP| = 0.703), followed by household count in both stages. Housing stock ranked third for Stage 1 but dropped to tenth for Stage 2, indicating stronger relevance for residential persistence than population size. Age cohort importance differed substantially between stages. For extinction prediction, the 45–64 cohort showed the highest importance among age groups (rank 4, |SHAP| = 0.023), while younger working-age (25–44) and youth (15–24) cohorts ranked considerably lower (ranks 9 and 12). For population magnitude, child population (0–14) emerged as the dominant age predictor (rank 4, |SHAP| = 0.087), followed by 45–64 (rank 3) and 25–44 (rank 6). Economic variables exhibited pronounced stage-dependent patterns. Employee counts ranked low for extinction prediction (rank 19) but substantially higher for population magnitude (rank 7, |SHAP| = 0.041). Business establishment counts remained low across both stages (ranks 16 and 19). Land cover variables showed stronger effects on population magnitude than on extinction prediction. Urban land cover ranked 18th for Stage 1 but 5th for Stage 2 (|SHAP| = 0.070), while barren land ranked 11th and 8th, respectively. Topographic variables and road accessibility showed consistently low importance across both stages. Direction analysis revealed distinct patterns between stages. In Stage 1, higher values of nearly all variables reduced extinction probability, reflecting the general protective effect of demographic and physical infrastructure presence. In Stage 2, directions diverged by variable types. 
Population and urban land cover showed positive associations with population magnitude, while household count and working-age population showed negative associations.
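For readers unfamiliar with how a |SHAP| ranking of this kind is typically obtained, the sketch below averages absolute per-grid SHAP values for each feature and sorts the result. It assumes the reported |SHAP| figures are mean absolute values, which the text does not state explicitly. The function name and the toy matrix are hypothetical; only the variable names come from Table A1.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value.

    shap_rows: list of per-sample lists, one SHAP value per feature.
    Returns a list of (feature, mean_abs_value) sorted from most to least important.
    """
    n_samples = len(shap_rows)
    sums = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            sums[j] += abs(value)
    importance = {name: s / n_samples for name, s in zip(feature_names, sums)}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Toy SHAP matrix (3 grids x 3 features); values are illustrative, not study results.
features = ["pop_total_2015", "household_total_2015", "dist_road"]
toy_shap = [
    [0.20, -0.10, 0.01],
    [-0.40, 0.05, -0.02],
    [0.30, -0.15, 0.00],
]
ranking = mean_abs_shap(toy_shap, features)
for rank, (name, value) in enumerate(ranking, start=1):
    print(rank, name, round(value, 3))
```

Taking absolute values removes the direction of each effect, which is why the direction analysis is reported separately from the importance ranks.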

4.3. Grid-Type Characterization

Understanding characteristics that distinguish grids experiencing different transitions is essential for policy targeting. Table 7 presents descriptive statistics for the four transition types across spatial-demographic dimensions. Persistence grids (52.2%) exhibit substantially higher mean population (912 persons) and occupy favorable terrain, with the lowest elevation (130 m), gentlest slopes (8.9°), highest urban cover (13.2%), and shortest distance to roads (646 m). This reflects historical settlement concentration in accessible, developable locations. Extinction grids present a distinctive profile: marginal population (mean 29 persons) combined with challenging topography (231 m elevation, 14.7° slope). Notably, their aging ratio (41.5%) is only marginally higher than that of Persistence grids, suggesting that population size rather than age structure primarily distinguishes extinction risk. Although Emergence and Extinction grids occupy comparable topographic settings in terms of elevation (223 m and 231 m, respectively) and slope (13.6° and 14.7°), they diverge in land cover characteristics. Extinction grids are characterized by higher forest cover (69.7%) and lower urban land cover (4.1%), while Emergence grids exhibit lower forest cover (64.1%) and higher urban land cover (5.8%), suggesting that spatial isolation distinguishes the two transition types. The mean population of Emergence grids remains low (7 persons). Non-residential grids occupy the most challenging terrain, with the highest mean elevation (367 m), the steepest slopes (19.1°), and the greatest distance from roads (1750 m), effectively precluding residential development. The spatial distribution of transition types reveals distinct regional patterns (Figure 4). Extinction grids concentrate in mountainous provinces, particularly Gyeongsangbuk-do (343 grids) and Gangwon-do (308 grids), which together account for 36% of all extinction cases.

4.4. Scenario Simulation Results

The scenario simulation operates under two key constraints. First, the total population for each province follows Statistics Korea’s official population projections through 2050 [50], fixed across all scenarios. Second, scenarios differ only in how provincial populations are distributed across grids. Compact scenarios concentrate population in fewer, higher-density grids, while BAU maintains proportional distribution. Consequently, scenario differences reflect within-province spatial restructuring rather than inter-provincial migration.
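The two constraints can be made concrete with a toy allocation in which the provincial total is fixed and only the grid weights change between scenarios. The weighting form used here (baseline population raised to an exponent `alpha`, discounted by Stage 1 extinction probability) is an assumption for illustration only and is not the weighting scheme reported in this study. The function name and example values are likewise hypothetical; the 10-person residential threshold follows the primary threshold in Table A3.

```python
RESIDENTIAL_THRESHOLD = 10  # persons; a grid counts as residential above this value


def allocate(province_total, base_pop, p_extinct, alpha):
    """Distribute a fixed provincial total across grids.

    ASSUMED weight form: base_pop**alpha * (1 - p_extinct).
    alpha > 1 concentrates population, alpha < 1 disperses it,
    alpha = 1 keeps shares close to proportional.
    """
    weights = [(p ** alpha) * (1.0 - q) for p, q in zip(base_pop, p_extinct)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all grid weights are zero")
    return [province_total * w / total_weight for w in weights]


def residential_count(allocation):
    """Count grids whose allocated population exceeds the residential threshold."""
    return sum(1 for pop in allocation if pop > RESIDENTIAL_THRESHOLD)


# Illustrative province with four grids; values are not from the study data.
base_pop = [900, 120, 40, 25]
p_extinct = [0.01, 0.05, 0.20, 0.30]
province_total = 1000  # identical across scenarios, as in the first constraint

proportional = allocate(province_total, base_pop, p_extinct, alpha=1.0)
concentrated = allocate(province_total, base_pop, p_extinct, alpha=2.0)
dispersed = allocate(province_total, base_pop, p_extinct, alpha=0.5)

print(residential_count(proportional), residential_count(concentrated), residential_count(dispersed))
```

Because every scenario distributes the same provincial total, any difference in the number of residential grids in this sketch arises purely from the weights, mirroring the within-province interpretation given above.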

4.4.1. Cumulative Outcomes

Table 8 presents cumulative scenario outcomes by 2050. Substantial divergence emerges across policy scenarios despite identical provincial population trajectories. Under trend continuation (BAU), emergence events (3341 grids) approximately offset extinctions (3022 grids), producing a net increase of 319 residential grids by 2050. This balance reflects continuation of the dynamic equilibrium observed in 2015–2020 data, where ongoing residential frontier transitions maintain overall grid counts despite population decline. Concentration scenarios produce substantial grid loss, with moderate concentration reducing residential grids to 51,097 (12.1% extinction rate) and extreme concentration to 43,906 (24.5% extinction rate). Dispersion scenarios produce intermediate outcomes with 4.4% and 3.1% extinction rates for moderate and extreme variants, respectively, substantially lower than concentration but higher than trend continuation.

4.4.2. Provincial Variation

Scenario effects exhibit pronounced provincial heterogeneity reflecting differences in pre-existing settlement structure (Table 9, Figure 5). Under extreme concentration, rural provinces with dispersed settlement patterns face the highest extinction rates, including Jeollanam-do (39.9%), Gangwon-do (34.3%), and Gyeongsangbuk-do (32.1%). Metropolitan areas remain largely unaffected due to pre-existing population concentration, with Seoul (1.6%), Sejong (4.4%), and Gyeonggi-do (6.6%) showing minimal sensitivity even under extreme concentration. The policy sensitivity gap between Jeollanam-do and Seoul reaches 38 percentage points, indicating that a uniform national concentration policy would produce highly uneven territorial impacts.

Trend continuation produces fundamentally different dynamics. Nationally, 3022 grids experienced extinction while 3341 emerged as newly residential, yielding a net increase of 319 grids (+0.5%). Most provinces show positive net change as emergence offsets or exceeds extinction, with Sejong (+4.1%), Incheon (+2.8%), and Ulsan (+2.1%) exhibiting the highest net gains. Gyeonggi-do represents a notable exception with a net loss of 1.3%, suggesting suburban fringe contraction as population consolidates toward established centers within the capital region. This pattern indicates that trend continuation preserves the dynamic equilibrium between residential formation and dissolution observed in recent data. Dispersion scenarios produce intermediate outcomes between trend continuation and concentration. Extinction rates range from 0.0% (Sejong) to 7.0% (Gangwon) under moderate dispersion, and from 0.0% (Sejong) to 6.3% (Gangwon) under extreme dispersion. Rural provinces consistently show higher extinction rates than metropolitan areas across all policy scenarios, though the magnitude is substantially lower than under concentration. Concentration and dispersion scenarios produced no emergence events, as the allocation methodology focuses on redistribution within existing residential networks rather than new settlement formation. This design choice reflects the policy focus on managing population distribution among established areas under demographic decline.

4.5. Policy Sensitivity Analysis

4.5.1. Limitations of Short-Term Prediction

The Stage 1 classification model, trained on 2015–2020 transition patterns, achieves reasonable predictive performance for identifying grids at risk of complete depopulation. However, a 5-year training window may not fully capture long-term demographic trajectories embedded in age structure. A grid currently maintaining population above the residential threshold may nonetheless exhibit demographic characteristics indicating future decline potential not reflected in short-term transitions. To assess this concern, grids classified as Persistence were examined for latent demographic fragility. Three vulnerability indicators were defined based on established demographic thresholds: high aging ratio (≥40%), low potential support ratio (working-age population per elderly person < 2.0), and spatial isolation (distance to nearest road > 1500 m). Among 57,670 Persistence-classified grids, 42,509 (73.7%) exhibited at least one vulnerability characteristic, indicating that the majority of grids predicted to remain residential show signs of demographic fragility not captured by the ML classification. This finding motivates a complementary vulnerability classification that combines ML predictions with demographic structure assessment, enabling identification of policy-sensitive grids where targeted intervention may prevent future transition to extinction.
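The three indicators can be expressed as a simple screening function. The sketch below uses the thresholds stated above (aging ratio ≥ 40%, potential support ratio < 2.0, road distance > 1500 m) and treats working age as 15–64, following Table A1. The function name, argument names, zero-division handling, and example grids are assumptions made for illustration.

```python
def vulnerability_flags(pop_total, pop_15_64, pop_65plus, dist_road_m):
    """Return the three latent-fragility indicators for one grid."""
    aging_ratio = pop_65plus / pop_total if pop_total > 0 else 0.0
    # Potential support ratio: working-age persons per elderly person.
    # ASSUMPTION: a grid with no elderly residents is treated as fully supported.
    support_ratio = pop_15_64 / pop_65plus if pop_65plus > 0 else float("inf")
    return {
        "high_aging": aging_ratio >= 0.40,
        "low_support": support_ratio < 2.0,
        "isolated": dist_road_m > 1500,
    }


def is_fragile(flags):
    """A grid is counted as fragile if at least one indicator is triggered."""
    return any(flags.values())


# Illustrative Persistence-classified grids; values are not from the study data.
aged_grid = vulnerability_flags(pop_total=100, pop_15_64=50, pop_65plus=45, dist_road_m=300)
robust_grid = vulnerability_flags(pop_total=1000, pop_15_64=700, pop_65plus=150, dist_road_m=200)

print(aged_grid, is_fragile(aged_grid))
print(robust_grid, is_fragile(robust_grid))
```

Counting a grid as fragile when any single indicator is triggered corresponds to the "at least one vulnerability characteristic" criterion behind the 73.7% figure.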

4.5.2. Four-Type Typology of Policy-Sensitive Grids

A four-type sensitivity classification framework was developed to characterize grid vulnerability based on scenario outcomes and demographic structure (Table 10). The classification cross-tabulates trend continuation (BAU) residential status by 2050 with extreme concentration residential status, further subdividing policy-sensitive grids by aging ratio threshold (40%). Stable grids (75.3%) maintain residential status under both trend continuation and extreme concentration, representing areas with sufficient population mass and favorable demographic structure to withstand concentration pressure. Moderate-Vulnerability grids (5.5%) transition to extinction only under extreme concentration, with aging ratios below 40%, suggesting intervention potential through economic development or service provision. Aging-Vulnerable grids (14.0%) share policy sensitivity but exhibit aging ratios at or above 40%, indicating limited natural recovery capacity even if extinction is avoided through policy intervention. These differential patterns underscore that policy-sensitive grids require assessment of both scenario outcomes and baseline characteristics. Moderate-Vulnerability grids warrant attention for their heightened responsiveness to policy intensity, while Aging-Vulnerable grids require recognition of structural constraints on recovery regardless of policy choice.

4.5.3. Characteristics and Spatial Distribution of Policy-Sensitive Grids

Table 11 presents extinction rates by vulnerability type across scenarios, quantifying differential policy sensitivity. By definition, Moderate-Vulnerability and Aging-Vulnerable grids exhibit 100% extinction under extreme concentration, as this outcome defines their classification. The substantive finding concerns their differential response to moderate concentration. Moderate-Vulnerability grids show higher extinction rates (46.5%) than Aging-Vulnerable grids (34.6%). This result highlights the dominant role of population scale in determining grid survival. Grids with younger demographics but lower population mass are more susceptible to moderate concentration pressure, while aging grids may have slightly higher baseline populations, providing a buffer against extinction. However, because the framework does not dynamically update age structure, Aging-Vulnerable grids may appear more resilient than they would under conditions of accelerated population decline through mortality. This motivated the complementary vulnerability classification, which flags Aging-Vulnerable grids as requiring particular attention despite their lower extinction rates in scenario simulations.

Policy-sensitive grids share broadly similar physical environments—high forest cover and greater road distance compared to Stable grids—but differ in economic and demographic profiles (Table 12). Moderate-Vulnerability grids retain higher employee counts and business establishments, while Aging-Vulnerable grids exhibit minimal economic presence combined with the highest aging ratio.

The spatial distribution of vulnerability types is illustrated at the grid level in Figure 6a. Aging-Vulnerable grids (14.0%) concentrate in non-metropolitan rural areas across diverse topographic settings, including mountainous interior regions (Gangwon-do, Gyeongsangbuk-do), agricultural plains (Chungcheongnam-do, Jeollanam-do), and peripheral areas distant from provincial urban centers. Moderate-Vulnerability grids (5.5%) show broader geographic distribution, including peri-urban areas surrounding metropolitan regions. Already-Extinct grids (5.2%) cluster in the same rural provinces experiencing high grid turnover—Gyeongsangbuk-do (550 grids), Gangwon-do (484), and Jeollanam-do (382)—indicating that these regions face depopulation pressure independent of policy intervention. Municipal-level aggregation (Figure 6b–d) reveals that the proportion of policy-sensitive grids varies substantially across municipalities. Notably, the spatial patterns of Aging-Vulnerable grids (Figure 6c) and Moderate-Vulnerability grids (Figure 6d) differ, suggesting that municipalities face distinct types of demographic challenges.

5. Discussion

5.1. Methodological Implications

The two-stage machine learning framework demonstrates that grid-level demographic transitions can be predicted from spatial-demographic covariates, enabling pre-identification of extinction-prone areas prior to policy implementation. Predicting grid extinction involves inherent uncertainties, particularly in rural contexts where decline is driven by multiple interacting factors, including economic opportunities, accessibility, demographic structure, and policy interventions that vary across local contexts [52,53], requiring caution in both prediction and interpretation. Stage 1 classification identified approximately half of actual extinction cases (recall = 0.52)—a moderate but meaningful performance that substantially exceeds random baseline expectations. Rather than viewing this as a limitation, the extinction probability map serves a critical screening function: by incorporating predicted extinction risk into population allocation weights, the framework prevents assignment of population to grids with low residential viability. This approach extends prior grid-level methods that relied on cohort-component accounting [24,28] by capturing nonlinear covariate relationships without explicit demographic modeling. Variable importance patterns reveal substantive insights for policy design. The dominance of baseline population across both stages confirms that demographic momentum embedded in existing settlement structure strongly constrains future trajectories, consistent with prior small-area projection studies [22,23]. The differential importance of age cohorts is particularly notable: the 45–64 cohort showed the highest importance for extinction prevention, likely reflecting residential stability accumulated through housing tenure and local social ties, while the child population (0–14) dominated population magnitude prediction, indicating that young families signal growth potential rather than settlement persistence. 
Economic variables exhibited asymmetric effects: employee counts ranked low for extinction prediction (rank 19) but high for population magnitude (rank 7). This suggests that employment opportunities attract additional population to grids that are already demographically stable, but do not prevent extinction in grids with marginal population and unfavorable demographic structure. These findings imply that whatever policy approach is adopted—whether population retention, managed relocation, or gradual transition—intervention design must account for each area’s existing demographic composition and socioeconomic conditions [54]. Scenario analysis revealed substantial policy sensitivity. The approximately 25 percentage point divergence between extreme concentration and trend continuation aligns with findings from Japan, where compact scenarios increased zero-population grids by 28% relative to baseline [27], suggesting comparable policy sensitivity across East Asian contexts. The emergence–extinction balance under trend continuation—where emergence (3341 grids) approximately offsets extinction (3022 grids)—reflects South Korea’s structural transition from growth to decline. The current pattern indicates that population redistribution now occurs within existing settlement networks rather than expanding the residential frontier, characteristic of post-growth demographic regimes documented across East Asia [1,3].

5.2. Policy Implications

The scenario results should not be interpreted as evidence that compact development is inherently inferior to dispersed alternatives. Compact policies may enable strategic infrastructure consolidation and service efficiency gains [55], consistent with compact city concepts promoted in Korea’s national territorial planning [56], while dispersed policies preserve spatial equity but at higher per capita service costs. This trade-off between efficiency and spatial equity requires explicit political deliberation informed by local context [56,57]. Comparing scenario outcomes identifies policy-sensitive grids (19.5%) where allocation choices determine residential survival. Classification by demographic structure reveals that these grids require differentiated approaches based on baseline characteristics. Aging-Vulnerable grids (14.0%) exhibit advanced aging that constrains the effectiveness of conventional retention strategies; the self-reinforcing relationship between youth outmigration and demographic aging accelerates decline trajectories in these areas [58]. Alternative approaches such as service consolidation with maintained accessibility, or managed transition planning, may warrant consideration. Moderate-Vulnerability grids (5.5%) show higher sensitivity to policy intensity despite younger demographics, suggesting greater potential for intervention through economic development or infrastructure investment. Provincial heterogeneity further complicates policy design. The 38 percentage point gap between Jeollanam-do and Seoul underscores the need for regionally differentiated approaches rather than uniform national policy. 
Provinces with well-developed urban centers may benefit from moderate concentration that consolidates population into service-accessible areas, whereas provinces dominated by dispersed rural settlement may require policies that maintain existing residential networks to prevent widespread grid extinction, consistent with the broader context of Korea’s depopulation area support policies [7]. The pre-identification of extinction-prone areas through scenario comparison provides a basis for proactive planning rather than reactive response. Areas experiencing depopulation require ongoing attention to infrastructure maintenance, land management, and service provision regardless of policy choice [54]. By identifying which grids face extinction risk under specific scenarios and characterizing their baseline attributes, this framework enables policymakers to consider accompanying measures—infrastructure adaptation, service reorganization, or land-use transition planning—prior to implementation.

5.3. Limitations and Future Directions

Several limitations warrant consideration when interpreting these results.

Machine learning models face temporal constraints. Training on a single 5-year transition period (2015–2020) may not capture longer-term demographic dynamics or structural changes in migration patterns, and the assumption that observed transition patterns will persist through 2050 may not hold under changing economic conditions, policy interventions, or demographic regime shifts. Backtesting with 2010–2015 data demonstrated comparable model performance (F1-macro: 0.696, Table A2), supporting structural stability over decadal timescales, though this does not guarantee stability over the full 30-year projection horizon. Grid-level environmental features are held constant at 2020 baseline values, whereas infrastructure investments, land use changes, and accessibility improvements would alter grid characteristics over time. The framework also does not account for sudden demographic shocks—armed conflict, pandemics, natural disasters, or abrupt economic crises—that could alter population trajectories beyond historical patterns.

The scenario framework also involves several simplifying assumptions about population redistribution. The model assumes within-province redistribution without inter-provincial migration responses to policy, and the allocation methodology for policy scenarios focuses on redistribution among existing residential grids. The 1 km grid resolution may obscure intra-cell heterogeneity, particularly in peri-urban areas, and determining whether Emergence transitions represent sustained settlement formation or transient fluctuations would require observation across multiple census periods. The framework does not explicitly model spatial autocorrelation between neighboring grids, though accessibility variables and spatial isolation indicators partially address this limitation (Section 4.5.1), and aggregate migration dynamics are reflected through provincial population constraints derived from Statistics Korea’s cohort-component projections [50]. More fundamentally, the current framework does not model grid-level cohort aging progression or the feedback mechanisms between infrastructure withdrawal and residential decisions. Population distribution prediction inherently involves complex systemic interactions including cohort-specific demographic processes, population mobility across administrative boundaries, and dynamic interdependencies among infrastructure, housing, and economic activity [22,23].

These limitations suggest several directions for future research. The effects of spatial resolution on simulation outcomes warrant further investigation [9], as the 1 km grid adopted in this study captures the surrounding land use context through aggregation but may obscure finer-scale heterogeneity. Dynamic updating of environmental covariates could be addressed through coupling with land use change models [25,26]. The grid-level identification of extinction-prone areas also provides spatially explicit information applicable to broader planning domains, including land use transition planning and land-based carbon management [59], as well as future land use change modeling [60]. Finally, comparative analysis across East Asian countries experiencing similar demographic transitions [2] could strengthen the generalizability of the framework.

6. Conclusions

This study developed a two-stage machine learning framework to predict grid-level population distribution and assess policy sensitivity of depopulation outcomes across South Korea’s 1 km grid system. Stage 1 Random Forest classification predicts grid state transitions (F1-macro = 0.694), while Stage 2 LightGBM regression predicts population magnitude (R² = 0.950). Five spatial development scenarios were simulated through 2050 to quantify how allocation policies influence residential grid outcomes.

The findings demonstrate that the spatial distribution of population decline is substantially policy-amenable. Cumulative extinction rates range from 3.1% under extreme dispersion to 24.5% under extreme concentration, and extreme concentration diverges from trend continuation by approximately 25 percentage points, a difference attributable entirely to spatial allocation policy rather than demographic decline itself. Trend continuation produces near-zero net change (+0.5%) as emergence (3341 grids) approximately offsets extinction (3022 grids), reflecting the dynamic equilibrium characteristic of South Korea’s structural transition from growth to decline. Provincial heterogeneity is pronounced, with rural provinces facing extinction rates up to 39.9% under extreme concentration, while metropolitan areas remain largely unaffected. This heterogeneity suggests that a uniform national concentration policy would produce highly uneven spatial impacts, warranting regionally differentiated approaches.

Among grids predicted to remain residential, 73.7% exhibit demographic fragility indicators not captured by short-term predictions. Comparing scenario outcomes enables pre-identification of policy-sensitive grids (19.5%) where allocation choices determine residential survival. These grids tend to occupy more isolated environments with higher forest cover, and further divide into two distinct profiles. Aging-Vulnerable grids (14.0%) face structural constraints from advanced aging and a weak economic base, while Moderate-Vulnerability grids (5.5%) retain some economic activity despite heightened sensitivity to policy intensity.

The study contributes to spatial planning scholarship in three respects. First, the two-stage machine learning framework provides a replicable methodology for grid-level population projection that captures nonlinear relationships between spatial covariates and demographic transitions. Second, the scenario analysis quantifies policy sensitivity, demonstrating that depopulation outcomes depend substantially on spatial development choices rather than demographic inevitability alone. Third, the policy sensitivity classification bridges predictive modeling and policy application by identifying where interventions would most effectively influence residential outcomes.

As South Korea and other East Asian nations navigate demographic decline, spatial planning must shift from growth facilitation toward strategic contraction management within existing residential networks. However, population distribution under demographic decline involves complex systemic interactions that extend beyond the scope of any single framework. The analytical tools presented here provide a foundation for identifying policy-sensitive areas, comparing alternative spatial development strategies, and prioritizing interventions under conditions of population decline and fiscal constraint. Beyond depopulation policy, the grid-level identification of extinction-prone areas provides spatially explicit information applicable to broader spatial planning challenges, including land use transition planning, carbon management, and infrastructure investment prioritization.

Author Contributions

Conceptualization, Youngeun Kang and Hyeryeon Jo; Methodology, Youngeun Kang and Hyeryeon Jo; Investigation, Hyeryeon Jo and Miyeon Ahn; Software, Youngeun Kang and Hyeryeon Jo; Validation, Hyeryeon Jo and Miyeon Ahn; Formal analysis, Hyeryeon Jo and Youngeun Kang; Data curation, Hyeryeon Jo; Writing—original draft preparation, Hyeryeon Jo; Writing—review and editing, Youngeun Kang and Miyeon Ahn; Visualization, Hyeryeon Jo; Supervision, Youngeun Kang; Funding acquisition, Youngeun Kang. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Korea Environment Industry & Technology Institute through the “Climate Change R&D Project for New Climate Regime” funded by the Korea Ministry of Environment (2022003570008).

Data Availability Statement

The analysis code is available from the corresponding author upon reasonable request. Input data from the Statistical Geographic Information Service (SGIS) are subject to access restrictions by Statistics Korea and can be requested through SGIS (https://sgis.kostat.go.kr, accessed on 19 April 2026). Trained models and aggregated output results are available upon request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

Table A1. Complete list of explanatory variables (n = 27).

Variable	Description	Source	Unit
pop_total_2015	Total population	Census	persons
pop_0_14_2015	Population aged 0–14	Census	persons
pop_15_24_2015	Population aged 15–24	Census	persons
pop_25_44_2015	Population aged 25–44	Census	persons
pop_45_64_2015	Population aged 45–64	Census	persons
pop_65_74_2015	Population aged 65–74	Census	persons
pop_75plus_2015	Population aged 75+	Census	persons
aging_ratio_2015	Ratio of elderly (65+) to total population	Derived	ratio
very_old_ratio_2015	Ratio of very old (75+) to total population	Derived	ratio
working_ratio_2015	Ratio of working age (15–64) to total population	Derived	ratio
household_total_2015	Total households	Census	households
hh_single_2015	Single-person households	Census	households
business_total_2015	Total businesses	Census	establishments
employee_total_2015	Total employees	Census	persons
housing_total_2015	Total housing units	Census	units
house_apt_2015	Apartment units	Census	units
lc_urban	Urban land cover ratio	Land Cover Map	ratio
lc_cropland	Cropland ratio	Land Cover Map	ratio
lc_forest	Forest ratio	Land Cover Map	ratio
lc_grassland	Grassland ratio	Land Cover Map	ratio
lc_wetland	Wetland ratio	Land Cover Map	ratio
lc_barren	Barren land ratio	Land Cover Map	ratio
lc_water	Water body ratio	Land Cover Map	ratio
slope_mean	Mean slope	DEM	degrees
dem_mean	Mean elevation	DEM	meters
dist_road	Distance to nearest road	Road Network	meters
interaction_15_20	Population change interaction (2015–2020)	Derived	index

Note: All variables measured at 1 km grid level. Training uses 2015 values; scenario projection uses 2020 baseline.

Table A2. Temporal Validation: Comparison of original and backtesting performance.

Metric	Original (2015–2020)	Backtest (2010–2015)
F1-macro	0.694	0.696
Accuracy	0.961	0.923
F1 (Persistence)	0.979	0.966
F1 (Emergence)	0.427	0.514
F1 (Extinction)	0.441	0.353
F1 (Non-residential)	0.968	0.951

Note: The Stage 1 Random Forest model trained on 2015–2020 patterns was applied to 2010 SGIS grid-level features to predict 2010–2015 transitions. Environmental covariates (land cover, terrain, accessibility) were held at 2020 values. The comparable F1-macro (0.696) supports the temporal stability of the model.

Table A3. Sensitivity analysis of functional extinction.

Threshold	Baseline Grids	BAU	Compact Middle	Compact Extreme	Dispersed Middle	Dispersed Extreme
>5	65,005	10.00%	14.80%	21.50%	11.90%	11.20%
>10	58,148	−0.50%	12.10%	24.50%	4.40%	3.10%
>20	50,286	1.80%	13.70%	35.60%	−1.40%	−4.70%

Note: Net change rate calculated as (Residential₂₀₂₀ − Residential₂₀₅₀)/Residential₂₀₂₀ × 100. Negative values indicate net residential grid increase where emergence exceeds extinction. The >10 threshold is adopted as the primary threshold in this study, as it best discriminates policy-induced extinction from baseline dynamics: trend continuation (BAU) maintains near-zero net change while concentration scenarios produce substantial grid loss, and dispersion scenarios preserve residential grids more effectively than trend continuation.
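As a worked check of the net change rate defined in the note, the sketch below recomputes the >10 row from the residential grid counts reported in Table 8. The function name `net_change_rate` is illustrative and is not taken from the authors' code.

```python
def net_change_rate(residential_2020: int, residential_2050: int) -> float:
    """Net loss of residential grids as a percentage of the 2020 baseline.

    Positive values mean net loss; negative values mean net gain.
    """
    return (residential_2020 - residential_2050) / residential_2020 * 100


BASELINE_2020 = 58_148  # residential grids (population > 10) in 2020

# Residential grid counts in 2050, as reported in Table 8.
residential_2050 = {
    "BAU": 58_467,
    "Compact Middle": 51_097,
    "Compact Extreme": 43_906,
    "Dispersed Middle": 55_607,
    "Dispersed Extreme": 56_319,
}

rates = {
    name: round(net_change_rate(BASELINE_2020, count), 1)
    for name, count in residential_2050.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:+.1f}%")
```

Rounded to one decimal, the results agree with the >10 row of Table A3, including the slightly negative BAU value that reflects a net gain of residential grids.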

Table A4. Confusion matrix for Stage 1 classification (Random Forest with SMOTE, 5-fold CV).

Actual\Predicted	Persistence	Emergence	Extinction	Non-Residential
Persistence	54,345	0	1505	1
Emergence	0	950	0	1347
Extinction	864	0	933	1
Non-residential	0	1625	0	45,335

Note: Misclassifications occur primarily within groups sharing the same 2015 baseline status. Grids residential in 2015 (Persistence, Extinction) are confused with each other but not with grids non-residential in 2015 (Emergence, Non-residential).
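The class-specific scores in Table 5 can be recovered from this confusion matrix. The sketch below is a minimal pure-Python calculation, assuming the matrix is pooled over the five folds; the helper `per_class_metrics` is a hypothetical name introduced for illustration.

```python
LABELS = ["Persistence", "Emergence", "Extinction", "Non-residential"]

# Rows are actual classes, columns are predicted classes (Table A4).
CONFUSION = [
    [54_345, 0, 1_505, 1],
    [0, 950, 0, 1_347],
    [864, 0, 933, 1],
    [0, 1_625, 0, 45_335],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} from a square confusion matrix."""
    metrics = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column total
        actual = sum(matrix[k])                    # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


metrics = per_class_metrics(CONFUSION, LABELS)
macro_f1 = sum(f1 for _, _, f1 in metrics.values()) / len(metrics)

for label, (p, r, f1) in metrics.items():
    print(f"{label:16s} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"macro F1 = {macro_f1:.3f}")
```

For the Extinction class this gives precision 0.383, recall 0.519, and F1 0.441, and the unweighted mean of the four F1 scores is 0.694, matching Table 5.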

Table A5. Provincial model performance for Extinction detection (Stage 1 Random Forest, 5-fold CV).

Province	N (Total)	N (Extinction)	Precision	Recall	F1	Accuracy
Sejong	398	7	0.444	0.571	0.500	0.922
Gyeongbuk	18,339	343	0.410	0.574	0.478	0.950
Gangwon	16,887	308	0.409	0.545	0.467	0.949
Chungbuk	7490	139	0.391	0.554	0.458	0.940
Gyeongnam	11,236	209	0.408	0.512	0.454	0.948
Daejeon	535	17	0.500	0.412	0.452	0.944
Jeonnam	15,457	241	0.377	0.515	0.435	0.952
Seoul	710	8	0.500	0.375	0.429	0.973
Ulsan	1170	28	0.429	0.429	0.429	0.932
Daegu	1661	39	0.415	0.436	0.425	0.947
Jeonbuk	8356	130	0.346	0.508	0.411	0.951
Gyeonggi	10,483	143	0.367	0.462	0.409	0.961
Chungnam	9062	122	0.315	0.516	0.391	0.954
Busan	970	19	0.368	0.368	0.368	0.952
Incheon	1656	21	0.333	0.381	0.356	0.943
Jeju	2070	20	0.167	0.350	0.226	0.920
Gwangju	426	4	0.000	0.000	0.000	0.974

Note: Model performance correlates with extinction prevalence. Provinces with fewer extinction cases show lower detection performance due to limited training signal.

Table A6. SHAP feature importance rankings (all variables).

Variable	Stage1 (Importance)	Stage1 (Rank)	Stage2 (Importance)	Stage2 (Rank)
pop_total_2015	0.322	1	0.084	2
household_total_2015	0.133	2	0.034	10
housing_total_2015	0.108	3	0.024	18
pop_45_64_2015	0.073	4	0.029	15
working_ratio_2015	0.070	5	0.044	8
pop_65_74_2015	0.049	6	0.018	24
hh_single_2015	0.037	7	0.021	19
pop_25_44_2015	0.032	8	0.032	11
pop_75plus_2015	0.029	9	0.029	14
lc_urban	0.022	10	0.069	3
aging_ratio_2015	0.022	11	0.028	16
pop_15_24_2015	0.015	12	0.020	21
lc_cropland	0.012	13	0.032	12
employee_total_2015	0.011	14	0.064	4
pop_0_14_2015	0.011	15	0.019	22
lc_barren	0.009	16	0.124	1
very_old_ratio_2015	0.009	17	0.018	23
business_total_2015	0.008	18	0.030	13
lc_grassland	0.007	19	0.051	7
lc_forest	0.007	20	0.024	17
dist_road	0.006	21	0.020	20
slope_mean	0.004	22	0.037	9
dem_mean	0.003	23	0.057	6
interaction_15_20	0.002	24	0.061	5
lc_water	0.001	25	0.015	25
lc_wetland	0.001	26	0.010	26
house_apt_2015	0.000	27	0.008	27

Note: Values represent mean absolute SHAP contribution. Stage 1 focuses on Extinction class prediction.
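To make "mean absolute SHAP contribution" concrete, the sketch below averages the absolute SHAP values per feature across samples and ranks the features. The input values are hypothetical toy numbers, not results from this study, and `mean_abs_importance` is an illustrative helper rather than part of the SHAP library.

```python
def mean_abs_importance(shap_rows, feature_names):
    """Average |SHAP| per feature over samples, sorted from most to least important."""
    n_samples = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = {name: total / n_samples for name, total in zip(feature_names, totals)}
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, value) for rank, (name, value) in enumerate(ranked, start=1)]


# Hypothetical SHAP values for three grids and three features (illustration only).
features = ["pop_total_2015", "lc_forest", "dist_road"]
toy_shap = [
    [-0.40, 0.02, 0.01],
    [0.30, -0.01, 0.00],
    [-0.20, 0.03, -0.02],
]

ranking = mean_abs_importance(toy_shap, features)
for rank, name, value in ranking:
    print(rank, name, round(value, 3))
```

Because the absolute value is taken before averaging, a feature whose contributions alternate in sign still ranks highly; effect direction has to be read separately.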

Figure A1. 2050 Grid-level extinction probability predicted by Stage 1 Random Forest classification. Values represent probability of transition from residential to non-residential status. These probabilities are incorporated into scenario allocation weights.

References

1. Martinez-Fernandez, C.; Audirac, I.; Fol, S.; Cunningham-Sabot, E. Shrinking Cities: Urban Challenges of Globalization. Int. J. Urban Reg. Res. 2012, 36, 213–225.
2. Xu, X.; Ma, J.; Sho, K.; Seta, F. Are East Asian “Shrinking Cities” Falling into a Loop? Insights from the Interplay between Population Decline and Metropolitan Concentration in Japan. Cities 2024, 155, 105445.
3. Hattori, K.; Kaido, K.; Matsuyuki, M. The Development of Urban Shrinkage Discourse and Policy Response in Japan. Cities 2017, 69, 124–132.
4. Long, Y.; Zhang, E. Fine-Scale Recognition-Based Design Guidelines for Dealing with Shrinking Cities: A Case Study of Hegang. In Data Augmented Design; Springer: Cham, Switzerland, 2021; pp. 93–105.
5. Joo, Y.M.; Seo, B. Dual Policy to Fight Urban Shrinkage: Daegu, South Korea. Cities 2018, 73, 128–137.
6. Yang, S.; Roh, J. Identifying Declining Urban Spaces in the Context of Shrinkage: A Case Study of Busan, South Korea. Cities 2026, 169, 106582.
7. Kim, S. Are Small Cities Disappearing? The Policy Responses to Urban Shrinkage Oriented toward Young People in Uiseong-Gun, South Korea. Cities 2024, 155, 105450.
8. Park, M.; Kim, Y. Analysis of Population Policy for the Lowest-Low Fertility in South Korea: The Impacts of Policy Bundles and Spatiotemporal Dynamics. Korean Policy Stud. Rev. 2024, 33, 219–247. (In Korean)
9. Openshaw, S. The Modifiable Areal Unit Problem. In Concepts and Techniques in Modern Geography; Geo Books: Kerala, India, 1984.
10. Leyk, S.; Gaughan, A.E.; Adamo, S.B.; Sherbinin, A.D.; Balk, D.; Freire, S.; Rose, A.; Stevens, F.R.; Blankespoor, B.; Frye, C.; et al. The Spatial Allocation of Population: A Review of Large-Scale Gridded Population Data Products and Their Fitness for Use. Earth Syst. Sci. Data 2019, 11, 1385–1409.
11. Lloyd, C.T.; Sorichetta, A.; Tatem, A.J. Data Descriptor: High Resolution Global Gridded Data for Use in Population Studies. Sci. Data 2017, 4, 170001.
12. Hollander, J.B.; Pallagst, K.; Schwarz, T.; Popper, F.J. Planning Shrinking Cities. Prog. Plan. 2009, 72, 223–232.
13. Haase, D.; Haase, A.; Rink, D. Conceptualizing the Nexus between Urban Shrinkage and Ecosystem Services. Landsc. Urban Plan. 2014, 132, 159–169.
14. Wiechmann, T. Errors Expected-Aligning Urban Strategy with Demographic Uncertainty in Shrinking Cities. Int. Plan. Stud. 2008, 13, 431–446.
15. Hospers, G.J. Policy Responses to Urban Shrinkage: From Growth Thinking to Civic Engagement. Eur. Plan. Stud. 2014, 22, 1507–1523.
16. Mallach, A.; Haase, A.; Hattori, K. The Shrinking City in Comparative Perspective: Contrasting Dynamics and Responses to Urban Shrinkage. Cities 2017, 69, 102–108.
17. Caminade, C.; Kovats, S.; Rocklov, J.; Tompkins, A.M.; Morse, A.P.; Colón-González, F.J.; Stenlund, H.; Martens, P.; Lloyd, S.J. Impact of Climate Change on Global Malaria Distribution. Proc. Natl. Acad. Sci. USA 2014, 111, 3286–3291.
18. Wardrop, N.A.; Jochem, W.C.; Bird, T.J.; Chamberlain, H.R.; Clarke, D.; Kerr, D.; Bengtsson, L.; Juran, S.; Seaman, V.; Tatem, A.J. Spatially Disaggregated Population Estimates in the Absence of National Population and Housing Census Data. Proc. Natl. Acad. Sci. USA 2018, 115, 3529–3537.
19. Stevens, F.R.; Gaughan, A.E.; Linard, C.; Tatem, A.J. Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data. PLoS ONE 2015, 10, e0107042.
20. Boo, G.; Darin, E.; Leasure, D.R.; Dooley, C.A.; Chamberlain, H.R.; Lázár, A.N.; Tschirhart, K.; Sinai, C.; Hoff, N.A.; Fuller, T.; et al. High-Resolution Population Estimation Using Household Survey Data and Building Footprints. Nat. Commun. 2022, 13, 1330.
21. Grossman, I.; Bandara, K.; Wilson, T.; Kirley, M. Can Machine Learning Improve Small Area Population Forecasts? A Forecast Combination Approach. Comput. Environ. Urban Syst. 2022, 95, 101806.
22. Chi, G.; Wang, D. Population Projection Accuracy: The Impacts of Sociodemographics, Accessibility, Land Use, and Neighbour Characteristics. Popul. Space Place 2018, 24, e2129.
23. Wilson, T.; Grossman, I.; Alexander, M.; Rees, P.; Temple, J. Methods for Small Area Population Forecasts: State-of-the-Art and Research Needs. Popul. Res. Policy Rev. 2021, 41, 865–898.
24. Breidenbach, P.; Kaeding, M.; Schaffner, S. Population Projection for Germany 2015–2050 on Grid Level (RWI-GEO-GRID-POP-Forecast). Jahrb. Natl. Stat. 2019, 239, 733–745.
25. Zhuang, H.; Liu, X.; Li, B.; Wu, C.; Yan, Y.; Zeng, L.; Zheng, C. Mapping High-Resolution Global Gridded Population Distribution from 1870 to 2100. Sci. Total Environ. 2024, 955, 176867.
26. Chen, Y.; Guo, F.; Wang, J.; Cai, W.; Wang, C.; Wang, K. Provincial and Gridded Population Projection for China under Shared Socioeconomic Pathways from 2010 to 2100. Sci. Data 2020, 7, 83.
27. Hori, K.; Saito, O.; Hashimoto, S.; Matsui, T.; Akter, R.; Takeuchi, K. Projecting Population Distribution under Depopulation Conditions in Japan: Scenario Analysis for Future Socio-Ecological Systems. Sustain. Sci. 2021, 16, 295–311.
28. Lee, B.; Jeong, B. Development of Grid-Based Population Projection Method: A Modified Cohort Component Approach Applied to South Korea. Popul. Space Place 2026, 32, e70161.
29. Statistics Korea. Statistical Geographic Information Service (SGIS) Grid Statistics; Statistics Korea: Daejeon, Republic of Korea, 2023; Available online: https://sgis.kostat.go.kr (accessed on 9 January 2026).
30. Ministry of Environment. Environmental Geographic Information Service (EGIS): Land Cover Map; Ministry of Environment: Sejong, Republic of Korea, 2023; Available online: https://egis.me.go.kr (accessed on 9 January 2026).
31. Ministry of Land, Infrastructure and Transport. V-World Open Platform: Digital Elevation Model; Ministry of Land, Infrastructure and Transport: Sejong, Republic of Korea, 2023; Available online: https://www.vworld.kr/v4po_main.do (accessed on 9 January 2026).
32. Korea Transport Institute. Korea Transport Database (KTDB): National Road Network; Korea Transport Institute: Sejong, Republic of Korea, 2023; Available online: https://www.ktdb.go.kr (accessed on 9 January 2026).
33. Gallego, F.J. A Population Density Grid of the European Union. Popul. Environ. 2010, 31, 460–473.
34. Linard, C.; Gilbert, M.; Snow, R.W.; Noor, A.M.; Tatem, A.J. Population Distribution, Settlement Patterns and Accessibility across Africa in 2010. PLoS ONE 2012, 7, e31743.
35. Partridge, M.D.; Rickman, D.S.; Ali, K.; Olfert, M.R. Employment Growth in the American Urban Hierarchy: Long Live Distance. BE J. Macroecon. 2008, 8, 1–38.
36. Verburg, P.H.; Soepboer, W.; Veldkamp, A.; Limpiada, R.; Espaldon, V.; Mastura, S.S.A. Modeling the Spatial Dynamics of Regional Land Use: The CLUE-S Model. Environ. Manag. 2002, 30, 391–405.
37. Barry, S.; Elith, J. Error and Uncertainty in Habitat Models. J. Appl. Ecol. 2006, 43, 413–423.
38. Welsh, A.H.; Cunningham, R.B.; Donnelly, C.F.; Lindenmayer, D.B. Modelling the Abundance of Rare Species: Statistical Models for Counts with Extra Zeros. Ecol. Model. 1996, 88, 297–308.
39. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
40. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016; ACM: New York, NY, USA, 2016; pp. 785–794.
41. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc.: San Diego, CA, USA, 2017; Volume 30.
42. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Cham, Switzerland, 2009.
43. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
44. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
45. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
46. Grandini, M.; Bagli, E.; Visani, G. Metrics for Multi-Class Classification: An Overview. arXiv 2020, arXiv:2008.05756.
47. Fox, J. Applied Regression Analysis and Generalized Linear Models; Sage Publications: Thousand Oaks, CA, USA, 2015.
48. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
49. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc.: San Diego, CA, USA, 2017; Volume 30.
50. Statistics Korea. Population Projections by Province (2020–2050); Korean Statistical Information Service (KOSIS): Daejeon, Republic of Korea, 2024; Available online: https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1BPB001 (accessed on 9 January 2026).
51. Lin, M.; Lucas, H.C., Jr.; Shmueli, G. Research Commentary—Too Big to Fail: Large Samples and the p-Value Problem. Inf. Syst. Res. 2013, 24, 906–917.
52. Lasanta, T.; Arnáez, J.; Pascual, N.; Ruiz-Flaño, P.; Errea, M.P.; Lana-Renault, N. Space–Time Process and Drivers of Land Abandonment in Europe. Catena 2017, 149, 810–823.
53. Li, Y.; Westlund, H.; Liu, Y. Why Some Rural Areas Decline While Some Others Not: An Overview of Rural Evolution in the World. J. Rural Stud. 2019, 68, 135–143.
54. Koning, J.d.; Hobbis, S.K.; McNeill, J.; Prinsen, G. Vacating Place, Vacated Space? A Research Agenda for Places Where People Leave. J. Rural Stud. 2021, 82, 271–278.
55. Holden, E.; Norland, I.T. Three Challenges for the Compact City as a Sustainable Urban Form: Household Consumption of Energy and Transport in Eight Residential Areas in the Greater Oslo Region. Urban Stud. 2005, 42, 2145–2166.
56. OECD. Compact City Policies: Korea—Towards Sustainable and Inclusive Growth; OECD Publishing: Paris, France, 2014; ISBN 9789264225503.
57. Burton, E. The Compact City: Just or Just Compact? A Preliminary Analysis. Urban Stud. 2000, 37, 1969–2006.
58. Reynaud, C.; Miccoli, S. Depopulation and the Aging Population: The Relationship in Italian Municipalities. Sustainability 2018, 10, 1004.
59. Park, C.S.; Kim, G.H.; Song, S.; Ban, Y.W.; Lee, S.I. Analysis of Land-Based Carbon Neutrality Impacts According to Land Use Transition Scenarios; Korea Environment Institute: Sejong, Republic of Korea, 2024; Available online: https://www.nkis.re.kr/subject_view1.do?otcCd=RB&otpId=OTP_0000000000015929 (accessed on 31 March 2026).
60. Han, S.; Kang, Y.; Jo, H.; Ahn, M.; Kim, T.; Son, S. Future Land Use and Cover Modeling in South Korea: Linking SSP-RCP with FLUS Model. Land 2025, 14, 2380.

Figure 1. Study area and baseline conditions (2020). (a) Satellite imagery showing topographic characteristics of South Korea. Mountainous terrain dominates eastern and central regions, while western plains support higher settlement density. (b) Grid-level population distribution across 58,148 residential grids. Population concentration in the Seoul Metropolitan Area contrasts with sparse distribution in non-metropolitan provinces. (c) Aging ratio by grid. High aging ratios (≥40%) occur across diverse rural settings, including mountainous provinces (Gangwon-do, Gyeongsangbuk-do) and agricultural plains (Chungcheongnam-do, Jeollanam-do).

Figure 2. Two-stage machine learning framework and scenario projection methodology. The upper panel illustrates BAU Projection, where Stage 1 Random Forest classifies grid state transitions and Stage 2 LightGBM predicts population for residential grids. BAU applies ML predictions iteratively in 5-year steps from 2020 to 2050, with environmental features held constant at the 2020 baseline. The lower panel shows Compact and Dispersed Scenario Projection, where population is allocated using the method of Hori et al. [27] with weights determined by current population ($P_{i}^{\delta}$), ML-predicted extinction probability ($\hat{p}_{ext,i}$), and emergence mask ($M_{i}$). Four policy scenarios are applied as single-period 2020 to 2050 transformations.

Figure 3. SHAP feature importance analysis. (a) Stage 1 classification model for extinction prediction. (b) Stage 2 regression model for population magnitude prediction. Bar length indicates mean absolute SHAP value; color indicates effect direction.

Figure 4. Spatial distribution of grid state transitions (2015–2020). (a) National overview showing four transition categories: Persistence (light green), Extinction (red), Emergence (blue), and Non-residential (gray). Province and municipality boundaries are overlaid for reference. (b,c) Enlarged views of Gangwon-do and Gyeongsangbuk-do, the two provinces with highest extinction counts (308 and 343 grids respectively), displayed over satellite imagery. Extinction grids (red points) predominantly occur in mountainous areas distant from urban centers, reflecting the topographic and accessibility constraints identified in Table 7.

Figure 5. Spatial distribution of residential grid status by scenario (2050). Green indicates residential grids (population > 10), red indicates extinction from 2020 baseline. Extreme concentration produces widespread extinction concentrated in rural provinces, while trend continuation and dispersion scenarios maintain residential distribution with limited extinction.

Figure 6. Policy sensitivity classification and municipal-level aggregation. (a) Grid-level classification. (b) Policy-Sensitive (%). (c) Aging-Vulnerable (%). (d) Moderate-Vulnerability (%). Panels (b–d) show the proportion among residential grids by municipality.

Table 1. Summary of input variables and data sources.

Category	Description	Source	Year
Population	Total and age-cohort populations	SGIS (Statistics Korea) [29]	2010, 2015, 2020
Demographics	Derived demographic indicators	Computed	2010, 2015, 2020
Household	Total and single-person households	SGIS (Statistics Korea) [29]	2010, 2015, 2020
Housing	Total housing units and apartments	SGIS (Statistics Korea) [29]	2010, 2015, 2020
Economic	Business establishments and employees	SGIS (Statistics Korea) [29]	2010, 2015, 2020
Land Cover	Proportional coverage (0–1)	EGIS (Ministry of Environment) [30]	2020
Topography	Mean elevation (m) and slope (degrees)	V-World (MOLIT) [31]	2015
Accessibility	Distance to nearest road (m)	KTDB [32]	2020

Note: Training uses 2015 features to predict 2015–2020 transitions. Scenario projection (2020–2050) uses 2020 baseline features held constant throughout the projection horizon. 2010 data from the same SGIS source were additionally used for temporal backtesting (Table A2).

Table 2. Grid state transition categories and distribution (2015–2020).

Category	Definition	N	Proportion
Persistence	Residential in both 2015 and 2020 (pop > 0 in both years)	55,851	52.2%
Extinction	Transition from residential to non-residential (pop > 0 in 2015, pop = 0 in 2020)	1798	1.7%
Emergence	Transition from non-residential to residential (pop = 0 in 2015, pop > 0 in 2020)	2297	2.1%
Non-residential	Non-residential in both years (pop = 0 in both years)	46,960	43.9%

Note: Residential status for ML classification is defined as pop > 0. Scenario evaluation (Section 3.2.5) applies a functional threshold of pop > 10 to assess minimum viable residential population.
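The category definitions in Table 2 amount to a simple rule on the two census populations. The following sketch encodes that rule together with the functional threshold mentioned in the note; the function names are illustrative and not drawn from the authors' code.

```python
def classify_transition(pop_2015: float, pop_2020: float) -> str:
    """Assign one of the four Table 2 categories using the pop > 0 rule."""
    was_residential = pop_2015 > 0
    is_residential = pop_2020 > 0
    if was_residential and is_residential:
        return "Persistence"
    if was_residential:
        return "Extinction"
    if is_residential:
        return "Emergence"
    return "Non-residential"


def is_functionally_residential(population: float, threshold: float = 10) -> bool:
    """Scenario-evaluation rule: residential only if population exceeds the threshold."""
    return population > threshold


# A grid that keeps 8 residents persists for classification purposes,
# but falls below the functional threshold used in scenario evaluation.
print(classify_transition(25, 8))
print(is_functionally_residential(8))
```

The two rules are deliberately distinct: a sparsely populated grid can count as Persistence in training while being treated as functionally non-residential when scenarios are evaluated.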

Table 3. Spatial development scenario specifications.

Scenario	γ	δ	Policy Interpretation
Compact Extreme	0.2	1.5	Aggressive urban concentration; top 20% high-population grids receive 1.5× amplified allocation
Compact Middle	0.4	1.25	Moderate concentration policy
BAU (Business-as-Usual)	0.0	1.0	Continuation of current trends; allocation proportional to existing population
Dispersed Middle	0.4	1.5	Moderate dispersion; bottom 40% low-population grids receive preferential allocation
Dispersed Extreme	0.2	2.0	Aggressive rural support; bottom 20% low-population grids receive 2.0× amplified allocation

Note: Compact scenarios allocate preferentially to high-population grids; Dispersed scenarios allocate preferentially to low-population grids. γ determines the proportion of grids receiving preferential allocation; δ determines amplification intensity.
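One possible reading of the Policy Interpretation column is a proportional allocation in which the preferred share γ of grids has its weight multiplied by δ. The sketch below implements only that simplified reading. It omits the extinction-probability and emergence-mask terms, and it should not be taken as the authors' allocation formula or as the method of Hori et al. [27]. The function `allocate`, the `preference` labels, and the five-grid populations are all hypothetical.

```python
SCENARIOS = {
    # name: (gamma, delta, preference) following Table 3
    "Compact Extreme": (0.2, 1.5, "high"),
    "Compact Middle": (0.4, 1.25, "high"),
    "BAU": (0.0, 1.0, None),
    "Dispersed Middle": (0.4, 1.5, "low"),
    "Dispersed Extreme": (0.2, 2.0, "low"),
}


def allocate(populations, total, gamma, delta, preference):
    """Split `total` across grids in proportion to population, boosting preferred grids.

    ASSUMPTION: the preferred set is the top (or bottom) `gamma` share of grids
    ranked by population, and their weights are multiplied by `delta`.
    """
    n_preferred = round(gamma * len(populations))
    order = sorted(
        range(len(populations)),
        key=lambda i: populations[i],
        reverse=(preference == "high"),
    )
    preferred = set(order[:n_preferred]) if preference else set()
    weights = [
        p * (delta if i in preferred else 1.0)
        for i, p in enumerate(populations)
    ]
    weight_sum = sum(weights)
    return [total * w / weight_sum for w in weights]


# Hypothetical five-grid example; the total is reduced to mimic population decline.
grids = [1000, 400, 100, 40, 20]
projected_total = 1200

for name, (gamma, delta, preference) in SCENARIOS.items():
    shares = allocate(grids, projected_total, gamma, delta, preference)
    print(name, [round(s, 1) for s in shares])
```

Even in this toy setting, the smallest grid receives less population under the compact setting and more under the dispersed setting than under BAU, which is the mechanism by which allocation choices push marginal grids above or below a residential threshold.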

Table 4. Model performance comparison (5-Fold cross-validation).

Stage	Algorithm	Primary Metric	SD	Secondary Metric
Stage 1	Random Forest	F1-macro: 0.694	0.008	Acc: 0.927
	XGBoost	F1-macro: 0.649	0.005	Acc: 0.959
	LightGBM	F1-macro: 0.648	0.006	Acc: 0.959
Stage 2	LightGBM	R²: 0.950	0.002	RMSE: 0.361
	XGBoost	R²: 0.949	0.003	RMSE: 0.365
	Gradient Boosting	R²: 0.948	0.003	RMSE: 0.368

Note: Selected models (Random Forest for Stage 1, LightGBM for Stage 2) are listed first within each stage. Stage 1 uses SMOTE oversampling.

Table 5. Class-specific performance (Random Forest with SMOTE).

Class	Precision	Recall	F1
Persistence	0.984	0.973	0.979
Extinction	0.383	0.519	0.441
Emergence	0.369	0.414	0.390
Non-residential	0.971	0.965	0.968
Macro Average	-	-	0.694

Table 6. SHAP feature importance by variable category.

Category	Variable	Stage 1 |SHAP|	Stage 1 Rank	Stage 1 Dir	Stage 2 |SHAP|	Stage 2 Rank	Stage 2 Dir
Population	pop_total	0.13	1	−	0.703	1	+
	pop_45_64	0.023	4	−	0.105	3	−
	pop_0_14	0.004	14	−	0.087	4	−
	pop_25_44	0.010	9	−	0.058	6	−
Household	household_total	0.048	2	−	0.165	2	−
	hh_single	0.012	8	−	0.010	15	+
Housing	housing_total	0.045	3	−	0.019	10	−
Economic	employee_total	0.001	19	−	0.041	7	+
	business_total	0.003	16	−	0.006	19	+
Land Cover	lc_urban	0.001	18	−	0.07	5	+
	lc_barren	0.005	11	−	0.024	8	−
Topography	dem_mean	0.001	24	−	0.018	11	+
	slope_mean	0.001	25	−	0.016	12	+
Accessibility	dist_road	0.001	23	−	0.005	22	−

Note: Direction indicates effect on prediction outcome. Stage 1 (−) indicates higher values reduce extinction probability. Stage 2 (+) indicates higher values increase population magnitude. Full results for all 27 variables provided in Table A6.

Table 7. Grid type characteristics.

Variable	Persistence	Extinction	Emergence	Non-Residential
N (grids)	55,851	1798	2297	46,960
Proportion (%)	52.2	1.7	2.1	43.9
Population (mean)	912	29	7	2
Aging ratio (%)	40.6	41.5	40.4	7.8
Working ratio (%)	68.1	87.3	78.8	16.0
Elevation (m)	130	231	223	367
Slope (°)	8.9	14.7	13.6	19.1
Urban LC (%)	13.2	4.1	5.8	1.2
Forest LC (%)	42.8	69.7	64.1	79.6
Road distance (m)	646	790	814	1750

Table 8. Cumulative scenario outcomes by 2050.

Scenario	Residential 2050	Extinction	Emergence	Net Change	Extinction Rate
Baseline 2020	58,148	-	-	-	-
BAU	58,467	3022	3341	319	5.20%
Compact Middle	51,097	7051	0	−7051	12.10%
Compact Extreme	43,906	14,242	0	−14,242	24.50%
Dispersed Middle	55,607	2541	0	−2541	4.40%
Dispersed Extreme	56,319	1829	0	−1829	3.10%

Note: BAU applies machine learning predictions iteratively in 5-year intervals, capturing both extinction and emergence dynamics.
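The columns of Table 8 are linked by a simple accounting identity: residential grids in 2050 equal the 2020 baseline minus extinctions plus emergences, and the extinction rate is expressed against the 2020 baseline. The sketch below checks this using the reported counts; the `ScenarioOutcome` class is a hypothetical container introduced for illustration.

```python
from dataclasses import dataclass

BASELINE_2020 = 58_148  # residential grids in 2020 (Table 8)


@dataclass
class ScenarioOutcome:
    name: str
    extinction: int  # grids residential in 2020 but not in 2050
    emergence: int   # grids non-residential in 2020 but residential in 2050

    @property
    def net_change(self) -> int:
        return self.emergence - self.extinction

    @property
    def residential_2050(self) -> int:
        return BASELINE_2020 + self.net_change

    @property
    def extinction_rate(self) -> float:
        """Extinct grids as a percentage of the 2020 baseline."""
        return self.extinction / BASELINE_2020 * 100


outcomes = {
    o.name: o
    for o in [
        ScenarioOutcome("BAU", 3_022, 3_341),
        ScenarioOutcome("Compact Middle", 7_051, 0),
        ScenarioOutcome("Compact Extreme", 14_242, 0),
        ScenarioOutcome("Dispersed Middle", 2_541, 0),
        ScenarioOutcome("Dispersed Extreme", 1_829, 0),
    ]
}

for o in outcomes.values():
    print(f"{o.name}: {o.residential_2050} grids, "
          f"net {o.net_change:+d}, extinction {o.extinction_rate:.1f}%")
```

Only BAU has a nonzero emergence count, which is why it is the only scenario whose residential total in 2050 exceeds the baseline despite a 5.2% extinction rate.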

Table 9. Provincial extinction rates by scenario (%).

Province	Residential 2020(N)	BAU Net	Compact Middle	Compact Extreme	Dispersed Middle	Dispersed Extreme
Jeollanam-do	7923	−1.4	21.9	39.9	4.8	3.5
Gangwon-do	5468	−1.9	14.7	34.3	7	6.3
Gyeongsangbuk-do	9063	−0.8	17	32.1	6.1	4.4
Daejeon	386	0.8	4.4	25.6	2.8	1.3
Jeju-do	1057	−0.2	12.8	25.4	3.2	1.8
Jeonbuk-do	4994	−0.3	13.8	25.2	4.7	3.3
Busan	648	0.9	5.1	23.9	2.3	1.1
Chungcheongbuk-do	4250	−1.3	8.8	23	4.3	3
Gwangju	342	−0.6	2.6	23.1	2	1.2
Chungcheongnam-do	6675	0.2	10.7	20.5	3	1.9
Gyeongsangnam-do	6452	−0.3	8.8	19.7	4.6	3.2
Ulsan	663	−2.1	10.3	15.8	5.3	3.3
Daegu	936	0.3	9	11.8	4.5	3.5
Incheon	943	−2.8	3.9	10.7	2.5	1.2
Gyeonggi-do	7418	1.3	3	6.6	1.8	1
Sejong	315	−4.1	1.6	4.4	1.3	0
Seoul	615	−1	1.8	1.6	0.7	0.3

Note: Residential 2020(N) = residential grids in 2020; BAU Net = net change rate under trend continuation (%), calculated as in Table A3, where negative values indicate emergence exceeding extinction. Policy scenario columns report extinction rates only. Provinces ordered by Compact Extreme extinction rate.

Table 10. Four-type policy sensitivity classification.

Type	Definition	N	%	Characteristics
Stable	BAU residential, Compact residential	43,779	75.3	Robust across all scenarios
Moderate-Vulnerability	BAU residential, Compact extinct, Aging < 40%	3212	5.5	Policy-sensitive, younger demographic
Aging-Vulnerable	BAU residential, Compact extinct, Aging ≥ 40%	8135	14.0	Policy-sensitive, high aging
Already-Extinct	BAU extinct	3022	5.2	Baseline extinction

Note: Classification based on 2020 residential grids (population > 10). Percentages calculated from 58,148 baseline residential grids.
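The definitions in Table 10 can be written as a short decision rule. The sketch below assumes that "Compact" refers to the Compact Extreme scenario, which is the scenario under which Table 11 reports 100% extinction for both vulnerable types; that reading, and the function name `classify_sensitivity`, are assumptions rather than details confirmed in this section.

```python
def classify_sensitivity(bau_residential: bool,
                         compact_residential: bool,
                         aging_ratio_pct: float) -> str:
    """Map a 2020 residential grid to one of the four Table 10 types.

    `aging_ratio_pct` is the share of residents aged 65+ in percent.
    ASSUMPTION: `compact_residential` is evaluated under Compact Extreme.
    """
    if not bau_residential:
        return "Already-Extinct"
    if compact_residential:
        return "Stable"
    if aging_ratio_pct >= 40:
        return "Aging-Vulnerable"
    return "Moderate-Vulnerability"


# Grid counts by type, as reported in Table 10.
counts = {
    "Stable": 43_779,
    "Moderate-Vulnerability": 3_212,
    "Aging-Vulnerable": 8_135,
    "Already-Extinct": 3_022,
}

total = sum(counts.values())
policy_sensitive = counts["Moderate-Vulnerability"] + counts["Aging-Vulnerable"]
policy_sensitive_share = policy_sensitive / total * 100

print(total, round(policy_sensitive_share, 1))
```

The four counts sum to the 58,148 baseline residential grids, and the two vulnerable types together account for the 19.5% of grids described as policy-sensitive.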

Table 11. Extinction rates by vulnerability type and scenario.

Type	BAU	Compact Middle	Compact Extreme
Stable	0.0%	0.0%	0.0%
Moderate-Vulnerability	0.0%	46.5%	100.0%
Aging-Vulnerable	0.0%	34.6%	100.0%
All Grids	5.2%	12.1%	24.5%

Table 12. Characteristics of policy sensitivity types.

Variable	Stable	Moderate-Vuln	Aging-Vuln	Already-Extinct
N (grids)	43,779	3212	8135	3022
Population (mean)	1174	21	23	38
Aging ratio (%)	40	20	80	50
Employees (mean)	539	60	12	66
Businesses (mean)	129	9	4	8
Forest LC (%)	37.4	60.4	62.3	65.9
Urban LC (%)	15.8	5.6	3.5	4.8
Road distance (m)	583	815	806	771
Isolated grids (%)	1.7	5.4	5.9	5.7

Note: Isolated grids defined as distance to nearest road > 1500 m.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
