1. Introduction
Global climate change is increasing the frequency of extreme weather events such as droughts. These climatic conditions are considered primary causes of the increasing frequency and scale of wildfire damage [1,2,3]. Globally, the frequency of mega-fires is rising, and wildfire seasons are becoming prolonged, leading to severe ecological and economic losses. In the era of the climate crisis, it is particularly concerning that wildfires release massive amounts of carbon dioxide into the atmosphere in a short period of time. This emitted carbon dioxide exacerbates climate change [4]. Boreal forest fires typically account for about 10% of global wildfire carbon dioxide emissions; however, in 2021, this figure surged to 23%, the highest proportion since 2000. In 2021, driven by global warming, abnormal moisture deficits led to extreme wildfires across North America and Eurasia [5]. Wildfires do not merely cause short-term damage; they trigger a severe positive feedback loop that accelerates global warming by releasing large quantities of carbon previously stored in forests [6].
In the Republic of Korea (ROK), the largest mega-fire in the nation’s history occurred in March 2025, primarily affecting Gyeongbuk, Gyeongnam, and Ulsan, with the damaged area reaching approximately 104,000 hectares, equivalent to 1.64% of the total forest area [7]. Furthermore, this wildfire caused the highest number of human casualties of any wildfire in the nation’s history, in addition to extensive forest damage [8]. Korea is constantly exposed to the risk of wildfires, which have become an everyday threat. About 63% of the land is covered by forests, and this, combined with dry spring weather, strong localized winds, and rugged topography, creates a high risk of fires spreading rapidly.
The current fire-prone landscape in Korea is in part a product of post-war reforestation policy. Following the devastation of the Korean War (1950–1953), which left forests severely degraded, the government launched large-scale afforestation campaigns from the 1960s through the 1980s. These programs relied heavily on fast-growing coniferous species, most notably Pinus densiflora (Japanese red pine) and Pinus koraiensis (Korean pine), which, although native, were planted in dense monoculture stands far exceeding their natural distribution [9]. As a result, coniferous plantations now dominate large portions of the landscape in Gangwon and Gyeongbuk provinces, creating structurally homogeneous forests with high fuel continuity that are particularly susceptible to rapid fire spread.
An examination of wildfire trends in Korea reveals that the interaction of climatic, structural, and social factors is making wildfires larger and more routine. According to statistics, an average of 450 to 500 wildfires occur annually, destroying approximately 3700 to 4000 hectares of forest each year [
10,
11]. Notably, the success of past reforestation projects and continuous forest protection efforts have led to an increase in forest growing stock since the 1960s. This has resulted in fuel accumulation, acting as a structural factor that exacerbates wildfire scale [
10,
11]. Forest growing stock in Korea has increased 18.4-fold compared to 1946, reaching 1,040,447,000 m³ (165.2 m³/ha) as of 2020 [12].
Meteorological patterns, characterized by decreasing precipitation days and a sharp increase in dry weather warnings due to climate change, further aggravate the risk of mega-fires [
13]. Changes in the timing and patterns of occurrence are also distinct. Wildfires, once concentrated in the spring, are expanding into May and the winter season, while the increase in recreational hiking has led to more fires on weekends [
10,
11]. The decline and aging of rural populations make initial firefighting efforts increasingly difficult [
10,
11]. This implies that the ignition and spread of wildfires are closely linked to human activities and social structural changes, beyond simple climatic or topographical conditions.
Wildfires in Korea are predominantly caused by anthropogenic factors, such as accidental fires started by hikers or the burning of agricultural waste, rather than by natural causes such as lightning strikes [14]. The current National Forest Fire Danger Rating System, operated by the National Institute of Forest Science, issues fire risk ratings based on statistical analyses of weather, stand, and topographical factors in relation to wildfire occurrences (2000–2010) [14], but it does not incorporate anthropogenic factors. Therefore, the impact of human-induced factors on wildfire occurrence must be accounted for to improve forest fire forecasting systems and to develop effective forest fire prevention plans.
Won et al. [
15] developed a national integrated Daily Weather Index (DWI) model for calculating forest fire danger ratings in spring and autumn, identifying temperature, relative humidity, effective humidity, and wind speed as the key meteorological predictors. Ryu et al. [
16] analyzed data on forest fire outbreaks over the last 30 years and found that wildfire risk periods have lengthened due to climate change in the ROK. Kwak et al. [
17] showed that slope, elevation, aspect, distance to roads, and population density are significant explanatory factors for wildfire occurrence. Kim et al. [
18] analyzed forest fire probability using multi-temporal socio-economic and environmental variables, demonstrating that fire risk is associated with both biophysical conditions and human land-use patterns. Kim et al. [
19] reported that over a 30-year period (1991–2020), annual wildfire incidents have been increasing, with worsening spatial unevenness, as mega-fires are concentrated in the northeastern regions of Gangwon and Gyeongbuk.
However, there is a lack of empirical research that quantitatively and spatially analyzes the impact of anthropogenic factors on wildfires. Hong et al. [20] hypothesized that forest roads can exacerbate wildfire outbreaks, whereas Lee et al. [21] suggested that forest roads hinder the spread of wildfires. Whether forest roads are related to wildfires, and in which direction, therefore remains to be clarified. Park et al. [22] further argued that the ROK Government’s policy of subsidizing “forest improvement tending” promotes forest fires.
To better understand the factors associated with wildfire occurrence in the ROK, this study tests the following three hypotheses:
Hypothesis 1: Forest stand age and species composition influence wildfire occurrence.
Hypothesis 2: Plantation forest tending activities can promote wildfire occurrence.
Hypothesis 3: Expansion of forest road networks and trail infrastructure (density and accessibility) increases wildfire risk.
2. Materials and Methods
2.1. Study Area and Research Flow
The spatial scope of this study was set to the Gangwon and Gyeongbuk provinces in the ROK (
Figure 1). Both regions feature rugged terrain and have recorded frequent wildfires. The temporal scope was defined from January 2022 to August 2025. The research followed the procedure illustrated in
Figure 2 below.
Variables were selected and collected from relevant organizations, followed by a data refinement process to determine their impact on wildfire occurrence. In the data preprocessing stage, the study areas of Gangwon and Gyeongbuk were divided into 1 km × 1 km square grids to construct grid data. All spatial data preprocessing and variable mapping were performed using the open-source software QGIS 3.40.15.
The dependent variable (Target) was set as a binary classification, defining grids with a wildfire occurrence during the period as 1, and those without as 0. For grids where wildfires occurred (Target = 1), addresses provided by the Korea Forest Service’s wildfire statistics were converted into latitude and longitude coordinates using Geocoder (
https://geocoder.gimi9.com/ (accessed on 26 May 2026)), a web-based geocoding site. The coordinates represent the centroid of the address parcel area. The geocoded location data were mapped to grids as point data to identify Target grids. The independent variables were also mapped to these grids along with the dependent variable Y. Data collection and processing methods for these variables are detailed in Section 2.2. The constructed spatial information was then transformed into a dataset suitable for machine learning model training and binary logistic regression analysis.
First, data on wildfire occurrence locations were obtained through an information disclosure request to the Korea Forest Service. A total of 492 wildfire cases were recorded in Gangwon and Gyeongbuk provinces from January 2022 to August 2025. However, after excluding cases where the exact address was difficult to identify and duplicate cases occurring within the same 1 km × 1 km grid, the final number of wildfire occurrence grids (Target = 1) was established as 471.
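The assignment of geocoded fire locations to 1 km × 1 km grids, and the collapsing of several cases in one grid into a single occurrence grid, can be sketched as follows. This is a minimal illustration only: it assumes the points are already in a projected, metre-based coordinate system, the function names and sample coordinates are hypothetical, and the study itself performed this step in QGIS.

```python
import math

CELL_SIZE_M = 1000  # side length of a 1 km x 1 km grid cell, in metres


def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the cell containing a projected point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def occurrence_cells(points):
    """Collapse fire points into the set of unique grid cells (Target = 1)."""
    return {grid_cell(x, y) for x, y in points}


# Hypothetical projected coordinates (metres); the first two fall in one cell.
fires = [(1200.0, 3400.0), (1850.0, 3990.0), (5200.5, 760.0)]
cells = occurrence_cells(fires)
print(sorted(cells))  # [(1, 3), (5, 0)]
```

Using a set makes duplicate cases within the same grid disappear automatically, which mirrors the reduction from recorded cases to unique occurrence grids described above.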
Second, non-occurrence grids (Target = 0) were extracted for machine learning and binary logistic regression analysis based on the refined data. Simple random sampling of non-occurrence grids could lead to spatial bias regarding regional meteorological and topographical characteristics. Therefore, to resolve the class imbalance caused by the difference in quantities between occurrence and non-occurrence grids, Region-based Stratified Random Sampling was employed, extracting 1000 non-occurrence grids. The specific sampling process is as follows.
The study area was divided into three zones based on topographical and meteorological similarities: Yeongdong (Gangwon East Coast), Yeongseo (Gangwon Inland), and Gyeongbuk (
Figure 3). The wildfire occurrences in each zone were examined. Out of 471 wildfires between January 2022 and August 2025, Gyeongbuk accounted for 266 (56.48%), Yeongseo for 139 (29.51%), and Yeongdong for 66 (14.01%). To ensure the model is balanced by learning each region’s unique environmental characteristics based on occurrence weights, the actual occurrence ratio was used as the extraction weight. The target of 1000 non-occurrence grids was allocated accordingly: 566 from Gyeongbuk, 296 from Yeongseo, and 138 from Yeongdong were randomly selected. Through this process, a final analysis dataset of 1471 grids was constructed, comprising 471 occurrence grids and 1000 stratified non-occurrence grids (
Table 1). Notably, for the Yeongdong region—characterized by unique weather conditions like Yangganjipung (local strong winds)—138 non-occurrence grids (about double the 66 occurrence grids) were allocated to effectively train the model on the complex ignition risk factors of the area.
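The region-based stratified draw of non-occurrence grids can be sketched as below. The per-zone quotas are the ones reported in the text; everything else (the function name, the candidate pool sizes, the grid ID format, and the random seed) is a hypothetical placeholder rather than the authors' actual procedure.

```python
import random

# Non-occurrence quotas per zone, as reported in the text (sum = 1000).
QUOTAS = {"Gyeongbuk": 566, "Yeongseo": 296, "Yeongdong": 138}


def stratified_sample(candidates_by_zone, quotas, seed=42):
    """Draw a simple random sample without replacement inside each zone."""
    rng = random.Random(seed)
    sample = {}
    for zone, quota in quotas.items():
        pool = candidates_by_zone[zone]
        if quota > len(pool):
            raise ValueError(f"zone {zone!r} has only {len(pool)} candidates")
        sample[zone] = rng.sample(pool, quota)
    return sample


# Hypothetical pools of non-occurrence grid IDs (pool sizes are made up).
pools = {
    "Gyeongbuk": [f"GB-{i}" for i in range(5000)],
    "Yeongseo": [f"YS-{i}" for i in range(4000)],
    "Yeongdong": [f"YD-{i}" for i in range(2000)],
}
picked = stratified_sample(pools, QUOTAS)
print({zone: len(ids) for zone, ids in picked.items()})
print(sum(len(ids) for ids in picked.values()))  # 1000
```

Sampling within each zone separately guarantees the regional shares of the non-occurrence set, which a single random draw over the whole study area would only match on average.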
Modeling, feature selection, logistic regression analysis, and SHAP visualization were implemented using the pandas, scikit-learn, xgboost, statsmodels, and shap libraries in Python 3.12.13 (GCC 11.4.0, Linux).
2.2. Data Collection and Processing Methods
To analyze the multidimensional factors affecting wildfire occurrence, independent variables were categorized into five groups based on raw data: Weather, Forest Characteristics, Infrastructure, Forest Management, and Temporal Factors (Table 2). The spatial distribution of forest attributes, infrastructure, and management areas across the study region, together with wildfire occurrence points, is illustrated in
Figure 4. Each independent variable was spatially joined to the grids. Missing rates and treatment strategies varied by variable type. Forest management variables (plantation forest tending, natural forest tending, other management) and infrastructure variables (road density, trail density, distance to road, distance to trail) exhibited a missing rate of 0%, because grids with no management history or infrastructure were assigned a structural zero (a value of 0, meaning absence) at the raw data construction stage rather than being treated as missing. For meteorological variables, minor missing rates were observed: effective humidity (1.16%), maximum wind speed (0.14%), and daily precipitation (0.07%). To verify that these small rates did not affect the results, all three model specifications (VIF-based, Random Forest-based, and XGBoost-based logistic regression) were re-estimated after applying listwise deletion to rows with missing meteorological values. Across all three models, the selected variable sets were identical to those obtained with the original dataset, and model performance either improved marginally or remained equivalent (VIF: AUC 0.803 → 0.809; RF: AUC 0.795 → 0.801; XGBoost: AUC 0.791 → 0.795), confirming that the missing data had no substantial effect on the analysis. Detailed collection and processing methods for each variable are described below.
2.2.1. Weather Factors
Daily weather data provided by the Korea Meteorological Administration (KMA) were utilized. For each grid, the distance to each weather station was calculated, and data were extracted from the nearest station. The extracted data included effective humidity (eff_hum; %), daily precipitation (daily_precip; mm), and maximum wind speed (max_wind; m/s). For occurrence grids, weather data from the day before the fire (D-1) were used. For non-occurrence grids, weather data from a randomly assigned day between January 2022 and August 2025 were extracted.
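A nearest-station lookup of this kind can be sketched as follows. The text does not state which distance measure was used, so the great-circle (haversine) distance here is an assumption; the station labels and coordinates are hypothetical placeholders, not actual KMA stations.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return the ID of the station closest to the given grid point."""
    closest = min(stations, key=lambda s: haversine_km(lat, lon, s[1], s[2]))
    return closest[0]


# Hypothetical stations: (station_id, latitude, longitude).
stations = [("A", 37.75, 128.89), ("B", 37.90, 127.73), ("C", 36.57, 128.73)]
print(nearest_station(37.80, 128.80, stations))  # A
```

Once each grid has a station ID, the weather record for the relevant date (D-1 for occurrence grids) can be joined on that ID.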
In particular, effective humidity (eff_hum) was employed as an indicator of the moisture content of forest fuels, reflecting the cumulative dryness of the environment. Unlike simple daily average humidity (avg_hum), effective humidity is calculated as a weighted moving average of the daily relative humidity over a specific preceding period. A decay coefficient (r) was applied to assign higher weights to more recent days according to Equation (1):
$$H_e = (1 - r)\left(H_0 + rH_1 + r^2H_2 + \cdots + r^nH_n\right) \qquad (1)$$
where $H_e$ is the effective humidity, $H_0$ is the relative humidity on the analysis date (D-1 for occurrence grids), and $H_n$ denotes the relative humidity n days prior to the analysis date, with the decay coefficient r weighting more recent observations more heavily [23]. This approach captures the persistent drying conditions that critically influence wildfire ignition probabilities.
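As a worked illustration of the weighting described above, the sketch below computes effective humidity as (1 − r) times the sum of r^n-weighted relative humidity values. This geometric-weight form is the commonly used definition and is assumed here; the decay coefficient r = 0.7, the five-day window, and the humidity values are illustrative assumptions, not values reported in this study.

```python
def effective_humidity(rh_series, r=0.7):
    """Effective humidity from daily relative humidity values (%).

    rh_series[0] is the relative humidity on the analysis date and
    rh_series[n] is the value n days earlier. r is the decay coefficient
    (0 < r < 1); the default of 0.7 is an assumption for illustration.
    """
    if not 0 < r < 1:
        raise ValueError("decay coefficient r must lie strictly between 0 and 1")
    weighted_sum = sum((r ** n) * rh for n, rh in enumerate(rh_series))
    return (1 - r) * weighted_sum


# Hypothetical humidity record: analysis date first, then 1 to 4 days earlier.
rh = [35.0, 40.0, 55.0, 60.0, 70.0]
print(round(effective_humidity(rh), 2))
```

Because the weights shrink geometrically, a run of recent dry days lowers the index much more than a dry day several days back.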
Daily precipitation (
daily_precip) was initially included as a continuous variable (mm). However, because 81.1% of the 1471 observations recorded 0 mm, giving a highly right-skewed distribution whose median is zero, we additionally tested a binary specification (precip_binary: 1 = precipitation recorded, 0 = no precipitation). Rows with missing meteorological values were excluded for this robustness check, reducing the missing rate for effective humidity from 1.16% to 0%. The results of this sensitivity analysis are reported in
Section 3.2.
2.2.2. Forest Characteristics
To reflect the ecological structure and physical state of the forest, the 2024 large-scale forest type map (1:5000) produced by the Korea Forest Service was used. Based on the spatial data, five variables were calculated per grid: mean stand age (stand_age_mean; years) indicating maturity, coniferous tree ratio (conifer_ratio; ratio) representing species composition, mean diameter class (dmcls; cm), stand density (dnst; %), and mean tree height (height; m).
2.2.3. Infrastructure Factors
Data for national forest roads and trails were sourced from the Korea Forest Service’s Forest Spatial Information Service. Data on public and private forest roads were obtained through information disclosure requests submitted to officials in Gangwon and Gyeongbuk provinces. Infrastructure factors act as indicators of human accessibility and firefighting resources. Forest road density (road_density), trail density (trail_density), and the distance to the nearest road and trail (dist_road, dist_trail; km) were calculated per grid through GIS spatial analysis. Infrastructure density was calculated as total line length per grid area using the QGIS ‘Sum line lengths’ function, while infrastructure distance was calculated as the Euclidean distance (km) from the grid centroid to the nearest infrastructure object using the ‘Distance to nearest hub’ function. Province-level summary statistics of forest area, forest road length and density, and trail length and density across the two study provinces are presented in
Table 3.
2.2.4. Forest Management Factors
Spatial data on tending and afforestation projects in public and private forests from 2015 to 2017, provided by the Korea Forest Service, were utilized. The implementation records for detailed forest management activities in the study areas are shown in
Table 4. The 12 detailed activities in the collected data were converted to area ratios by dividing the total activity area performed within each grid (1 km × 1 km) by the grid’s total area. These variables were classified into three groups based on their purpose: plantation forest tending, natural forest tending, and other management.
A preliminary validation was conducted using the AUC (Area Under the ROC Curve) to evaluate which configuration—treating activities individually or grouping them—best predicts wildfire risk. The validation showed that the model’s predictive performance was superior when utilizing the three grouped categories (Plantation Tending, Natural Tending, and Other Management; AUC: 0.7590) compared to inputting 12 individual variables or integrating them into a single category (AUC: 0.7491). Thus, the three-group classification system was adopted.
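For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen occurrence grid receives a higher predicted score than a randomly chosen non-occurrence grid. The sketch below computes it directly from that definition; the labels and scores are hypothetical, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Counts the share of (positive, negative) pairs in which the positive
    case has the higher score; tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fire grid) and predicted probabilities.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the grouped and ungrouped configurations are compared.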
2.2.5. Temporal Factors
To account for temporal variations in wildfire occurrence, a ‘Season’ variable was derived from the mapped date of each event. The months were grouped into four categories: Spring (March–May), Summer (June–August), Fall (September–November), and Winter (December–February). For statistical analysis, these were transformed into dummy variables, with Summer serving as the reference category to avoid perfect multicollinearity among the dummies.
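The season coding can be sketched as below: each month maps to a season, and three dummy variables are created so that Summer, the reference category, is represented by all zeros. The column names (`season_Spring` and so on) are hypothetical.

```python
# Month-to-season mapping following the grouping described in the text.
SEASON_BY_MONTH = {
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
    12: "Winter", 1: "Winter", 2: "Winter",
}

# Summer is the reference category, so it gets no dummy column.
DUMMY_LEVELS = ("Spring", "Fall", "Winter")


def season_dummies(month):
    """Return the season dummy variables for a calendar month (1-12)."""
    season = SEASON_BY_MONTH[month]
    return {f"season_{level}": int(season == level) for level in DUMMY_LEVELS}


print(season_dummies(4))  # {'season_Spring': 1, 'season_Fall': 0, 'season_Winter': 0}
print(season_dummies(7))  # {'season_Spring': 0, 'season_Fall': 0, 'season_Winter': 0}
```

Each dummy coefficient is then interpreted relative to Summer; including a fourth dummy alongside an intercept would make the design matrix linearly dependent.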
2.3. Feature Selection Based on Machine Learning
To identify the core variables substantially affecting wildfire occurrence among the independent variables, statistical multicollinearity diagnostics and machine learning algorithms were used concurrently. First, to prevent variables that behave similarly from degrading the model’s statistical accuracy, Variance Inflation Factor (VIF) values were calculated to diagnose multicollinearity [
25]. Stable variables with a VIF of less than 10 were initially selected (
Table 5).
Second, ensemble machine learning models (Random Forest, XGBoost) were introduced to capture complex interactions among variables. Wildfires are nonlinear phenomena in which weather, topography, and infrastructure intertwine. To overcome the limitations of traditional statistical methods that only consider linear relationships, tree-based models, which excel at finding hidden patterns, were utilized. The entire dataset was split into training (80%) and validation (20%) sets using stratified sampling.
Feature importance was extracted from each trained model to identify the top 10 variables (
Table 6). The results demonstrated a high degree of consensus between the two algorithms, with 9 out of the top 10 variables overlapping. Notably, while the VIF-based model retained the ‘Season’ variables as significant linear predictors, both machine learning models excluded all seasonal variables from the top 10. Instead, effective humidity and mean stand age, which were excluded in the VIF diagnostics due to collinearity, emerged as the top-ranked variables in both models. This suggests that the machine learning algorithms identified the fundamental physical trigger of wildfires (i.e., extreme dryness represented by low effective humidity) rather than relying on the temporal proxy of ‘Season’.
To construct the final probability function, the variable set from the Random Forest (RF) model was adopted, as it exhibited slightly higher predictive performance (AUC: 0.8001, Pseudo $R^2$: 0.1950) than XGBoost (AUC: 0.7960, Pseudo $R^2$: 0.1818).
2.4. Estimation of Forest Fire Probability Function and Hypothesis Verification
The core objective of this study lies in causal inference rather than predictive accuracy per se. Specifically, this research aims to identify the direction and statistical significance of the impacts of anthropogenic factors—such as forest road density, trail density, and forest management activities—on wildfire occurrence. To achieve this inferential goal, a model that generates interpretable coefficients is essential. While ‘black-box’ models, such as Random Forest or deep neural networks, offer high raw predictive power, they lack the coefficient-level transparency required to refine the algorithms of the Korea Forest Service’s National Forest Fire Danger Rating System.
Therefore, in this study, a logistic regression model [26] was constructed to determine the direction and statistical significance of the effects of the top variables selected via machine learning on the probability of wildfire occurrence. Logistic regression is suitable for binary dependent variables, with wildfire occurrence set as 1 and non-occurrence as 0. The regression equation comprising the 10 selected explanatory variables is expressed as Equation (2):
$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{10} X_{10} + \varepsilon \qquad (2)$$
Here, P is the probability of wildfire occurrence, $\beta_0$ is the constant term, $\beta_1$ to $\beta_{10}$ are the regression coefficients representing the influence of each independent variable $X_1$ to $X_{10}$, and $\varepsilon$ represents unobserved factors and the random error term not explained by the model’s independent variables.
McFadden’s Pseudo $R^2$ was used to evaluate the goodness-of-fit. The final machine learning-based model, which exhibited the highest explanatory power, was adopted. The regression coefficients, significance probabilities (p-values), and odds ratios derived from this model were used to assess the practical impact of each factor.
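The two quantities used here can be illustrated with a short sketch: the logistic function that turns a linear predictor into a probability, and McFadden's pseudo R-squared, defined as one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. The function names, outcomes, and fitted probabilities below are hypothetical; the study itself used statsmodels.

```python
import math


def predict_probability(intercept, coefs, x):
    """Logistic model: P = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def mcfadden_r2(y, p_model):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only)."""
    base_rate = sum(y) / len(y)  # intercept-only model predicts the mean
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null


# Hypothetical outcomes and fitted probabilities for five grids.
y = [1, 0, 0, 1, 0]
p_hat = [0.8, 0.2, 0.3, 0.6, 0.1]
print(round(mcfadden_r2(y, p_hat), 4))
```

Unlike the R-squared of linear regression, McFadden's measure tends to take modest values even for well-fitting models, so values should be compared across specifications fitted to the same data rather than judged on an absolute scale.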
2.5. Contribution Analysis to Wildfire Risk Using SHAP
Despite their high predictive performance, machine learning models have a ‘black-box’ character that makes their internal decision-making processes hard to interpret. To overcome this, the SHAP (SHapley Additive exPlanations) technique, based on game theory, was introduced [
27]. SHAP quantitatively decomposes and explains each variable’s contribution at the individual prediction level using Shapley Values [
27]. SHAP allows for the observation of how wildfire risk changes as variable values shift at the individual grid level. In the SHAP summary plot (beeswarm plot), each point represents one grid observation. The y-axis lists variables ranked by mean absolute SHAP value (i.e., overall importance), and the x-axis shows the SHAP value, where positive values indicate increased fire probability and negative values indicate suppressed probability. Point color reflects the original feature value: red indicates a high feature value, and blue indicates a low feature value. For example, a cluster of red points on the positive x-axis for a given variable means that high values of that variable are associated with higher wildfire risk.
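The idea behind Shapley values can be shown with a brute-force sketch on a toy example: each feature's contribution is its marginal effect on the model output, averaged over every order in which features could be added. The toy value function and its numbers are entirely hypothetical, and this exhaustive enumeration is not how the shap library computes values for tree models, which relies on faster specialized algorithms.

```python
from itertools import permutations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by averaging marginal contributions
    over all orderings of the features."""
    contrib = {f: 0.0 for f in features}
    for order in permutations(features):
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            contrib[f] += value_fn(with_f) - value_fn(coalition)
            coalition = with_f
    n_orders = factorial(len(features))
    return {f: total / n_orders for f, total in contrib.items()}


def toy_value(coalition):
    """Hypothetical model output for one grid given the known features."""
    score = 0.0
    if "eff_hum" in coalition:
        score += 0.30
    if "conifer_ratio" in coalition:
        score += 0.10
    if "trail_density" in coalition:
        score += 0.05
    if {"eff_hum", "conifer_ratio"} <= coalition:
        score += 0.06  # interaction term, split equally by Shapley values
    return score


features = ["eff_hum", "conifer_ratio", "trail_density"]
phi = shapley_values(toy_value, features)
print({f: round(v, 3) for f, v in phi.items()})
```

The contributions sum to the difference between the full-model output and the empty-coalition baseline, which is the additivity property that lets a beeswarm plot decompose each grid's prediction feature by feature.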
3. Results
3.1. Seasonal Distribution of Wildfire Occurrences
A total of 471 wildfire events were recorded across the study area from January 2022 to August 2025. Analysis of the monthly distribution revealed a pronounced seasonal concentration in late winter and spring (Figure 5). April recorded the highest share at 20.4% (96 events), followed by March (18.5%, 87 events), February (17.8%, 84 events), and January (12.1%, 57 events). Collectively, the January–April peak season accounted for 68.8% of all recorded wildfires, consistent with Korea’s climatological pattern of dry winters followed by low humidity and strong winds in spring.
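The reported shares follow directly from the event counts, as the short check below shows. The counts are taken from the text; the variable names are illustrative.

```python
TOTAL_EVENTS = 471

# Event counts for the four peak months, as reported in the text.
peak_counts = {"January": 57, "February": 84, "March": 87, "April": 96}

# Monthly share of all events, in percent, rounded to one decimal place.
shares = {m: round(100 * c / TOTAL_EVENTS, 1) for m, c in peak_counts.items()}
peak_share = round(100 * sum(peak_counts.values()) / TOTAL_EVENTS, 1)

print(shares)      # {'January': 12.1, 'February': 17.8, 'March': 18.5, 'April': 20.4}
print(peak_share)  # 68.8
```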
In contrast, the summer and early autumn months (June–October) showed markedly suppressed activity, with monthly shares ranging from 1.1% (July, 5 events) to 3.6% (June, 17 events). A secondary, modest increase was observed in November (4.7%, 22 events) and December (6.2%, 29 events), reflecting the dry conditions of the winter season.
Regional disaggregation revealed that Gyeongbuk consistently dominated fire counts throughout the year, peaking at 51 events in February. Yeongseo exhibited its highest activity in April (39 events), while Yeongdong maintained relatively low but persistent counts across all months, with a maximum of 13 events in February. All three regions followed the same unimodal seasonal pattern, suggesting that the late winter and spring peak is common to the whole study area and driven by shared meteorological conditions rather than region-specific factors.
3.2. Machine Learning-Based Forest Fire Probability Function Analysis Results
To reflect the non-linear and complex mechanisms of wildfire occurrence, the top 10 core variables derived from machine learning algorithms were incorporated into the final logistic regression model. The model’s Pseudo $R^2$ was 0.1950, about 1.3 times that of the logistic regression model that excluded variables with high VIF values (0.1505), indicating improved explanatory power. The final model’s AUC reached 0.8001, indicating excellent predictive performance. The results of the logistic regression and the odds ratios of each variable are shown in
Table 7.
As a robustness check, we re-estimated the model replacing the continuous daily precipitation variable with a binary indicator (precip_binary: 1 = precipitation recorded, 0 = no precipitation). This transformation is statistically motivated by the highly right-skewed distribution of daily_precip, in which 81.1% of observations recorded 0 mm. Rows with missing meteorological values were excluded for this analysis. The binary specification yielded comparable model performance (Pseudo $R^2$ = 0.212; AUC = 0.806). Critically, the direction and significance of all primary predictors (effective humidity, conifer ratio, mean stand age, road density, and trail density) remained unchanged across both specifications. When precip_binary was included alongside the RF-selected variables, it attained a significant negative coefficient (OR = 0.439), indicating that precipitation occurrence is associated with lower wildfire probability. These results demonstrate that the primary findings of this study are robust to the operationalization of the precipitation variable.
To interpret non-linear patterns between variables and individual data contributions that logistic regression struggles to capture, a SHAP summary plot was analyzed. It identified effective humidity (eff_hum), mean stand age (stand_age_mean), and coniferous tree ratio (conifer_ratio) as the top contributing variables to fire prediction. Variables whose red points (high variable values) lie in the positive (+) direction, such as coniferous ratio and trail density, aggravate wildfire risk. Conversely, variables whose red points lie in the negative (−) direction, such as stand age and road density, suppress the risk (Figure 6).
3.3. Hypothesis 1 Testing: Impact of Forest Age and Species Composition
Based on the regression results in Table 7, Hypothesis 1 was statistically supported. The regression coefficient for mean stand age was negative, identifying it as the strongest suppressor of wildfire occurrence among the model’s variables. The coniferous tree ratio had a coefficient of 1.4446, acting as a positive (+) factor that significantly increases fire risk. The corresponding odds ratio (exp(1.4446) ≈ 4.24) indicates that a one-unit increase in the coniferous tree ratio multiplies the odds of wildfire occurrence by roughly 4.24.
In the SHAP plot regarding age and species, the mean stand age showed a trend of SHAP values falling below 0 during the maturation stage (
Figure 7). This indicates high vulnerability in young forests but a decreasing risk as age increases. In contrast, as the coniferous ratio increased, SHAP values rose linearly, showing a clear pattern of heightening fire risk.
3.4. Hypothesis 2 Testing: Impact of Plantation Forest Tending Activities
Hypothesis 2, which posited an association between plantation forest tending activities and wildfire occurrence, was not supported in this analysis. The regression coefficient for plantation forest tending was 0.0501 and was not statistically significant. This result does not support the argument that anthropogenic forest management activities such as afforestation or thinning directly cause wildfires.
On the SHAP plot for plantation forest tending, the majority of data points were densely clustered around a SHAP value of 0. This suggests that the marginal contribution of fluctuations in forest management ratios to the prediction of wildfire occurrence in individual grids is negligible.
3.5. Hypothesis 3 Testing: Impact of Forest Road and Trail Infrastructure
Hypothesis 3, which suggested that infrastructure factors increase fire risk, was partially supported, as conflicting results emerged depending on the infrastructure type. Trail density recorded a coefficient of 1.4625, showing a tendency to increase fire probability at the 10% significance level. The odds ratio was high at 4.317, suggesting that areas with frequent hiker access face aggravated fire risks due to accidental ignitions. In stark contrast, forest road density (road_density) had a negative coefficient and significantly decreased the occurrence probability. This is consistent with the interpretation that forest roads facilitate the rapid deployment of firefighting equipment and personnel, thereby preventing the spread of flames and reducing damage probabilities.
The SHAP graphs for infrastructure variables clearly illustrate these opposing roles (
Figure 8). As trail density increases, SHAP values rise in the positive (+) direction, indicating heightened risk; whereas for road density, higher densities push SHAP values in the negative (−) direction, exhibiting a suppressive trend on occurrences.
3.6. Impact of Other Environmental Variables: Effective Humidity and Precipitation
Meteorological factors acted as critical control variables determining wildfire occurrence in both the machine learning importance evaluations and the logistic regression. Effective humidity (eff_hum) and daily precipitation (daily_precip) both had negative coefficients (Table 7), confirming that drier atmospheres significantly increase the probability of fires. These results align with previous studies [
23]. On the effective humidity SHAP plot, dropping below a specific dryness threshold caused SHAP values to spike, aggravating the risk (
Figure 9). In zones with sufficient humidity, risk was consistently suppressed. This suggests that even under identical topographical, structural, and infrastructure conditions, reaching meteorological tipping points has a profound impact on triggering wildfires.
4. Discussion
4.1. Wildfire Suppression Effect of Mature Forests and Ecological Mechanisms
The analysis revealed that among ecological factors, the mean stand age lowered fire probability second only to meteorological variables. This supports our hypothesis that mature forests ecologically suppress wildfires, aligning with Zald and Dunn’s [
28] findings that young forests heavily impact fire severity. Immature forests or homogeneous plantation stands often have canopies close to the ground, serving as ladder fuels that carry flames upward, and they tend to have abundant dry fine debris, making them highly vulnerable to ignition.
The vertical and horizontal heterogeneity and dense canopies of mature forests provide an insulating effect by shading the forest floor, lowering temperatures, and retaining moisture. As forests age, bark thickness increases and canopy fuels become more elevated, both of which substantially enhance resistance to surface fires beyond the microclimate effects alone.
As noted in the Introduction, Korea’s post-war coniferous plantations have matured into dense stands aged 31 to 50 years, with forest growing stock increasing 18.4-fold since 1946 [
12]. Recently, there has been active debate between forest policies that favor short-rotation clearcutting, justified by timber economics and the carbon uptake of young forests, and policies that favor ecological preservation. At this juncture, forest policies must be established through scientific evaluations of stand age, climate change mitigation, and ecosystem services. Our findings present evidence that extending rotation periods, thereby allowing stands to reach older age classes, may serve as a forest management strategy for suppressing mega-fires in the climate crisis era.
However, because topographical characteristics were not jointly considered, this study could not clarify whether the suppressive effect of longer-rotation management stems solely from ecological factors such as fuel elevation and bark thickness, or partly from topographical isolation that restricts human activity. Further research is needed.
Currently, research verifying whether aging forests suppress wildfires in Korea is scarce. Therefore, in-depth follow-up studies utilizing remote sensing technologies to explore the relationship between forest structure, stand age, and wildfire resistance are required.
4.2. The Paradox of Human Activity Infrastructure: Conflicting Roles of Forest Roads and Trails
Anthropogenic infrastructure had opposing effects depending on its type. An increase in hiking trail density was identified as an ignition factor that raises fire risk. In Korea, the dry spring and autumn seasons coincide with peak hiking periods, and fine fuels such as dry fallen leaves are readily exposed around trails. Thus, accidental fires caused by human negligence, such as discarded cigarette butts or illegal cooking, readily escalate into actual wildfires.
Conversely, increased forest road density served as a suppressor. While forest roads could act as potential ignition sources by increasing human access, they also play a vital role in emergency response. During a fire, roads enable rapid access for fire trucks and personnel, facilitating mopping-up operations and restricting spread. They are functionally indispensable, especially at night or when helicopters cannot be deployed.
Nevertheless, arguments exist advocating for minimizing road construction due to increased landslide risks and ecosystem fragmentation. Studies investigating the link between roads and fires also present varied outcomes. Hong et al. [
20] suggested that roads could be primary ignition sources by increasing accessibility, whereas Lee et al. [
21] argued that they inhibit fire spread. Thus, rigorous empirical research that quantifies the net effect of forest roads is required, alongside the development of construction methods that minimize ecological damage.
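The opposing roles of trails and forest roads correspond to coefficients of opposite sign in a logistic regression. As a minimal sketch, the snippet below converts a coefficient into an odds ratio and shows how a baseline fire probability shifts. The coefficient values, the baseline probability, and the function names are hypothetical illustrations, not the fitted estimates of this study.

```python
import math

def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds for a `delta`-unit increase in a predictor."""
    return math.exp(coef * delta)

def shifted_probability(p0, coef, delta=1.0):
    """Probability after shifting one predictor by `delta`, starting from baseline p0."""
    odds = p0 / (1.0 - p0) * odds_ratio(coef, delta)
    return odds / (1.0 + odds)

# Hypothetical values for illustration only (NOT the study's estimates).
beta_trail = 0.30   # assumed positive coefficient: trail density raises the odds
beta_road = -0.30   # assumed negative coefficient: road density lowers the odds
p0 = 0.10           # assumed baseline fire probability for a grid cell

print(f"trail OR = {odds_ratio(beta_trail):.3f}, "
      f"p: {p0:.3f} -> {shifted_probability(p0, beta_trail):.3f}")
print(f"road  OR = {odds_ratio(beta_road):.3f}, "
      f"p: {p0:.3f} -> {shifted_probability(p0, beta_road):.3f}")
```

An odds ratio above 1 indicates a factor that raises the odds of fire occurrence, and a value below 1 indicates a suppressor; the size of the change in probability also depends on the baseline.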
4.3. Comparison with Recent Wildfire Prediction Models
The predictive performance of the present model compares favorably with recent studies employing similar methodologies in analogous settings. Lee et al. [
29], applying ensemble machine learning models (Extra Trees, Random Forest, XGBoost, and LightGBM) with SHAP analysis to daily wildfire prediction in Gangwon Province, achieved a maximum AUC of 0.839 using meteorological, forest-related, and socioeconomic variables. Lee et al. [
30], applying a Random Forest model integrating human proximity variables, topographic, and meteorological factors along Korea’s eastern coast (2015–2024), reported an overall accuracy of 0.733 and F1-score of 0.515. At the national scale, Choi et al. [
31], comparing Random Forest, XGBoost, and ANN using satellite-based environmental variables across the entire Republic of Korea, reported AUC values of 0.74–0.76. The logistic regression model developed in this study, augmented by machine learning-based variable selection, achieved an AUC of 0.8001 and a McFadden pseudo-$R^2$ of 0.1950, which is competitive with these benchmarks while additionally providing coefficient-level interpretability that black-box models cannot offer. This balance between predictive accuracy and statistical transparency is particularly relevant for policy applications such as refining the Korea Forest Service’s National Forest Fire Danger Rating System.
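For readers less familiar with the two metrics compared above, the following is a minimal sketch of how they can be computed from observed fire labels and predicted probabilities. It uses the standard definitions (McFadden's measure as one minus the ratio of the model log-likelihood to the intercept-only log-likelihood, and AUC as the probability that a randomly chosen fire cell is ranked above a randomly chosen non-fire cell). The toy data and function names are hypothetical and are not taken from this study.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of labels y under predicted probabilities p."""
    eps = 1e-12  # guard against log(0)
    return sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
        for yi, pi in zip(y, p)
    )

def mcfadden_r2(y, p):
    """McFadden pseudo-R^2 = 1 - lnL(model) / lnL(intercept-only model)."""
    base_rate = sum(y) / len(y)  # the intercept-only model predicts the base rate
    ll_model = log_likelihood(y, p)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - ll_model / ll_null

def auc(y, p):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = fire occurred in the grid cell, 0 = no fire.
y = [1, 0, 1, 0, 0, 1, 0, 0]
p = [0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.65, 0.1]

print(f"AUC = {auc(y, p):.4f}, McFadden pseudo-R^2 = {mcfadden_r2(y, p):.4f}")
```

The two metrics answer different questions: AUC measures how well the model ranks cells, whereas McFadden's measure reflects the improvement in likelihood over a model with no predictors, so the two values are not on a common scale.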
5. Conclusions
To combat the increasing scale and routine nature of wildfire disasters driven by climate change, this study investigated the complex impacts of meteorological, ecological, and anthropogenic factors on wildfire occurrences in Gangwon and Gyeongbuk provinces of the ROK via logistic regression analysis. The results confirm that, excluding weather, the most critical fire-suppressing factor is forest stand age. We found that a high proportion of coniferous forests and increased trail density could serve as primary ignition and spread factors of forest fires. Furthermore, increased forest road density significantly reduces occurrence probability, identifying it as a core firefighting asset. The hypothesis that plantation forest tending increases fuel loads and fire risk requires further follow-up research.
These findings suggest the necessity of re-evaluating Korea’s current forest policies centered on economic timber and short-rotation logging. To effectively mitigate fire damage and enhance climate resilience, rotation periods should be extended so that stands are allowed to reach older age classes. Additionally, ecological forest tending that transitions highly flammable, uniform coniferous plantation forests into fire-resistant broadleaf ecosystems is necessary.
Since this study conducted macroscopic spatial analyses at a 1 km grid resolution, fine-scale changes in understory microclimate conditioned by topographical factors such as slope and aspect were not addressed. Furthermore, the use of infrastructure density as a proxy for human activity is a simplification; future studies incorporating direct measures such as visitor counts, ignition source records, and agricultural burning data would better characterize anthropogenic fire risk. The forest management data used in this study were limited to the period 2015–2017, the most recent publicly available records, while fire occurrence data span 2022–2025; this temporal gap is a limitation, although silvicultural treatments such as thinning and stand cleaning may continue to influence forest structure after implementation [
32,
33]. Finally, the model was developed for Gangwon and Gyeongbuk provinces, and applying these findings to other regions or countries would require similarly detailed infrastructure and management data, which may not be readily available elsewhere.