Wildfire Susceptibility Mapping in Greece Using Ensemble Machine Learning

Symeonidis, Panagiotis; Vafeiadis, Thanasis; Ioannidis, Dimosthenis; Tzovaras, Dimitrios

doi:10.3390/earth6030075


by Panagiotis Symeonidis *, Thanasis Vafeiadis, Dimosthenis Ioannidis and Dimitrios Tzovaras

Information Technologies Institute, Center for Research and Technology Hellas, 57001 Thessaloniki, Greece

* Author to whom correspondence should be addressed.

Earth 2025, 6(3), 75; https://doi.org/10.3390/earth6030075

Submission received: 29 May 2025 / Revised: 30 June 2025 / Accepted: 2 July 2025 / Published: 5 July 2025


Abstract

This study explores the use of ensemble machine learning models to develop wildfire susceptibility maps (WFSMs) in Greece, focusing on their application as regressors. We provide a continuous assessment of wildfire risk, enhancing the interpretability and accuracy of predictions. Two key metrics were developed: Ensemble Mean and Ensemble Max. This dual-metric approach improves predictive robustness and provides critical insights for wildfire management strategies. The ensemble model effectively handles complex, high-dimensional data, addressing challenges such as overfitting and data heterogeneity. Utilizing advanced techniques such as XGBoost, GBM, LightGBM, and CatBoost regressors, our research demonstrates the potential of these methods to enhance wildfire risk estimation. The Ensemble Mean model classified 50% of the land as low risk and 21% as high risk, while the Ensemble Max model identified 38% as low risk and 33% as high risk. Notably, 83% of wildfires between 2000 and 2024 occurred in areas marked as high risk by both models, underscoring their effectiveness. This approach offers significant potential to mitigate wildfires’ environmental, economic, and social impacts, enhance climate resilience, and strengthen preparedness for future wildfire events.

Keywords: wildfire susceptibility mapping; ensemble learning techniques; geospatial analysis; natural hazards risk

1. Introduction

Wildfires are among the most destructive natural hazards, posing significant threats to ecosystems, human lives, and economies worldwide. In recent decades, the frequency and intensity of wildfires have increased due to climate change, land-use changes, and human activities. Globally, wildfires contribute to biodiversity loss, soil erosion, air pollution, and greenhouse gas emissions, amplifying their widespread environmental and social impacts. At the regional scale, Mediterranean countries, including Greece, are particularly vulnerable due to their hot, dry summers, complex topography, and dense vegetation.

Greece faces serious challenges in predicting wildfire risk, as wildfires not only threaten valuable forest resources but also disrupt local communities and damage cultural and economic assets. Understanding wildfire susceptibility and predicting areas at high risk are critical for effective prevention, management, and mitigation efforts. Traditional methods often fall short in capturing the complex, nonlinear relationships among environmental variables—such as temperature, humidity, vegetation, and wind speed—and other factors that drive wildfire ignition and spread.

Recent advancements in artificial intelligence (AI), machine learning (ML), and deep learning (DL) have revolutionized wildfire susceptibility mapping and risk assessment. These technologies can process vast amounts of geospatial and environmental data to identify patterns and predict wildfire-prone areas with remarkable accuracy. This research aims to improve wildfire risk estimation by applying ensemble learning techniques, which combine multiple models to enhance predictive accuracy and reliability. These models can more effectively handle the region’s varied geographical and climatic conditions, providing more precise and localized wildfire risk assessments.

Geospatial technologies—remote sensing (RS) and Geographic Information Systems (GIS)—are indispensable tools that support stakeholders in making informed decisions before, during, and after a wildfire event. These tools also enable the creation of wildfire susceptibility maps (WFSMs): geospatial representations that depict the likelihood of wildfire occurrence across a specific area, based on a combination of environmental, climatic, and anthropogenic factors.

WFSMs can be developed using either knowledge-based or data-driven methods. Knowledge-based methods, such as the Analytic Hierarchy Process (AHP) [1], the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) [2], and the Weighted Linear Combination (WLC) [3], rely on expert judgment to identify and integrate influencing factors into the modeling process. These methods are susceptible to human errors, which can compromise the accuracy of the results [4].

On the other hand, data-driven approaches leverage statistical and machine learning models, such as Multiple Linear Regression (MLR) [5], Logistic Regression (LR) [6,7], Random Forest (RF) [6,7,8,9,10], Bagging [11], Decision Tree (DT) [12,13], Adaptive Neuro-Fuzzy Inference System (ANFIS) [14], Support Vector Machine (SVM) [12,15], Boosting [6,8,12,16], and Artificial Neural Network (ANN) [17,18,19], to identify patterns in large-scale nonlinear data and predict susceptibility based on historical wildfire records.

With recent advancements in machine learning, it is now possible to model the complex interactions that contribute to wildfire risk more accurately. Wildfire predictions, historically characterized by significant inaccuracies, have become increasingly reliable thanks to the growing availability of remote sensing data. For instance, the authors in [20] provide a comprehensive analysis of ML techniques in wildfire prediction.

Machine learning algorithms have been widely applied to wildfire control. In [21], the authors used RF to model the probability of fire occurrence. In [22], wildfire susceptibility was assessed using multiple methods, including Logistic Regression, probit regression, ANN, and RF. In [23], a comprehensive analysis of hybrid AI models for spatially explicit wildfire probability prediction is presented. These models integrate ANFIS with metaheuristic optimization algorithms, such as Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Shuffled Frog Leaping Algorithm (SFLA), and Imperialist Competitive Algorithm (ICA).

The authors in [24] proposed a novel hybrid approach, Particle Swarm Optimized Neural Fuzzy (PSO-NF), for spatial modeling of tropical forest fire susceptibility, where PSO was used to optimize model parameters. In [12], Adaptive Boosting (AdaBoost) was combined with Decision Tree and SVM for forest fire prediction, and the authors noted that fuzzy c-means clustering and AdaBoost achieved the best results. In [6], the XGBoost ensemble technique was used alongside Logistic Regression (LR) and RF to test whether fire risk mapping improved.

Deep learning techniques—including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks—have also gained traction. For example, ref. [25] used CNNs to map wildfire spatial distributions and assess forest fire susceptibility, while ref. [26] designed three progressive LSTM-based models (Couple Spatio-Graphical LSTM [CSG-LSTM], Multi-Dimensional Gated LSTM [MDG-LSTM], and Feature-Network-Updated LSTM [FNU-LSTM]) to predict fire spread rates using infrared camera images, exploring interactions between fire and wind.

The motivation behind this work lies in the urgent need for improved wildfire management strategies in Greece, where the increasing threat of wildfires demands more accurate early-warning systems. By leveraging ensemble learning, this study aims to provide better predictions that can assist in prioritizing high-risk areas, optimizing resource allocation, and enhancing decision making for fire-fighting and disaster management. Ultimately, the goal is to reduce the environmental, economic, and social impacts of wildfires, contributing to improved preparedness and resilience in the face of climate change. This research also advances the broader field of wildfire prediction, demonstrating the potential of machine learning to handle complex, high-dimensional data for more effective risk estimation.

Ensemble methods were chosen for their well-documented strengths in capturing complex, nonlinear data relationships, which are common in wildfire susceptibility mapping. These techniques, widely used in the scientific literature, combine the predictions of multiple models to enhance accuracy and robustness, effectively mitigating overfitting—a key consideration given the spatial and environmental heterogeneity of wildfire-prone regions in Greece. Additionally, ensemble methods are well suited for handling large datasets with mixed data types, missing values, and complex interactions—all of which are common in wildfire modeling. Compared to traditional ML or DL techniques, ensemble methods offer better interpretability via feature importance analysis, making them particularly useful in understanding the factors that drive wildfire risk.

A key innovation of this research is the integration of Ensemble Mean and Ensemble Max models, which synthesize predictions to offer a robust approach to wildfire risk assessment. This study aims to improve both predictive accuracy and interpretability, offering valuable insights for wildfire management and prevention strategies in high-risk regions. Although the available data pertain only to the presence or absence of fires, all ensemble methods were trained and evaluated as regressors rather than classifiers. This approach allows the models to provide a continuous measure of fire likelihood or intensity, yielding a more quantitative assessment of wildfire susceptibility.

The remainder of this work is organized as follows: Section 2 describes the conceptual framework and the integration of ensemble models, followed by a detailed overview of the study area, including geographic and climatic characteristics. The data subsection covers data sources, preprocessing steps, and an analysis of historical wildfire trends and associated environmental, socioeconomic, and anthropogenic factors. Section 3 discusses in depth the ensemble methods used, focusing on Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), Light Gradient Boosting Machine (LGBM), and Categorical Boosting (CatBoost), highlighting their unique features and relevance to wildfire prediction. Section 4 outlines the simulation setup, including the experimental design, computational tools, and evaluation metrics used to assess model accuracy and robustness. Each model—XGBoost, GBM, LGBM, and CatBoost—is trained and optimized, with results compared for performance and scalability, while Section 5 presents the model outputs, statistical significance, and practical implications for wildfire management. Finally, Section 6 concludes the study, summarizing the findings and offering recommendations for future work.

2. Methods and Data

2.1. Methodology

This study introduces a machine learning approach to estimate and map wildfire susceptibility, following a four-step methodology: data collection, data preprocessing, model development, and, finally, model evaluation and creation of wildfire susceptibility maps. Figure 1 summarizes the methodology employed in this study.

In the first step, diverse geospatial data were collected, focusing on open data from reputable sources that are publicly available, operationally maintained, and have pan-European coverage. In the second step, these data were processed within a GIS environment to generate the necessary inputs for the ensemble machine learning models. These inputs were represented as raster datasets covering the Greek territory. The dataset included “ground truth” data—burned areas from wildfires in Greece between 2000 and 2024—as well as various wildfire conditioning factors categorized into four groups: topographic, human-related, land use/vegetation, and climatic.

In the third step, wildfire susceptibility modeling was performed using four ensemble machine learning methods: Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), its lighter variant Light Gradient Boosting Machine (LGBM), and Categorical Boosting (CatBoost). Multiple parameters and configurations were tested for each model to optimize their performance. Based on the results, two final ensemble models were created: the Ensemble Mean, derived from the average of the model outputs, and the Ensemble Max, derived from their maximum values.
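The two combination rules described above can be sketched in a few lines. This is a minimal illustration, assuming each model output is a small 2-D grid of susceptibility values on the same raster; the function name `combine_rasters` and the 2x2 values are hypothetical and do not come from the study.

```python
from statistics import mean


def combine_rasters(rasters, reducer):
    """Combine equally shaped 2-D grids cell by cell using `reducer`."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    for raster in rasters:
        if len(raster) != rows or any(len(row) != cols for row in raster):
            raise ValueError("all rasters must share the same shape")
    return [
        [reducer([raster[i][j] for raster in rasters]) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2x2 susceptibility outputs from the four regressors.
xgb = [[0.10, 0.80], [0.40, 0.20]]
gbm = [[0.20, 0.70], [0.50, 0.10]]
lgbm = [[0.10, 0.90], [0.30, 0.20]]
cat = [[0.20, 0.60], [0.40, 0.30]]

outputs = [xgb, gbm, lgbm, cat]
ensemble_mean = combine_rasters(outputs, mean)  # per-cell average
ensemble_max = combine_rasters(outputs, max)    # per-cell maximum
```

Because the maximum of a set of values is never smaller than their mean, the Ensemble Max raster is at least as high as the Ensemble Mean raster in every cell, which is consistent with it acting as the more conservative of the two maps.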

Finally, in the fourth step, the accuracy of the wildfire susceptibility maps was evaluated using the Root Mean Square Error normalized by the standard deviation of the observed values (std-NRMSE) as the primary metric. The final wildfire susceptibility maps were then produced.

2.2. Study Area

Greece is located in Southern and Southeastern Europe, approximately between latitudes 34° and 42° N, and longitudes 19° and 30° E (see Figure 2). It spans a total area of 131,957 km² and has a population of approximately 10 million people. Predominantly mountainous, Greece’s terrain is rich in biodiversity, with forests covering nearly 25% of the land. These forests host a wide variety of tree species, including firs, Aleppo pines, black pines, and other conifers, as well as broadleaf species such as beeches, chestnuts, oaks, and plane trees.

The country experiences a predominantly Mediterranean climate, characterized by warm to hot summers and mild winters. Between 2000 and 2024, the European Forest Fire Information System (EFFIS), utilizing remote sensing satellite data, recorded an average of 52 wildfire events annually, resulting in an average of 43.4 thousand hectares of burned areas per year. Notably, 2007 was the most catastrophic year, with 135 recorded wildfire events and over 271 thousand hectares of forested land destroyed. These fires—primarily occurring in the western and southern Peloponnese and southern Euboea during the summer months—caused extensive environmental damage and human casualties, marking a devastating chapter in Greece’s wildfire history [27,28,29,30].

2.3. Data

2.3.1. Previous Wildfires

Wildfire data were obtained from the Copernicus Emergency Management Service (EMS), specifically through the European Forest Fire Information System (EFFIS). Burned area extents were derived from the analysis of satellite imagery. According to the product manual, these burned areas were identified using data from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensors aboard NASA’s TERRA and AQUA satellites, as well as from the Visible Infrared Imaging Radiometer Suite (VIIRS) sensor onboard the NASA/NOAA Suomi National Polar-orbiting Partnership (SNPP) satellite.

The dataset provides wildfire records spanning 25 years, from 2000 to 2024. It includes a total of 1293 fire events, each with an area greater than 30 hectares, resulting in a cumulative burned area of approximately 1085 thousand hectares. The EFFIS wildfire dataset represents the most advanced and comprehensive information currently available in Europe [31], and it shows strong agreement with other satellite-derived products for overlapping periods [32].
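As a quick consistency check, the dataset totals quoted above can be related to the annual averages reported for the study area. The short sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Totals quoted for the EFFIS record, 2000 to 2024 inclusive.
total_events = 1293
total_burned_kha = 1085        # thousand hectares
n_years = 2024 - 2000 + 1      # 25 years

events_per_year = total_events / n_years          # about 51.7
burned_kha_per_year = total_burned_kha / n_years  # 43.4

print(f"{events_per_year:.1f} events/year, {burned_kha_per_year:.1f} kha/year")
```

These values agree with the rounded averages of 52 events and 43.4 thousand hectares per year given in Section 2.2.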

Figure 3a illustrates the annual number of fire events in Greece, along with the corresponding total annual burned area. The year 2007 clearly stands out as the most severe, both in terms of the number of events and the extent of burned area. Figure 3b presents the burned area for each of Greece’s 13 administrative regions, alongside the percentage of burned area relative to each region’s total land area. Notably, the Region of Attica has the highest proportion of burned area (35%), while the regions with the largest absolute burned areas are the Peloponnese (approximately 268 kha), Central Greece (204 kha), and Western Greece (168 kha). The spatial distribution of burned areas is visualized in Figure 3c.

For the purposes of this study, the original vector dataset of burned areas was processed by selecting annual burned area data. These were then converted into yearly burned area raster datasets using the “Vector to Raster” algorithm in Quantum Geographic Information System (QGIS). The spatial resolution of the resulting rasters was 500 m. These raster datasets were subsequently used as label data for training the ensemble machine learning models.

2.3.2. Wildfire Factors

Wildfire risk is influenced by several key factors that determine both the likelihood of ignition and the potential for propagation. These factors can be broadly classified into four categories: topographic, human-related, land use/vegetation, and climatic [19,33,34,35,36,37,38,39]. These conditioning factors are fundamental for generating wildfire susceptibility maps and must be carefully considered during model design [37,40,41,42,43].

In line with previous studies [13,33,44,45,46,47,48,49,50,51,52,53,54,55], this study incorporates twelve wildfire conditioning factors: land use, distance to roads, distance to rivers, distance to settlements, elevation, Topographic Wetness Index (TWI), slope, aspect, surface roughness, grassland cover, dominant leaf type, and the number of days with extreme Fire Weather Index (FWI) values [56].

Table 1 provides detailed information on each wildfire conditioning factor layer, including data sources and the preprocessing steps used to generate raster datasets for the ensemble machine learning models.

All datasets underwent preprocessing in QGIS using a range of geoprocessing tools from different providers, including QGIS, GDAL [63,64], and the System for Automated Geoscientific Analyses (SAGA) [65,66]. These steps prepared the data as inputs for the machine learning models in GeoTIFF format. The preprocessing workflow included resampling all raster datasets to a common spatial resolution of 500 m, reprojecting them to the EPSG:2100 coordinate system (GGRS87/Greek Grid), and applying a land mask to exclude non-terrestrial values.

Additional processing was performed for specific layers. For the land-use dataset, the original raster was reclassified to reduce the number of unique land-use classes to twelve (Figure 4a). Proximity layers—such as distance to roads (Figure 4b), rivers (Figure 5a), and settlements (Figure 5b)—were generated using the SAGA Proximity algorithm with the corresponding vector layers.

Topographic variables including the Topographic Wetness Index (Figure 6b), slope (Figure 7a), aspect (Figure 7b), and surface roughness (Figure 8a) were derived from the digital elevation model (Figure 6a). These were computed using a combination of the SAGA “Terrain Analysis—Topographic Wetness Index” algorithm, the GDAL “Roughness” algorithm, and the SAGA “Terrain Analysis—Morphometry” algorithms for slope, aspect, and curvature.

Grassland cover (Figure 8b) and dominant leaf type (Figure 9a) layers were derived from the high-resolution (10 m) Copernicus Land Monitoring Service (CLMS) datasets. Finally, annual rasters of Fire Weather Index (FWI) extremes (Figure 9b), covering the years from 2000 to 2024, were generated from original NetCDF files using GDAL and Climate Data Operator (CDO) tools. These involved temporal and spatial slicing, followed by interpolation to match the target raster resolution.

3. Ensemble Methods

Ensemble methods are powerful machine learning techniques that combine multiple individual models, often referred to as weak learners, to create a single, more robust model with improved predictive performance. These methods have become increasingly popular due to their effectiveness in addressing complex, high-dimensional problems, making them indispensable tools across various applications, including classification, regression, image processing, and text analysis.

In this study, four widely used ensemble methods were employed and compared: Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), its lightweight variation Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost). A brief description of each method is provided below.

3.1. Extreme Gradient Boosting

Extreme Gradient Boosting (XGBoost) is a powerful ensemble learning method based on Decision Trees, designed to deliver both high performance and speed [67]. It builds upon the gradient boosting framework, which combines multiple weak learners—typically Decision Trees—to form a strong predictive model. In this iterative process, each new tree is trained to correct the errors of the previous ones, progressively optimizing overall performance.

XGBoost incorporates several enhancements over traditional gradient boosting, including L1 and L2 regularization, which help prevent overfitting, and a highly efficient split-finding algorithm that improves computational performance. These innovations make XGBoost particularly well suited for tabular data, where it consistently achieves high predictive accuracy in both classification and regression tasks.

In addition to its accuracy, XGBoost is highly regarded for its speed and scalability. It features optimized handling of sparse data and supports parallelized tree construction, enabling efficient processing of large datasets. Moreover, it supports distributed computing, allowing it to scale across multiple machines—an essential feature for big data applications.

These qualities have made XGBoost a popular choice in data science competitions and real-world applications, ranging from financial modeling to healthcare analytics. Its flexibility is further enhanced by a wide range of hyperparameters—such as learning rate, tree depth, and subsampling—that allow for fine-tuning and adaptability to different datasets and problem domains.

3.2. Gradient Boosting Machine—Light Gradient Boosting Machine

Gradient Boosting Machine (GBM) is an ensemble technique that combines the predictions of multiple weak learners, typically decision trees, to build a robust predictive model [68]. GBM operates in a sequential manner, where each new tree corrects the errors made by the previous trees, optimizing the overall model by minimizing a specified loss function. Unlike traditional bagging methods, such as Random Forests, GBM builds trees one at a time, and each tree is influenced by the errors of its predecessors, making it highly effective for complex patterns in data. While GBM is powerful and widely used, it can be computationally intensive and prone to overfitting, especially with deep trees, requiring regularization techniques like shrinkage (learning rate) and early stopping.
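The sequential error-correcting idea can be illustrated with a toy implementation. The sketch below is not the GBM implementation used in the study: it assumes a single input feature, squared-error loss, and one-split trees (stumps), and all function names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single-split tree (stump) that best fits the residuals."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees, learning_rate=0.1):
    """Start from the mean, then add stumps fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrinkage: each stump contributes only a fraction of its fit.
        predictions = [
            p + learning_rate * stump_predict(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - boosted_predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical 1-D data: a noisy step from "no fire" (0) to "fire" (1).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
errors = {n: mse(fit_boosted(xs, ys, n), xs, ys) for n in (0, 1, 10, 100)}
```

On this toy data the training error falls as trees are added, which also shows why shrinkage and early stopping matter: with enough trees the model fits the noisy point at x = 5 as well.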

Light Gradient Boosting Machine (LightGBM) is a variant of GBM, designed by Microsoft to address the limitations of traditional gradient boosting with a focus on speed and scalability [69]. Unlike GBM, which uses level-wise growth of trees, LightGBM grows trees leaf-wise, allowing it to focus on the most complex areas of the data and resulting in faster and more accurate models. LightGBM also uses histogram-based decision rules and a unique sampling method to handle large datasets efficiently while preserving predictive accuracy. This approach makes LightGBM well suited for large-scale applications, as it requires less memory and computational power than standard GBM. Its flexibility, speed, and scalability have made it popular in fields such as finance, bioinformatics, and e-commerce.

3.3. Categorical Boosting

Categorical Boosting (CatBoost) is a gradient boosting method developed by Yandex that builds an ensemble of Decision Trees sequentially, with each new tree trained to correct the errors of the previous ones [70]. Its distinguishing feature is native support for categorical variables: rather than requiring one-hot encoding during preprocessing, CatBoost converts categories into numerical values using ordered target statistics computed over random permutations of the training data.

CatBoost also uses ordered boosting, a permutation-driven training scheme designed to reduce the prediction shift caused by target leakage in standard gradient boosting [71]. By default it builds symmetric (oblivious) trees, in which the same split condition is applied across an entire level of the tree; this acts as a form of regularization and makes prediction fast. Its principal hyperparameters include tree depth, the number of iterations, the learning rate, and the L2 leaf regularization term.

4. Simulation Setup and Results

4.1. Simulation Setup

The performance of all ensemble methods was evaluated over a wide range of hyperparameter values using grid search to ensure a fair and unbiased comparison. All input variables were normalized prior to training. The dataset was intentionally constructed to be slightly imbalanced toward the absence of fire and was split into 80% for training and 20% for validation. Stratified sampling was used to preserve the proportion of fire and no-fire areas in both the training and validation sets. This approach ensured that each model had an equal opportunity to demonstrate its strengths under consistent conditions.
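A stratified 80/20 split of the kind described above can be sketched as follows. This is an illustrative implementation only: the function name `stratified_split`, the random seed, and the 60/40 label counts are assumptions, and the study does not state which tool performed the split.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices so each class keeps the same share in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        val_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(val_idx)


# Hypothetical labels, slightly imbalanced toward no-fire (0) versus fire (1).
labels = [0] * 60 + [1] * 40
train_idx, val_idx = stratified_split(labels)

train_fire_share = sum(labels[i] for i in train_idx) / len(train_idx)
val_fire_share = sum(labels[i] for i in val_idx) / len(val_idx)
```

Splitting each class separately guarantees that both subsets keep the 40% fire share of the full set, which a purely random split would only achieve on average.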

Furthermore, to provide a consistent basis for comparison among models, the Root Mean Square Error normalized by the standard deviation of the observed values (std-NRMSE) was selected as the primary evaluation metric.

4.1.1. Evaluation Metrics

In statistical analysis and particularly in regression models, a common way of measuring the quality of the fit of a certain model is the Root Mean Square Error (RMSE) [72], also called Root Mean Square Deviation, given by

RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n}}

where y_{i} is the i-th actual value, \hat{y}_{i} is the corresponding value predicted by the selected model, and n is the size of the test set. If the predicted values are close to the actual values, the RMSE will be small; if predicted and actual values differ substantially, even for only some cases, the RMSE will be large. A value of zero indicates a perfect fit to the data [73,74].

Although the models compared in this study do not use different response variables or transformations (such as square root, standardization, or log transformation), the decision was made to use the Normalized Root Mean Square Error (NRMSE) as the primary evaluation metric. This choice is due to NRMSE’s ability to be categorized into different performance levels, providing a more nuanced insight.

NRMSE can be obtained from RMSE by dividing by the mean, the range (the difference between maximum and minimum), the interquartile range, or the standard deviation of the actual values. The standard deviation-based variant (std-NRMSE) [72] was selected, defined as follows:

\text{std-NRMSE} = \frac{RMSE}{\mathrm{std}(y)}

The std-NRMSE is the preferred option among the four versions listed above because it quantifies the ratio between the variation not explained by the regression and the overall variation in the actual values y. When the regression explains all the variation in y, the std-NRMSE is zero. If the regression explains only part of the variation, leaving some unexplained variation of a similar scale to the overall variation, the std-NRMSE will be approximately 1. A std-NRMSE value greater than 1 indicates that the model’s prediction error exceeds the natural variability of the variable itself, suggesting poor predictability. Therefore, a desirable std-NRMSE value is less than 1, ideally approaching zero, which reflects high predictive accuracy [74].
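The metric and its reference values of 0 and 1 can be checked with a short sketch. One detail is an assumption here: the text does not say whether std(y) is the population or the sample standard deviation, so the population form is used. Under that assumption the squared std-NRMSE equals one minus the coefficient of determination R². The sample data are hypothetical.

```python
import math
from statistics import pstdev


def rmse(actual, predicted):
    """Root Mean Square Error between paired actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def std_nrmse(actual, predicted):
    """RMSE divided by the standard deviation of the actual values."""
    spread = pstdev(actual)  # assumption: population standard deviation
    if spread == 0:
        raise ValueError("actual values have zero variance")
    return rmse(actual, predicted) / spread


def r_squared(actual, predicted):
    """Coefficient of determination, for comparison with std-NRMSE."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical fire (1) / no-fire (0) labels and three sets of predictions.
actual = [0, 0, 1, 1, 0, 1]
perfect = std_nrmse(actual, actual)              # explains all variation
mean_only = std_nrmse(actual, [0.5] * 6)         # always predicts the mean
partial_pred = [0.2, 0.1, 0.7, 0.9, 0.4, 0.6]
partial = std_nrmse(actual, partial_pred)        # somewhere in between
```

A model that always predicts the mean of y scores exactly 1, which is why values below 1 indicate that the regression explains at least part of the variation.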

4.1.2. XGBoost

During the grid search optimization for XGBoost, we focused on hyperparameters that directly affect model complexity and generalization, including learning rate, max depth, and number of estimators (n_estimators). The learning rate controls the contribution of each tree to the final model, while max depth and n_estimators influence the model’s capacity to capture complex relationships. Fine-tuning the subsample parameter—which controls the fraction of data used for each tree—helped prevent overfitting by introducing diversity among trees.

Additionally, we optimized the regularization parameters reg_alpha, reg_lambda, and the gamma parameter to enhance regularization and improve loss reduction. The ranges explored for each hyperparameter were as follows: learning rate from 0.01 to 0.1, max depth of 3 and 5, number of estimators at 50, 100, and 200, reg_alpha values of 0, 1, and 10, reg_lambda values of 0, 1, and 10, and gamma values of 0, 0.5, and 1. This grid search resulted in a total of 324 models evaluated.

4.1.3. GBM

To maximize GBM’s predictive performance, the grid search process involved optimizing several hyperparameters, including learning rate, number of estimators (n_estimators), max depth, and the regularization parameter min_samples_split. The learning rate was particularly important for preventing overfitting, as it controls the incremental contribution of each tree; lower values generally lead to more conservative learning. The number of estimators and max depth allowed us to balance model complexity with interpretability, while min_samples_split regularized the trees by requiring a minimum number of samples before a node could be split. Additionally, we explored the subsample parameter to reduce overfitting and the max_features parameter to speed up training. The values tested for each hyperparameter were as follows: learning rate values of 0.01, 0.1, and 0.2; max depth of 4, 5, and 6; number of estimators at 100 and 200; min_samples_split values of 2, 5, and 10; subsample values of 0.7, 0.9, and 1.0; and max_features set to ‘auto’, ‘sqrt’, and ‘log2’. This grid search resulted in a total of 486 models evaluated.

4.1.4. LGBM

The grid search for LightGBM focused on tuning parameters such as num_leaves, max depth, and the number of estimators (n_estimators). Both num_leaves and max depth control the maximum complexity of the trees, while n_estimators determines the number of boosting iterations. Additionally, we explored reg_lambda, boosting type, and learning rate to optimize regularization, select the boosting algorithm, and adjust the step size shrinkage applied at each iteration. The ranges tested for each hyperparameter were learning rate values of 0.01, 0.05, and 0.1; max depth values of 4, 5, and 6; n_estimators set at 100 and 200; reg_lambda values of 0, 0.01, and 0.1; num_leaves values of 63 and 127; and boosting types ‘gbdt’, ‘dart’, and ‘goss’. This resulted in a total of 324 models evaluated.

4.1.5. CatBoost

During the grid search for CatBoost, we focused on optimizing parameters such as depth, iterations, learning rate, and l2_leaf_reg. The depth parameter was crucial for balancing the model’s capacity to learn complex patterns while avoiding overfitting, whereas iterations determined the number of boosting rounds. The learning rate controlled the incremental pace of learning, and l2_leaf_reg (L2 regularization) helped minimize overfitting by penalizing leaf weights. Additionally, we explored random_strength to control the level of randomness when calculating feature splits and bagging_temperature to regulate randomness during each iteration. The ranges tested for these hyperparameters were learning rate values of 0.01, 0.1, and 0.2; depth values of 6, 8, and 10; iterations set at 200 and 500; l2_leaf_reg values of 1, 5, and 7; random_strength values of 1, 2, and 5; and bagging_temperature values of 0, 1, and 2. This resulted in a total of 486 models evaluated.
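The model counts reported in Sections 4.1.2 to 4.1.5 follow from multiplying the number of candidate values per hyperparameter. The sketch below enumerates the four grids as described in the text. One entry is an assumption: the XGBoost learning rate is given only as a range from 0.01 to 0.1, and two values (0.01 and 0.1) are assumed here because that reproduces the reported total of 324. The dictionary keys mirror the parameter names used in the text.

```python
from itertools import product

grids = {
    "XGBoost": {
        "learning_rate": [0.01, 0.1],  # assumption: only the two endpoints
        "max_depth": [3, 5],
        "n_estimators": [50, 100, 200],
        "reg_alpha": [0, 1, 10],
        "reg_lambda": [0, 1, 10],
        "gamma": [0, 0.5, 1],
    },
    "GBM": {
        "learning_rate": [0.01, 0.1, 0.2],
        "max_depth": [4, 5, 6],
        "n_estimators": [100, 200],
        "min_samples_split": [2, 5, 10],
        "subsample": [0.7, 0.9, 1.0],
        "max_features": ["auto", "sqrt", "log2"],
    },
    "LightGBM": {
        "learning_rate": [0.01, 0.05, 0.1],
        "max_depth": [4, 5, 6],
        "n_estimators": [100, 200],
        "reg_lambda": [0, 0.01, 0.1],
        "num_leaves": [63, 127],
        "boosting_type": ["gbdt", "dart", "goss"],
    },
    "CatBoost": {
        "learning_rate": [0.01, 0.1, 0.2],
        "depth": [6, 8, 10],
        "iterations": [200, 500],
        "l2_leaf_reg": [1, 5, 7],
        "random_strength": [1, 2, 5],
        "bagging_temperature": [0, 1, 2],
    },
}


def grid_size(grid):
    """Count every combination in the Cartesian product of the value lists."""
    return sum(1 for _ in product(*grid.values()))


sizes = {name: grid_size(grid) for name, grid in grids.items()}
print(sizes)
```

Under these assumptions the counts come to 324, 486, 324, and 486 for XGBoost, GBM, LightGBM, and CatBoost, matching the totals stated in the text.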

4.2. Simulation Results

The simulation results indicate varying performance across the four ensemble models based on the std-NRMSE metric. Table 2 presents the std-NRMSE values of the top-performing model selected from each ensemble method. Among the evaluated models, CatBoost achieved the lowest std-NRMSE value of 0.8147, indicating it provided the most accurate predictions for wildfire danger estimation. GBM followed closely with a std-NRMSE of 0.8330, demonstrating strong predictive performance despite its slightly higher error. While GBM is recognized for its powerful boosting framework, it can sometimes suffer from overfitting or suboptimal hyperparameter tuning, which may explain its marginally higher error. XGBoost and LightGBM produced similar results, with std-NRMSE values of 0.8451 and 0.8480, respectively. Using the best-performing model from each ensemble method, the corresponding wildfire danger estimation raster was generated.
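For readers unfamiliar with the metric, the sketch below assumes that std-NRMSE is the root mean square error divided by the standard deviation of the observed values, as the name suggests. The use of the population standard deviation and the function name `std_nrmse` are assumptions made for illustration.

```python
import math
import statistics

def std_nrmse(observed, predicted):
    """RMSE divided by the standard deviation of the observations.

    Assumption: the population standard deviation is the normalizer.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / statistics.pstdev(observed)

# Toy data (hypothetical values).
observed = [0.0, 1.0, 2.0, 3.0]
mean_only = [statistics.fmean(observed)] * len(observed)

perfect_score = std_nrmse(observed, observed)    # no error at all
baseline_score = std_nrmse(observed, mean_only)  # always predict the mean
print(perfect_score, baseline_score)
```

Under this assumed definition, lower values are better, and a constant prediction equal to the mean of the observations scores approximately 1.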

Table 3 presents the feature importance scores for each ensemble model used in the study. Feature importance indicates the contribution of each input variable to the model’s predictions, with each cell showing the percentage importance of the corresponding feature for each model. The most influential features for each model are highlighted to identify the variables that have the greatest impact on model performance. For example, elevation and Fire Weather Index consistently rank highly across several models, while roughness and distance to river are significant in only two models. Overall, there is a notable similarity in feature importance between XGBoost and GBM, as well as between LightGBM and CatBoost.

Additionally, the aggregated importance of the four feature categories is shown. Topographic features emerge as the most important across all models, with their contribution ranging from 43.10% to 56.80%. The second most important feature group varies by model: land use for XGBoost, proximity for LightGBM and CatBoost, and climate for GBM.
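The category sub-totals and the resulting ranking can be checked with a short script. This is a sketch that uses the values transcribed from Table 3; the data structure and the helper `category_subtotals` are illustrative names, not part of the original workflow.

```python
# Feature importance values (percent) transcribed from Table 3.
# Column order: XGBoost, LightGBM, GBM, CatBoost.
MODELS = ["XGBoost", "LightGBM", "GBM", "CatBoost"]
IMPORTANCE = {
    "Land Use": {
        "Land use": [18.30, 6.80, 12.09, 12.20],
        "Dominant Leaf Type": [0.01, 0.01, 0.01, 0.01],
        "Grass": [5.59, 1.09, 1.00, 1.79],
    },
    "Topographic": {
        "Elevation": [19.60, 14.50, 23.50, 16.20],
        "Slope": [3.10, 7.10, 6.80, 4.30],
        "Aspect": [2.40, 8.00, 3.40, 6.80],
        "Roughness": [24.50, 10.20, 18.00, 10.30],
        "Topographic Wetness Index": [4.50, 7.00, 5.10, 5.50],
    },
    "Proximity": {
        "Distance Built": [4.70, 4.70, 3.00, 5.90],
        "Distance River": [5.10, 15.60, 7.70, 13.00],
        "Distance Road": [3.80, 5.10, 3.10, 5.30],
    },
    "Climatic": {
        "Fire Weather Index": [8.40, 19.90, 16.20, 18.60],
    },
}

def category_subtotals(model_index):
    """Sum the per-feature importances of one model within each category."""
    return {
        category: round(sum(v[model_index] for v in features.values()), 2)
        for category, features in IMPORTANCE.items()
    }

# Rank the categories of each model from most to least important.
ranking = {}
for i, model in enumerate(MODELS):
    subtotals = category_subtotals(i)
    ranking[model] = sorted(subtotals, key=subtotals.get, reverse=True)
    print(model, subtotals, "second:", ranking[model][1])
```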

5. Experimental Results

Figure 10 and Figure 11 present the wildfire susceptibility maps generated by the four ensemble methods examined: XGBoost (Figure 10a), GBM (Figure 10b), LightGBM (Figure 11a), and CatBoost (Figure 11b). A five-class color scale represents the wildfire risk levels: very low, low, medium, high, and very high.

From the four best models, two new ensemble maps were created: Ensemble Mean and Ensemble Max. The Ensemble Mean map averages the predictions from multiple models, providing a balanced assessment of wildfire risk, while the Ensemble Max map highlights areas of highest susceptibility by selecting the maximum predicted values. These two ensemble maps are presented in Figure 12, with the Ensemble Mean shown in Figure 12a and the Ensemble Max in Figure 12b, both using the same color scale as before.
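The two combination rules operate cell by cell. A minimal sketch in plain Python, with small hypothetical grids standing in for the four model rasters (`ensemble_mean_max` is an illustrative name):

```python
def ensemble_mean_max(prediction_grids):
    """Combine equally shaped 2D grids cell by cell into mean and max grids."""
    n_models = len(prediction_grids)
    mean_grid, max_grid = [], []
    for rows in zip(*prediction_grids):   # the same row from every model
        cells = list(zip(*rows))          # the same cell from every model
        mean_grid.append([sum(c) / n_models for c in cells])
        max_grid.append([max(c) for c in cells])
    return mean_grid, max_grid

# Toy 2x2 susceptibility grids for four models (hypothetical values).
xgb = [[0.25, 0.75], [0.5, 0.0]]
gbm = [[0.5, 0.5], [0.5, 0.25]]
lgbm = [[0.25, 0.75], [1.0, 0.0]]
cat = [[0.5, 0.5], [0.5, 0.25]]

mean_grid, max_grid = ensemble_mean_max([xgb, gbm, lgbm, cat])
print(mean_grid)
print(max_grid)
```

Since the maximum of a set of values is never smaller than their mean, the Ensemble Max value of a cell is always at least as high as its Ensemble Mean value.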

Figure 13 shows the percentage of land area in Greece falling into each wildfire risk class. According to the Ensemble Mean model, 50% of the land is classified as low risk, while high-risk areas account for 21%. For the Ensemble Max model, these figures are 38% for low risk and 33% for high risk, respectively.
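Percentages of this kind can be derived by counting cells per class. The sketch below assumes that every cell has already been assigned one of the five classes and that all cells cover the same area, so the share of cells equals the share of land. The integer class codes, the nodata value, and the toy grid are hypothetical.

```python
from collections import Counter

# Hypothetical integer codes for the five susceptibility classes.
CLASS_NAMES = {1: "very low", 2: "low", 3: "medium", 4: "high", 5: "very high"}

def class_shares(class_grid, nodata=0):
    """Percentage of valid cells in each class (equal cell area assumed)."""
    cells = [c for row in class_grid for c in row if c != nodata]
    counts = Counter(cells)
    return {
        CLASS_NAMES[k]: 100.0 * counts.get(k, 0) / len(cells)
        for k in CLASS_NAMES
    }

# Hypothetical classified grid; 0 marks cells outside the study area.
toy_grid = [
    [0, 0, 1, 2],
    [2, 2, 2, 3],
    [2, 3, 4, 5],
]
shares = class_shares(toy_grid)
print(shares)
```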

To assist wildfire management operators in prioritizing areas with greater fire risk, we created the maps shown in Figure 14. Using the zonal statistics tool in QGIS, these maps aggregate wildfire susceptibility for each prefecture in Greece (Nomenclature of territorial units for statistics, NUTS level 3), highlighting regions where proactive measures should be considered due to elevated fire risk. Notably, high-risk areas include the Regional Units of East and West Attica, Euboea, Lesbos, Chios, and Argolis. These regions are clearly visible in both the Ensemble Mean map (Figure 14a) and the Ensemble Max map (Figure 14b).
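The aggregation itself was performed with the QGIS zonal statistics tool. The sketch below only illustrates the underlying idea in plain Python, assuming the prefectures have been rasterized to the same grid as the susceptibility values and that the median is the statistic of interest, as in Figure 14. All names and values are hypothetical.

```python
from collections import defaultdict
from statistics import median

def zonal_median(value_grid, zone_grid, nodata_zone=0):
    """Median of the cell values falling inside each zone."""
    by_zone = defaultdict(list)
    for value_row, zone_row in zip(value_grid, zone_grid):
        for value, zone in zip(value_row, zone_row):
            if zone != nodata_zone:
                by_zone[zone].append(value)
    return {zone: median(values) for zone, values in by_zone.items()}

# Hypothetical susceptibility values and zone ids (0 = outside all zones).
values = [
    [0.1, 0.2, 0.9],
    [0.3, 0.8, 0.7],
    [0.2, 0.6, 0.5],
]
zones = [
    [1, 1, 2],
    [1, 2, 2],
    [0, 2, 2],
]
result = zonal_median(values, zones)
print(result)
```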

To evaluate the models’ performance, we calculated, using QGIS, the maximum wildfire susceptibility class within the boundaries of each recorded wildfire. Figure 15 presents the results of this analysis for the Ensemble Max model.

These results are also shown in Figure 16a for both the Ensemble Mean and Ensemble Max models. As observed, 83% of fire events occurred in areas classified as high or very high risk by the Ensemble Max model, while the corresponding figure is 70% for the Ensemble Mean model. Only 9% and 4% of fires took place in low or very low risk areas according to the Ensemble Mean and Ensemble Max models, respectively. Regarding the extent of the burned area, both models also performed well, as most of the burned areas fall within the high and very high-risk categories, as illustrated in Figure 16b.
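The validation step can be sketched as follows, assuming that the class codes of the cells inside each wildfire perimeter have already been extracted (in the study this was done in QGIS). The fire identifiers, class codes, and values below are hypothetical.

```python
from collections import Counter

# Hypothetical integer codes for the five susceptibility classes.
CLASS_NAMES = {1: "very low", 2: "low", 3: "medium", 4: "high", 5: "very high"}

def fire_class_shares(cells_per_fire):
    """Percentage of fires by the highest class found inside each perimeter."""
    max_classes = [max(cells) for cells in cells_per_fire.values()]
    counts = Counter(max_classes)
    total = len(max_classes)
    return {CLASS_NAMES[k]: 100.0 * counts.get(k, 0) / total for k in CLASS_NAMES}

# Hypothetical class codes of the cells inside four fire perimeters.
cells_per_fire = {
    "fire_a": [3, 4, 5],
    "fire_b": [2, 2, 4],
    "fire_c": [1, 2],
    "fire_d": [4, 4, 3],
}
shares = fire_class_shares(cells_per_fire)
high_or_very_high = shares["high"] + shares["very high"]
print(shares, high_or_very_high)
```

Taking the maximum means that each fire is counted in the highest class occurring anywhere inside its perimeter.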

6. Conclusions

The study on wildfire susceptibility mapping in Greece using ensemble machine learning techniques demonstrates significant advancements in predicting wildfire risk, highlighting the use of regression-based ensemble outputs over classification models commonly used in prior studies. We further demonstrated the applicability of similar approaches to country-level WFSMs, contrasting with previous examples that were primarily conducted at regional or local scales. By employing various ensemble methods, including Extreme Gradient Boosting and Categorical Boosting, alongside the development of Ensemble Mean and Ensemble Max models, this research identified high-risk areas that account for most recorded fires: 83% of the wildfires between 2000 and 2024 occurred in areas classified as high or very high risk by the Ensemble Max model, and 70% in such areas according to the Ensemble Mean model.

Wildfire susceptibility maps are a crucial tool in real-world wildfire management, offering valuable insights into areas prone to wildfires and enabling proactive decision making. By identifying high-susceptibility zones, WFSMs allow authorities to prioritize these areas for prevention and mitigation measures, such as vegetation management, controlled burns, and infrastructure protection. They also support resource allocation by guiding the strategic positioning of fire-fighting resources, and they enhance community preparedness through improved early warning systems and evacuation planning. Furthermore, they aid land-use planning by informing zoning policies that reduce wildfire risks in vulnerable regions, and they provide essential information for post-fire recovery efforts, such as reforestation and environmental monitoring. They also serve as a foundation for evidence-based policy development, informing strategies like fire-resistant building codes and buffer zones, while supporting insurance companies in risk assessment and premium setting. As dynamic tools, WFSMs can be updated to reflect the changing patterns of wildfire risk due to climate change, enabling long-term adaptation and evaluation of mitigation efforts. By integrating WFSMs into wildfire management practices, stakeholders can shift from reactive to proactive approaches, minimizing the social, economic, and ecological impacts of wildfires.

While the ensemble learning techniques used in this work (XGBoost, CatBoost, GBM, and LightGBM) have provided a robust foundation for wildfire danger estimation, the study is not without limitations. One key limitation is the absence of deep learning approaches, which may offer superior capabilities in capturing both spatial and temporal complexities, particularly when dealing with large volumes of high-dimensional data such as remote sensing imagery or climate time series. Additionally, the current models rely predominantly on static environmental and topographic variables, which may not fully reflect the dynamic conditions that influence wildfire behavior. Temporal generalization could also be a concern, as the models were validated on historical data and may underperform when exposed to future environmental conditions shaped by climate change.

To address these limitations, future work should explore the integration of advanced deep learning architectures such as hybrid Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models, which are capable of learning both spatial features and long-range temporal dependencies. Incorporating dynamic variables like drought indices, fuel moisture content, and recent weather patterns could further improve predictive accuracy. Additionally, model interpretability and reliability could be enhanced through the use of explainability methods such as SHAP, while uncertainty-aware modeling techniques and temporal validation strategies would make applications more robust to future conditions. Expanding the approach to include exposure data would also transition the analysis from hazard mapping to comprehensive wildfire risk assessment, better supporting policy and emergency response planning.

Overall, the findings suggest that ensemble machine learning methods are effective in wildfire susceptibility modeling, and that further integration of spatiotemporal data and advanced modeling techniques can significantly improve their utility for long-term wildfire resilience and management.

Author Contributions

Conceptualization, P.S. and T.V.; methodology, P.S.; software, T.V. and P.S.; validation, P.S., T.V. and D.I.; formal analysis, T.V.; investigation, D.I.; resources, D.T.; data curation, P.S.; writing—original draft preparation, P.S. and T.V.; writing—review and editing, D.I.; visualization, T.V. and P.S.; supervision, D.I.; project administration, D.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets presented in this article are not readily available because they are part of an ongoing study. Requests to access the datasets should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Goleiji, E.; Hosseini, S.M.; Khorasani, N.; Monavari, S.M. Forest fire risk assessment—An integrated approach based on multicriteria evaluation. Environ. Monit. Assess. 2017, 189, 612.
2. Hwang, C.-L.; Yoon, K. Methods for Multiple Attribute Decision Making. In Multiple Attribute Decision Making; Lecture Notes in Economics and Mathematical Systems; Springer: Berlin/Heidelberg, Germany, 1981; Volume 186, pp. 58–191.
3. Ljubomir, G.; Pamučar, D.; Drobnjak, S.; Pourghasemi, H.R. Modeling the Spatial Variability of Forest Fire Susceptibility Using Geographical Information Systems and the Analytical Hierarchy Process. In Spatial Modeling in GIS and R for Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2019; pp. 337–369.
4. Samui, P. Support vector machine applied to settlement of shallow foundations on cohesionless soils. Comput. Geotech. 2008, 35, 419–427.
5. Oliveira, S.; Oehler, F.; San-Miguel-Ayanz, J.; Camia, A.; Pereira, J.M.C. Modeling Spatial Patterns of Fire Occurrence in Mediterranean Europe Using Multiple Regression and Random Forest. For. Ecol. Manag. 2012, 275, 117–129.
6. Michael, Y.; Helman, D.; Glickman, O.; Gabay, D.; Brenner, S.; Lensky, I.M. Forecasting fire risk with machine learning and dynamic information derived from satellite vegetation index time-series. Sci. Total Environ. 2021, 764, 142844.
7. Pourghasemi, H.R. GIS-based forest fire susceptibility mapping in Iran: A comparison between evidential belief function and binary logistic regression models. Scand. J. For. Res. 2016, 31, 80–98.
8. Dang, T.T.; Cheng, Y.; Mann, J.; Hawick, K.; Li, Q. Fire Risk Prediction Using Multi-Source Data: A case study in Humberside area. In Proceedings of the 2019 25th International Conference on Automation and Computing (ICAC), Lancaster, UK, 5–7 September 2019; IEEE: New York, NY, USA, 2019; pp. 1–6.
9. Rodrigues, M.; De La Riva, J. An insight into machine-learning algorithms to model human-caused wildfire occurrence. Environ. Model. Softw. 2014, 57, 192–201.
10. Wittenberg, L.; Malkinson, D. Spatio-temporal perspectives of forest fires regimes in a maturing Mediterranean mixed pine landscape. Eur. J. For. Res. 2009, 128, 297–304.
11. Tuyen, T.T.; Jaafari, A.; Yen, H.P.H.; Nguyen-Thoi, T.; Van Phong, T.; Nguyen, H.D.; Van Le, H.; Phuong, T.T.M.; Nguyen, S.H.; Prakash, I.; et al. Mapping forest fire susceptibility using spatially explicit ensemble models based on the locally weighted learning algorithm. Ecol. Inform. 2021, 63, 101292.
12. Rosadi, D.; Andriyani, W. Prediction of forest fire using ensemble method. J. Phys. Conf. Ser. 2021, 1918, 042043.
13. Sachdeva, S.; Bhatia, T.; Verma, A.K. GIS-based evolutionary optimized Gradient Boosted Decision Trees for forest fire susceptibility mapping. Nat. Hazards 2018, 92, 1399–1418.
14. Bui, D.T.; Bui, Q.-T.; Nguyen, Q.-P.; Pradhan, B.; Nampak, H.; Trinh, P.T. A hybrid artificial intelligence approach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest fire susceptibility modeling at a tropical area. Agric. For. Meteorol. 2017, 233, 32–44.
15. Tehrany, M.S.; Jones, S.; Shabani, F.; Martínez-Álvarez, F.; Bui, D.T. A novel ensemble modeling approach for the spatial prediction of tropical forest fire susceptibility using LogitBoost machine learning classifier and multi-source geospatial data. Theor. Appl. Climatol. 2019, 137, 637–653.
16. Zhou, F.; Pan, H.; Gao, Z.; Huang, X.; Qian, G.; Zhu, Y.; Xiao, F.; Chen, J. Fire Prediction Based on CatBoost Algorithm. Math. Probl. Eng. 2021, 2021, 9.
17. Le, H.V.; Hoang, D.A.; Tran, C.T.; Nguyen, P.Q.; Tran, V.H.T.; Hoang, N.D.; Amiri, M.; Ngo, T.P.T.; Nhu, H.V.; Van Hoang, T.; et al. A new approach of deep neural computing for spatial prediction of wildfire danger at tropical climate areas. Ecol. Inform. 2021, 63, 101300.
18. Zhang, G.; Wang, M.; Liu, K. Deep neural networks for global wildfire susceptibility modelling. Ecol. Indic. 2021, 127, 107735.
19. Zhang, G.; Wang, M.; Liu, K. Forest Fire Susceptibility Modeling Using a Convolutional Neural Network for Yunnan Province of China. Int. J. Disaster Risk Sci. 2019, 10, 386–403.
20. Jain, P.; Coogan, S.C.P.; Subramanian, S.G.; Crowley, M.; Taylor, S.; Flannigan, M.D. A review of machine learning applications in wildfire science and management. Environ. Rev. 2020, 28, 478–505.
21. Barreto, J.S.; Armenteras, D. Open Data and Machine Learning to Model the Occurrence of Fire in the Ecoregion of “Llanos Colombo–Venezolanos”. Remote Sens. 2020, 12, 3921.
22. Cao, Y.; Wang, M.; Liu, K. Wildfire Susceptibility Assessment in Southern China: A Comparison of Multiple Methods. Int. J. Disaster Risk Sci. 2017, 8, 164–181.
23. Jaafari, A.; Zenner, E.K.; Panahi, M.; Shahabi, H. Hybrid artificial intelligence models based on a neuro-fuzzy system and metaheuristic optimization algorithms for spatial prediction of wildfire probability. Agric. For. Meteorol. 2019, 266–267, 198–207.
24. Bot, K.; Borges, J.G. A Systematic Review of Applications of Machine Learning Techniques for Wildfire Management Decision Support. Inventions 2022, 7, 15.
25. Ngo, P.T.T.; Panahi, M.; Khosravi, K.; Ghorbanzadeh, O.; Kariminejad, N.; Cerda, A.; Lee, S. Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran. Geosci. Front. 2021, 12, 505–519.
26. Li, X.; Gao, H.; Zhang, M.; Zhang, S.; Gao, Z.; Liu, J.; Sun, S.; Hu, T.; Sun, L. Prediction of Forest Fire Spread Rate Using UAV Images and an LSTM Model Considering the Interaction Between Fire and Wind. Remote Sens. 2021, 13, 4325.
27. Athanasiou, M.; Xanthopoulos, G. Fire behaviour of the large fires of 2007 in Greece. In Proceedings of the 6th International Conference on Forest Fire Research, Coimbra, Portugal, 15–18 November 2010.
28. Chalaris, M.; Chalaris, M.; Balatsos, P.; Karma, S.; Pappa, A.; Spiliopoulou, C.; Statheropoulos, M.; Theodorou, P. Forest fires in Greece during summer 2007: The data file of a case study. Int. For. Fire News IFFN 2007, 5, 2–17.
29. Economou, F.; Prodromidis, P.; Skintzi, G. Large Fire Disaster and the Regional Economy: The 2007 Case of the Peloponnese. South-East. Eur. J. Econ. 2019, 17, 7–31.
30. Eftychidis, G. Mega-Fires in Greece (2007). In Encyclopedia of Natural Hazards; Bobrowsky, P.T., Ed.; Encyclopedia of Earth Sciences Series; Springer: Dordrecht, The Netherlands, 2013; pp. 664–671.
31. Gincheva, A.; Pausas, J.G.; Edwards, A.; Provenzale, A.; Cerdà, A.; Hanes, C.; Royé, D.; Chuvieco, E.; Mouillot, F.; Vissio, G.; et al. A monthly gridded burned area database of national wildland fire data. Sci. Data 2024, 11, 352.
32. Turco, M.; Herrera, S.; Tourigny, E.; Chuvieco, E.; Provenzale, A. A comparison of remotely-sensed and inventory datasets for burned area in Mediterranean Europe. Int. J. Appl. Earth Obs. Geoinf. 2019, 82, 101887.
33. Bahadori, N.; Razavi-Termeh, S.V.; Sadeghi-Niaraki, A.; Al-Kindi, K.M.; Abuhmed, T.; Nazeri, B.; Choi, S.-M. Wildfire Susceptibility Mapping Using Deep Learning Algorithms in Two Satellite Imagery Dataset. Forests 2023, 14, 1325.
34. Gralewicz, N.J.; Nelson, T.A.; Wulder, M.A. Factors influencing national scale wildfire susceptibility in Canada. For. Ecol. Manag. 2012, 265, 20–29.
35. Jaafari, A.; Pourghasemi, H.R. Factors Influencing Regional-Scale Wildfire Probability in Iran. In Spatial Modeling in GIS and R for Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2019; pp. 607–619.
36. Janiec, P.; Gadal, S. A Comparison of Two Machine Learning Classification Methods for Remote Sensing Predictive Modeling of the Forest Fire in the North-Eastern Siberia. Remote Sens. 2020, 12, 4157.
37. Moayedi, H.; Mehrabi, M.; Bui, D.T.; Pradhan, B.; Foong, L.K. Fuzzy-metaheuristic ensembles for spatial assessment of forest fire susceptibility. J. Environ. Manag. 2020, 260, 109867.
38. Pastor, E. Mathematical models and calculation systems for the study of wildland fire behaviour. Prog. Energy Combust. Sci. 2003, 29, 139–153.
39. Wu, Z.; He, H.S.; Yang, J.; Liang, Y. Defining fire environment zones in the boreal forests of northeastern China. Sci. Total Environ. 2015, 518–519, 106–116.
40. Liang, H.; Zhang, M.; Wang, H. A Neural Network Model for Wildfire Scale Prediction Using Meteorological Factors. IEEE Access 2019, 7, 176746–176755.
41. Nami, M.H.; Jaafari, A.; Fallah, M.; Nabiuni, S. Spatial prediction of wildfire probability in the Hyrcanian ecoregion using evidential belief function model and GIS. Int. J. Environ. Sci. Technol. 2018, 15, 373–384.
42. Renard, Q.; Pélissier, R.; Ramesh, B.R.; Kodandapani, N. Environmental susceptibility model for predicting forest fire occurrence in the Western Ghats of India. Int. J. Wildland Fire 2012, 21, 368.
43. Viedma, O.; Urbieta, I.R.; Moreno, J.M. Wildfires and the role of their drivers are changing over time in a large rural area of west-central Spain. Sci. Rep. 2018, 8, 17797.
44. Adab, H. Landfire hazard assessment in the Caspian Hyrcanian forest ecoregion with the long-term MODIS active fire data. Nat. Hazards 2017, 87, 1807–1825.
45. Benson, R.P.; Roads, J.O.; Weise, D.R. Chapter 2 Climatic and Weather Factors Affecting Fire Occurrence and Behavior. In Developments in Environmental Science; Elsevier: Amsterdam, The Netherlands, 2008; Volume 8, pp. 37–59.
46. Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Aryal, J. Forest Fire Susceptibility and Risk Mapping Using Social/Infrastructural Vulnerability and Environmental Variables. Fire 2019, 2, 50.
47. Huang, S.; Tang, L.; Hupy, J.P.; Wang, Y.; Shao, G. A commentary review on the use of normalized difference vegetation index (NDVI) in the era of popular remote sensing. J. For. Res. 2021, 32, 1–6.
48. Kalantar, B.; Ueda, N.; Idrees, M.O.; Janizadeh, S.; Ahmadi, K.; Shabani, F. Forest Fire Susceptibility Prediction Based on Machine Learning Models with Resampling Algorithms on Remote Sensing Data. Remote Sens. 2020, 12, 3682.
49. Leuenberger, M.; Parente, J.; Tonini, M.; Pereira, M.G.; Kanevski, M. Wildfire susceptibility mapping: Deterministic vs. stochastic approaches. Environ. Model. Softw. 2018, 101, 194–203.
50. Mann, M.L.; Batllori, E.; Moritz, M.A.; Waller, E.K.; Berck, P.; Flint, A.L.; Flint, L.E.; Dolfi, E.; Biondi, F. Incorporating Anthropogenic Influences into Fire Probability Models: Effects of Human Activity and Climate Change on Fire Activity in California. PLoS ONE 2016, 11, e0153589.
51. Narayanaraj, G.; Wimberly, M.C. Influences of forest roads on the spatial patterns of human- and lightning-caused wildfire ignitions. Appl. Geogr. 2012, 32, 878–888.
52. Nur, A.; Kim, Y.; Lee, J.; Lee, C.-W. Spatial Prediction of Wildfire Susceptibility Using Hybrid Machine Learning Models Based on Support Vector Regression in Sydney, Australia. Remote Sens. 2023, 15, 760.
53. Nur, A.S.; Kim, Y.J.; Lee, C.-W. Creation of Wildfire Susceptibility Maps in Plumas National Forest Using InSAR Coherence, Deep Learning, and Metaheuristic Optimization Approaches. Remote Sens. 2022, 14, 4416.
54. Satir, O.; Berberoglu, S.; Donmez, C. Mapping regional forest fire probability using artificial neural network model in a Mediterranean forest ecosystem. Geomat. Nat. Hazards Risk 2016, 7, 1645–1658.
55. Bui, D.T.; Hoang, N.-D.; Samui, P. Spatial pattern analysis and prediction of forest fire using new machine learning approach of Multivariate Adaptive Regression Splines and Differential Flower Pollination optimization: A case study at Lao Cai province (Viet Nam). J. Environ. Manag. 2019, 237, 476–487.
56. Van Wagner, C.E. Development and structure of the Canadian Forest Fire Weather Index System. Canadian Forest Service, Petawawa National Forestry Institute, Forestry Technical Report. 1987. Available online: https://ostrnrcan-dostrncan.canada.ca/handle/1845/228434 (accessed on 15 May 2025).
57. European Environment Agency. CORINE Land Cover 2018 (Raster 100 m), Europe, 6-Yearly—Version 2020_20u1, May 2020; European Environment Agency: Copenhagen K, Denmark, 2019.
58. European Environment Agency. Impervious Built-Up 2018 (Raster 10 m), Europe, 3-Yearly, August 2020; European Environment Agency: Copenhagen K, Denmark, 2020.
59. European Environment Agency. EU-DEM. [GeoTIFF]. 2016. Available online: https://sdi.eea.europa.eu/catalogue/biodiversity/api/records/3473589f-0854-4601-919e-2e7dd172ff50 (accessed on 15 May 2025).
60. European Environment Agency. Grassland 2018 (Raster 10 m), Europe, 3-Yearly, August 2020; European Environment Agency: Copenhagen K, Denmark, 2020.
61. European Environment Agency. Dominant Leaf Type 2018 (Raster 10 m), Europe, 3-Yearly, September 2020; European Environment Agency: Copenhagen K, Denmark, 2020.
62. Copernicus Climate Change Service. Fire Danger Indicators for Europe from 1970 to 2098 Derived from Climate Projections; ECMWF: Bologna, Italy, 2020.
63. Qin, C.-Z.; Zhu, L.-J. GDAL/OGR and Geospatial Data IO Libraries. Geogr. Inf. Sci. Technol. Body Knowl. 2020, 2020. Available online: https://gistbok-topics.ucgis.org/PD-05-033 (accessed on 15 May 2025).
64. Rouault, E.; Warmerdam, F.; Schwehr, K.; Kiselev, A.; Butler, H.; Łoskot, M.; Szekeres, T.; Tourigny, E.; Landa, M.; Miara, I.; et al. GDAL. 6 November 2024. Available online: https://zenodo.org/records/15375292 (accessed on 15 May 2025).
65. Conrad, O.; Bechtel, B.; Bock, M.; Dietrich, H.; Fischer, E.; Gerlitz, L.; Wehberg, J.; Wichmann, V.; Böhner, J. System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geosci. Model Dev. 2015, 8, 1991–2007.
66. Olaya, V.; Conrad, O. Chapter 12 Geomorphometry in SAGA. In Developments in Soil Science; Elsevier: Amsterdam, The Netherlands, 2009; Volume 33, pp. 293–308.
67. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
68. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
69. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems 30: 31st Annual Conference on Neural Information Processing Systems (NIPS 2017); von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; pp. 3147–3155.
70. Yandex, T. CatBoost—Yandex Technologies. Available online: https://yandex.com/dev/catboost/ (accessed on 15 May 2025).
71. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. arXiv 2017, arXiv:1706.09516.
72. Otto, S.A. How to Normalize the RMSE. Available online: https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/ (accessed on 15 May 2025).
73. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer Texts in Statistics; Springer: New York, NY, USA, 2021.
74. Otto, S.A.; Kadin, M.; Casini, M.; Torres, M.A.; Blenckner, T. A quantitative framework for selecting and validating food web indicators. Ecol. Indic. 2018, 84, 619–631.

Figure 1. Wildfire susceptibility calculation workflow.

Figure 2. The study area, Greece, in the Southeast part of Europe.

Figure 3. Wildfire statistics (number of events and total area burned) in Greece from 2000 to 2024 (a) per year and (b) per region, and (c) spatial distribution of burned areas based on the year of occurrence. Region abbreviations in (b): A: Attica, P: Peloponnese, WG: Western Greece, NA: North Aegean, CG: Central Greece, SA: South Aegean, II: Ionian Islands, EM-T: Eastern Macedonia—Thrace, T: Thessaly, C: Crete, E: Epirus, CM: Central Macedonia, WM: Western Macedonia.

Figure 4. Spatial distribution of wildfire factors: (a) land use and (b) distance to roads.

Figure 5. Spatial distribution of wildfire factors: (a) distance to rivers and (b) distance to settlements.

Figure 6. Spatial distribution of wildfire factors: (a) elevation and (b) Topographic Wetness Index.

Figure 7. Spatial distribution of wildfire factors: (a) slope and (b) aspect.

Figure 8. Spatial distribution of wildfire factors: (a) roughness and (b) grassland.

Figure 9. Spatial distribution of wildfire factors: (a) dominant leaf type and (b) days with extreme values of Fire Weather Index.

Figure 10. Map visualization of the best wildfire susceptibility model from (a) XGBoost and (b) GBM.

Figure 11. Map visualization of the best wildfire susceptibility model from (a) LightGBM and (b) CatBoost.

Figure 12. Map visualization of ensemble models: (a) Mean and (b) Max.

Figure 13. Land percentage in different wildfire susceptibility classes.

Figure 14. Maps of the median susceptibility value at prefecture level, calculated from the (a) Ensemble Mean and (b) Ensemble Max models.

Figure 15. Thematic map of wildfire burned area based on the maximum susceptibility category of the Ensemble Max model.

Figure 16. Wildfire (a) occurrence percentage and (b) extent of total burned area in the different wildfire susceptibility classes (the maximum class within each burned area is considered).

Table 1. Wildfire conditioning factors.

Wildfire Factor	Source	Processing Steps
Land Use	Copernicus CLMS, CORINE Land Cover for 2000, 2006, 2012, 2018 raster [57]	QGIS: reclassify by table, reproject to EPSG 2100, resampled to 500 m
Distance to Roads	OpenStreetMap Roads (vector dataset)	QGIS: SAGA proximity on road dataset, reproject to EPSG 2100, resampled to 500 m
Distance to Rivers	Rivers in Greece (vector dataset) from the geoportal of the Greek Ministry of Environment and Energy	QGIS: SAGA proximity on river dataset, reproject to EPSG 2100, resampled to 500 m
Distance to Settlements	Copernicus CLMS, high-resolution raster layer share of built-up (reference year 2018) [58]	QGIS: SAGA proximity on built-up dataset, reproject to EPSG 2100, resampled to 500 m
Elevation	EU-DEM v1.1 Digital Surface Model [59]	QGIS: reproject to EPSG 2100, resampled to 500 m, mask using GR regions vector dataset
TWI	The derived elevation raster dataset	QGIS: SAGA terrain analysis Topographic Wetness Index algorithm, reproject to EPSG 2100, resampled to 500 m
Slope	The derived elevation raster dataset	QGIS: SAGA terrain analysis—morphometry slope, aspect, curvature algorithm, reproject to EPSG 2100, resampled to 500 m
Aspect	The derived elevation raster dataset	QGIS: SAGA terrain analysis—morphometry slope, aspect, curvature algorithm, reproject to EPSG 2100, resampled to 500 m
Roughness	The derived elevation raster dataset	QGIS: GDAL roughness algorithm, reproject to EPSG 2100, resampled to 500 m
Grassland	Copernicus CLMS, high-resolution raster layer grassland [60]	QGIS: reproject to EPSG 2100, resampled to 500 m
Dominant Leaf Type	Copernicus CLMS, high-resolution raster layer dominant leaf type [61]	QGIS: reproject to EPSG 2100, resampled to 500 m
Number of days with very high fire danger	Copernicus C3S Climate Data Store, fire danger indicators for Europe from 1970 to 2098 derived from climate projections dataset [62]	CDO: sellonlatbox, seltime, remapbil, QGIS: reproject to EPSG 2100, resampled to 500 m, mask using GR regions vector dataset

Table 2. Std-NRMSE metric for all ensemble models (top-performing value is highlighted).

Models	std-NRMSE
XGBoost	0.8451
GBM	0.8330
LightGBM	0.8480
CatBoost	0.8147

Table 3. Feature importance per ensemble model (most important features per model are in bold).

Features	XGBoost	LightGBM	GBM	CatBoost
Land Use
Land use	18.30%	6.80%	12.09%	12.20%
Dominant Leaf Type	0.01%	0.01%	0.01%	0.01%
Grass	5.59%	1.09%	1.00%	1.79%
Land Use Sub-total	23.90%	7.90%	13.10%	14.00%
Topographic
Elevation	19.60%	14.50%	23.50%	16.20%
Slope	3.10%	7.10%	6.80%	4.30%
Aspect	2.40%	8.00%	3.40%	6.80%
Roughness	24.50%	10.20%	18.00%	10.30%
Topographic Wetness Index	4.50%	7.00%	5.10%	5.50%
Topographic Sub-total	54.10%	46.80%	56.80%	43.10%
Proximity
Distance Built	4.70%	4.70%	3.00%	5.90%
Distance River	5.10%	15.60%	7.70%	13.00%
Distance Road	3.80%	5.10%	3.10%	5.30%
Proximity Sub-total	13.60%	25.40%	13.80%	24.20%
Climatic
Fire Weather Index	8.40%	19.90%	16.20%	18.60%
Climatic Sub-total	8.40%	19.90%	16.20%	18.60%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
