Post-Fire Burned Area Detection Using Machine Learning and Burn Severity Classification with Spectral Indices in İzmir: A SHAP-Driven XAI Approach

Gündüz, Halil İbrahim; Torun, Ahmet Tarık; Gezgin, Cemil

doi:10.3390/fire8040121


by Halil İbrahim Gündüz ¹,*, Ahmet Tarık Torun ² and Cemil Gezgin ¹

¹ Department of Geomatics Engineering, Aksaray University, 68100 Aksaray, Turkey

² Academy of Land Registry and Cadastre, Ankara Hacı Bayram Veli University, 06560 Ankara, Turkey

* Author to whom correspondence should be addressed.

Fire 2025, 8(4), 121; https://doi.org/10.3390/fire8040121

Submission received: 27 February 2025 / Revised: 17 March 2025 / Accepted: 20 March 2025 / Published: 21 March 2025

(This article belongs to the Section Fire Science Models, Remote Sensing, and Data)


Abstract:

This study was conducted to precisely map burned areas in fire-prone forest regions of İzmir and analyze the spatial distribution of wildfires. Using Sentinel-2 satellite imagery, burn severity was first classified using the dNBR and dNDVI indices. Subsequently, machine learning (ML) algorithms—RF, XGBoost, LightGBM, and AdaBoost—were employed to classify burned and unburned areas. To enhance model performance, hyperparameter optimization was applied, and the results were evaluated using multiple accuracy metrics. This study found that the RF model achieved the highest performance, with an overall accuracy of 98.0% and a Kappa coefficient of 0.960. In comparison, classification based solely on spectral indices resulted in overall accuracies of 86.6% (dNBR) and 81.7% (dNDVI). A key contribution of this study is the integration of Explainable Artificial Intelligence (XAI) through SHapley Additive exPlanations (SHAP) analysis, which was used to interpret the influence of key spectral and environmental variables in burned area classification. SHAP analysis made the model decision processes transparent and identified dNBR, dNDVI, and SWIR/NIR bands as the most influential variables. Furthermore, spatial analyses confirmed that variations in spectral reflectance across fire-affected regions are critical for accurate burned area delineation, particularly in heterogeneous landscapes. This study provides a scientific framework for post-fire ecosystem restoration, fire management, and disaster strategies, offering decision-makers data-driven and effective intervention strategies.

Keywords:

XAI; SHAP; wildfires; burn severity; Sentinel-2; machine learning; dNBR; dNDVI

1. Introduction

As fundamental components of the global ecosystem, forests perform vital functions such as maintaining biodiversity, absorbing carbon, preventing soil erosion, regulating the air and water cycles, and stabilizing the climate [1,2]. Furthermore, they have served as both an economic resource (e.g., wood, agricultural land) and a socio-cultural heritage for local communities and nations from ancient times to the present day. Forests, which cover more than 30% of the Earth’s land, provide habitats for millions of species, mitigate the urban heat island effect, and play a key role in the global carbon cycle [3,4,5]. Consequently, protecting, properly managing, and restoring forest ecosystems will substantially support efforts to mitigate climate change effects. However, forests, which play a multifaceted role in maintaining the balance of the global climate and provide vital benefits to both the environment and humanity, are increasingly exposed to irreversible risks. These risks arise from escalating human activities (e.g., deforestation, urbanization, conversion to agricultural land), from climate changes induced by global warming (e.g., increased temperatures, drought, extreme weather events), and, above all, from wildfires.

In recent years, the rapid pace of global warming has amplified the frequency, intensity, and spread of forest fires, making these ecosystems even more vulnerable [6]. For instance, large-scale wildfires in regions such as Australia [7], the Amazon [8], America [9], and Mediterranean areas, including Spain [10,11] and Greece [12], have resulted in the loss of over 7 million hectares of forest worldwide, the degradation of wildlife habitats, the release of significant amounts of carbon into the atmosphere, and adverse effects on both biodiversity and ecosystem functioning [13]. By rapidly releasing the carbon stored in forest biomass into the atmosphere, these wildfires not only affect ecosystems and climate but also directly undermine the sustainability of both human and wildlife communities, leading to consequences such as the loss of agricultural lands, deteriorating air quality, and damage to local economies [14,15]. Turkey is among the countries most affected by wildfires, which occur frequently during the summer months in Mediterranean regions and cause sudden and severe ecosystem destruction. Owing to its Mediterranean climate characteristics, the country frequently experiences forest fires. More than half of its total forested area falls within first- and second-degree fire-prone zones, and since 1937, approximately 2 million hectares have been affected by wildfires. As observed on a global scale, climate change in Turkey has led to meteorological conditions that favor wildfires, resulting in a dramatic increase in wildfire occurrences over the past decades. During the approximately 90-year period from 1937 to 2023, the average annual number of wildfires was 1409. However, in the last decade, this figure has nearly doubled, exceeding 2500 [16,17,18]. In Turkey, where the Mediterranean and Aegean coasts are particularly prone to fire disasters, İzmir has experienced over 700 wildfires in the past 15 years.
Based on the number of wildfires and the total burned forest area, İzmir ranks as the third most affected region in the country, making it one of the most fire-prone areas in Turkey. In İzmir, where all forested areas fall within the highest-risk (very high) fire susceptibility zones, ecological and climatic conditions play a major role in both wildfire ignition and spread. Over the past five years, major wildfires, including the Karabağlar-Tırazlı Fire (2019), Aliağa-Çatlıdere Fire (2023), Menderes-Oğlanarası Fire (2023), and Karşıyaka Fire (2024), have resulted in the loss of thousands of hectares of forest area [19]. Amid the escalating global wildfire crisis, marked by a significant surge in fire frequency and severity over recent decades, the precise and timely identification of burned zones has become paramount to support ecosystem rehabilitation, inform post-fire recovery strategies, and enhance preparedness for mitigating future wildfire hazards.

Although burn severity was traditionally assessed through field observations, the dramatic rise in wildfire occurrences worldwide, along with the challenging and complex topography of many burned areas and their vast spatial extent, has introduced significant limitations in terms of accessibility and assessment speed [20,21]. Advances in technology have since made wildfire detection more accessible through UAVs, aircraft, and satellites [22]. Among these, remote sensing (RS) techniques have been widely used as a primary tool for wildfire monitoring since the 1970s, owing to their ability to cover vast areas, provide continuous data, and offer significant benefits in terms of efficiency and financial resources [11,23,24]. In addition to commonly used sensors such as ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) [25,26], MODIS (Moderate Resolution Imaging Spectroradiometer), and VIIRS (Visible Infrared Imaging Radiometer Suite) [27,28] for determining post-fire vegetation impacts and mitigating potential damages, recent studies also frequently rely on Landsat mission products and multi-temporal satellite images from Sentinel-2A for burned area research [29,30,31,32]. Among these, the Sentinel-2 twin satellites, owing to their high spatial resolution (10 m for the visible (RGB) and near-infrared (NIR) bands and 20 m for the shortwave infrared (SWIR) bands), make it possible to clearly delineate the boundaries of burned areas at the pixel level. In addition, their ability to measure biomass changes, which is critical for fire severity indices (e.g., dNBR), and their short revisit time enhance the accuracy of burned area detection compared to other satellite missions [33,34,35].

Depending on the severity of the fire and the features of the area, changes in brightness occur on the land surface due to the loss of soil, cover, and chlorophyll content. As a result of these changes in vegetation, alterations in the spectral reflectance of the affected areas are observed [36,37]. These ecological changes, referred to as burn severity, can be quantified, and burned vegetation identified, by comparing the pre- and post-fire periods using the mathematical relationships of spectral indices calculated from remotely sensed images. In wildfire burn severity detection, the most common RS-based spectral indices, such as the Normalized Burn Ratio (NBR), the Normalized Difference Vegetation Index (NDVI), and their differenced variants (the differenced Normalized Burn Ratio (dNBR) and the differenced Normalized Difference Vegetation Index (dNDVI)), are fundamental indices that have been widely used for many years [38,39,40,41]. The dNDVI, calculated as the difference in NDVI values before and after a fire, is one of the fundamental and simple indices commonly used in fire severity analysis to measure changes in vegetation health [42]. It is effective in assessing the ecological impact of a fire by highlighting reductions in plant cover, canopy loss, and changes to the soil surface in the burn area [43,44]. Since 2006, the dNBR has been used to estimate fire-related biomass loss from the SWIR and NIR bands of pre- and post-fire images, exploiting the difference in reflectance between healthy and disturbed vegetation [45,46,47]. Spectral indices such as the dNBR and dNDVI offer advantages such as simplicity, low computational costs, and rapid detection of burned areas and fire severity. In particular, the dNBR stands out for its consistency with field observations in fire severity assessment [48,49]. 
However, these indices also have certain limitations that affect their overall effectiveness. They often struggle to accurately reflect the fire severity in heterogeneous ecosystems with diverse vegetation types and species compositions. Moreover, the dNBR has been reported to have low sensitivity in detecting high-intensity fires. Additionally, changes in solar illumination geometry can impact the accuracy of spectral index-based classifications, potentially leading to misinterpretations [47,50,51,52]. These limitations highlight the necessity of carefully interpreting the dNBR and dNDVI within an ecological context and, when needed, complementing them with alternative methods for a more accurate assessment of fire effects. To overcome these limitations of traditional methods, the integration of high-resolution RS data with machine learning (ML)-based hybrid approaches is increasingly preferred in wildfire studies. This approach provides more reliable and accurate results in burned area detection, making it a valuable tool for fire monitoring and assessment [53,54,55,56]. ML methods do not rely on predefined thresholds like spectral indices; instead, they learn from large datasets, allowing them to perform more robustly under varying fire conditions. Supervised ML classifiers, in particular, outperform both spectral index-based methods (e.g., dNBR, NDVI) and parametric classification approaches (e.g., nearest neighbor, maximum likelihood, naïve Bayes) due to their ability to learn nonlinear and complex patterns across multiple spectral bands and indices. Their superior capability in distinguishing heterogeneous landscapes enables a more flexible and accurate separation of burned and unburned pixels. When applied to high spatial resolution imagery, these techniques achieve classification accuracies exceeding 90%, demonstrating their effectiveness in wildfire damage assessment [22,57,58,59]. 
The advantages of ML methods become even more pronounced due to their variety of algorithms and flexible modeling capabilities. In this context, various approaches stand out among both supervised and unsupervised ML algorithms. Tree-based models—such as Adaptive Boosting (AdaBoost), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM)—as well as linear models (Logistic Regression, Support Vector Machines) and neural network-based approaches (Artificial Neural Networks, Deep Learning) are increasingly favored by researchers and decision-makers for assessing wildfire impacts, including burn scar detection [60,61].

In this context, this study aims to evaluate the effectiveness of ML-based approaches in wildfire damage assessment by integrating Sentinel-2 RS data with advanced classification techniques. Within the scope of this study, a comprehensive analysis was conducted on the wildfires that occurred between 15 and 18 August 2024 in the İzmir region, which is highly susceptible to recurring wildfires due to its climatic conditions and high summer temperatures. The main contributions of this study are summarized as follows:

i. Integration of ML and Spectral Indices for Burned Area Classification: While the dNBR and dNDVI provide an initial assessment of burn severity, this study demonstrates that ML algorithms (RF, LightGBM, XGBoost, and AdaBoost) offer data-driven alternatives that are capable of capturing complex spectral variations beyond simple threshold-based classifications.
ii. Optimized Model Performance through Hyperparameter Tuning: Unlike conventional approaches, this study applies hyperparameter optimization to improve the predictive accuracy of ML-based burned area detection, which is assessed using multiple evaluation metrics, including the overall accuracy (OA), Kappa coefficient (κ), and F1 score (FS).
iii. Explainable AI (XAI) for Enhanced Interpretability: The integration of SHAP (SHapley Additive exPlanations) within the XAI framework allows for a transparent analysis of the most influential spectral and environmental factors in fire severity classification.
iv. Scalable and Data-Driven Approach for Fire Management: By leveraging ML’s predictive capabilities alongside XAI-driven explanations, this study provides a robust, interpretable, and scalable methodology for post-fire ecosystem monitoring. The findings are expected to contribute to more effective fire management policies, post-fire recovery planning, and data-driven decision-making in fire-prone regions like İzmir.

2. Materials and Methods

2.1. Study Area

This study was conducted in İzmir, the most populous province and a port city in the Aegean Region of western Turkey. The province of İzmir is bordered by the Madra Mountains to the north, the Gulf of Kuşadası to the south, the Çeşme Peninsula to the west, and the provinces of Aydın and Manisa to the east. Situated between latitudes 37°45′–39°15′ N and longitudes 26°15′–28°20′ E, İzmir is positioned at an elevation of 2 m above sea level (Figure 1). The province comprises 30 districts and 1298 neighborhoods. İzmir’s climate is predominantly Mediterranean. The mountains extend perpendicularly to the sea, allowing maritime influence to reach inland areas through the plains. Consequently, the city experiences mild and rainy winters and hot and dry summers. However, as the elevation increases in the eastern regions, a transitional climate emerges, blending Mediterranean and continental climate features [62]. An analysis of the monthly average temperature curve of İzmir indicates a significant trend in temperature variations during the fire season, which spans from April to October. Throughout this period, temperatures show a steady increase from April, reaching their peak in August. The average temperature in August has been recorded at 43 °C. In addition to high temperatures, the lowest humidity levels, averaging 52%, are observed in July and August, further increasing the wildfire threat during the summer months [19].

The natural vegetation of İzmir primarily consists of drought-resistant, broad-leaved, needle-leaved, and evergreen tree and shrub species characteristic of the Mediterranean climate. The region is predominantly covered with Turkish red pine (Pinus brutia), stone pine (Pinus pinea), black pine (Pinus nigra), cypress (Cupressus sempervirens), Mediterranean maquis formations, and olive trees (Olea europaea). İzmir Province has a total forested area of 478,547 hectares, accounting for 40% of its total land area, and since all of its forests are located within the Mediterranean climate zone, they are under constant wildfire threat. Nationally, forests cover approximately 29.8% of the country’s surface area; this extensive forest cover, combined with rising summer temperatures, has contributed to an increased frequency of forest fires in the region in recent years. Over the past decade, due to climatic characteristics and rising temperatures, wildfires have become the most prevalent hazard in İzmir. The extent of burned areas has shown a dramatic increase, rising from 303.13 ha in 2013 to 2227.35 ha in 2023 [63,64]. In İzmir, where all forested areas consist of first-degree (highly susceptible) wildfire-prone zones, numerous wildfire disasters have occurred throughout history, including the Kemalpaşa (Nif) Karabel Forest Fire (1918), Gaziemir Forest Fire (1985), Seferihisar Forest Fire (1998), Selçuk-Meryemana Fire (2006), Gaziemir Forest Fire (2008), Karabağlar-Tırazlı Forest Fire (2019), Aliağa-Çatlıdere Forest Fire (2023), and Menderes-Oğlanarası Forest Fire (2023). The wildfire that broke out in Karşıyaka, a highly susceptible area, on 15 August 2024 reached some residential areas in the city center and was only brought under control after prolonged efforts. Given its fire intensity and spread rate, this wildfire was selected for comprehensive analysis in this study.

2.2. Dataset

Sentinel-2 is a twin-satellite system operating in a sun-synchronous orbit, consisting of Sentinel-2A and Sentinel-2B, launched on 23 June 2015 and 7 March 2017, respectively. Both satellites have an imaging swath width of 290 km and contain a total of 13 spectral bands. These bands are categorized into three groups based on their spatial resolution: 10 m (visible and near-infrared bands), 20 m (red-edge and shortwave infrared (SWIR) bands sensitive to vegetation), and 60 m (other spectral bands) [65]. In particular, the near-infrared (NIR) and red-edge spectral bands offer a significant advantage in detecting burned areas due to their sensitivity to chlorophyll content and high spatial resolution [66]. In this study, Sentinel-2 bands 2–8 and 11–12 were used (Table 1). The selection of these bands was driven by their proven effectiveness in wildfire analysis. The NIR and SWIR bands were prioritized due to their strong sensitivity to vegetation moisture and burn severity. The red-edge bands were included to capture subtle vegetation stress variations, particularly useful in identifying low- and moderate-severity burn areas. Meanwhile, the visible spectrum bands were used for the vegetation index calculations, such as the NDVI and dNDVI, enhancing the classification process. Conversely, bands that were primarily used for atmospheric correction, such as the coastal aerosol band, water vapor band, and cirrus band, were excluded, as they provide limited value in distinguishing post-fire land surface characteristics.

Sentinel-2 data can be accessed free of charge through the Copernicus Open Access Hub [67] and Google Earth Engine (GEE). Satellite imagery is provided at two different levels: Level-1C, which contains top-of-atmosphere reflectance data, and Level-2A, which includes surface reflectance data after atmospheric corrections. In this study, the Level-2A satellite imagery, which already has atmospheric corrections applied, was used for the analysis. To accurately assess the post-fire pixel reflectance changes, images taken 15 days before and 15 days after the fire were analyzed.

2.3. Methods

In this study, a two-stage classification approach was employed: (i) burn severity classification using spectral indices (dNBR and dNDVI) and (ii) burned area detection using ML. The six-step methodology followed in this study included the following: (i) preprocessing of the Sentinel-2 images, (ii) calculations of the dNBR and dNDVI indices for the burn severity classification, (iii) classification of burn severity using predefined spectral index thresholds, (iv) application of ML algorithms to distinguish burned and unburned areas, (v) model accuracy assessment, and (vi) the determination of variable importance using SHAP analysis (Figure 2).

2.3.1. Image Preprocessing

In this study, Sentinel-2 Level-2A surface reflectance products were obtained from the COPERNICUS/S2_SR dataset using Google Earth Engine (GEE). All images were initially filtered based on the study area, time period, and a cloud cover threshold of less than 5%. Then, the clouds and cloud shadows were masked using the QA60 band and the Scene Classification Layer (SCL). During the preprocessing stage, the Normalized Difference Water Index (NDWI) was calculated to filter water bodies. Subsequently, the NBR and dNBR were derived to analyze the impact of the wildfire. The NDVI was used to assess vegetation changes, and the dNDVI was computed by comparing the pre- and post-fire NDVI values. The mathematical expressions for each index are provided below. Additionally, fire severity classifications based on the dNBR thresholds proposed by Key and Benson [45] and the NDVI-based burn severity ranges defined by Morante-Carballo et al. [39] are presented in Table 2. While Key and Benson’s dNBR thresholds were initially developed for boreal forests, multiple studies have validated their applicability in Mediterranean ecosystems, including Turkey, confirming their reliability in post-fire severity classification. Similarly, the NDVI-based burn severity classification has also been widely applied in Mediterranean landscapes, demonstrating its effectiveness in assessing post-fire vegetation recovery and burn impact [12,33,44].

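\mathrm{NDWI} = \frac{\mathrm{GREEN} - \mathrm{NIR}}{\mathrm{GREEN} + \mathrm{NIR}}

(1)

\mathrm{NBR} = \frac{\mathrm{NIR} - \mathrm{SWIR}}{\mathrm{NIR} + \mathrm{SWIR}}

(2)

\mathrm{dNBR} = \mathrm{NBR}_{\mathrm{pre}} - \mathrm{NBR}_{\mathrm{post}}

(3)

\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}

(4)

\mathrm{dNDVI} = \mathrm{NDVI}_{\mathrm{pre}} - \mathrm{NDVI}_{\mathrm{post}}

(5)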
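As a minimal sketch of Equations (1)–(5), the indices can be computed per pixel from surface reflectance values. The reflectance numbers below are hypothetical, the function names are illustrative, and returning 0.0 for a zero denominator is an assumption of this sketch rather than part of the study's workflow. The text does not state which of the two SWIR bands enters the NBR, so the sketch simply takes a single `swir` value.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b); 0.0 when the denominator is zero (sketch assumption)."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def ndwi(green, nir):
    """Equation (1): Normalized Difference Water Index."""
    return normalized_difference(green, nir)


def nbr(nir, swir):
    """Equation (2): Normalized Burn Ratio."""
    return normalized_difference(nir, swir)


def ndvi(nir, red):
    """Equation (4): Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def dnbr(pre, post):
    """Equation (3): pre-fire NBR minus post-fire NBR."""
    return nbr(pre["nir"], pre["swir"]) - nbr(post["nir"], post["swir"])


def dndvi(pre, post):
    """Equation (5): pre-fire NDVI minus post-fire NDVI."""
    return ndvi(pre["nir"], pre["red"]) - ndvi(post["nir"], post["red"])


# Hypothetical surface reflectance values for one pixel (not data from the study).
pre_fire = {"green": 0.08, "red": 0.05, "nir": 0.40, "swir": 0.10}
post_fire = {"green": 0.07, "red": 0.10, "nir": 0.15, "swir": 0.25}

print(f"dNBR  = {dnbr(pre_fire, post_fire):.3f}")
print(f"dNDVI = {dndvi(pre_fire, post_fire):.3f}")
```

In this toy pixel, NIR reflectance drops and SWIR reflectance rises after the fire, so both differenced indices are strongly positive.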

2.3.2. Creation of Training and Test Samples

Following the completion of the image preprocessing stages, 4200 geographically balanced sample points were identified on Google Earth to analyze the severity and spatial distribution of the forest fire. Of these points, 2100 were classified as “burned areas” and 2100 as “unburned areas”. As part of the sampling strategy, a criterion was adopted, ensuring that the selected training pixels represent at least 0.25% of the study area. For classifying fire severity, areas were labeled using the threshold values dNBR > 0.1 and dNDVI > 0.07. However, to ensure the reliability of these thresholds and their effectiveness in distinguishing burned and unburned areas, a detailed accuracy assessment was conducted, as described in Section 2.3.6. This dataset also served as the foundation for training the ML model and assessing its accuracy. To objectively assess the model’s generalization capability, the dataset was split into 80% training and 20% testing. Additionally, to mitigate potential bias from class imbalance, “burned” and “unburned” area samples were equally distributed. This balanced sampling approach not only optimized the training process but also enhanced the reliability of classification performance. The model’s performance analysis was conducted using accuracy, precision, and recall metrics. Due to the balanced dataset and strict threshold values, the developed model’s adaptability to real-world data was quantitatively validated.
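The accuracy measures named in this article (overall accuracy, precision, recall, F1 score, and the Kappa coefficient) can all be derived from a two-class confusion matrix. The following is a minimal sketch using their standard definitions; the function name and the label vectors are hypothetical and do not reproduce the study's data or code.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Return OA, precision, recall, F1, and Cohen's kappa for a two-class problem."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    n = tp + fp + fn + tn
    oa = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (oa - expected) / (1.0 - expected) if expected < 1.0 else 0.0

    return {"oa": oa, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa}


# Hypothetical labels: 1 = burned, 0 = unburned.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

When both the reference labels and the predictions are evenly split between the two classes, the chance agreement is 0.5 and kappa reduces to 2·OA − 1, which is arithmetically consistent with the reported pair of OA = 98.0% and κ = 0.960.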

2.3.3. Machine Learning Algorithms

ML is a scientific discipline focused on designing and developing algorithms that enable computers to learn from data. It aims to recognize complex patterns and derive insights through data-driven analysis. Since manually analyzing large datasets is often impractical, ML provides an efficient alternative for predictive modeling. It is rooted in the idea that machines, much like humans acquiring expertise through observation, can extract experiential knowledge from raw data [68,69,70], which enables adaptive decision-making based on past information. This study employs tree-based ML algorithms; the following paragraphs briefly describe AdaBoost, LightGBM, RF, and XGBoost.

AdaBoost, developed by Freund and Schapire [71], is one of the first boosting algorithms and stands out as a leading method. The algorithm aims to create a strong model by iteratively weighting weak learners. During the training process, misclassified examples are re-assessed with higher weights, ensuring that the model focuses on difficult cases. AdaBoost updates weak classifiers by re-weighting data points in each iteration and combines these classifiers according to their weights for the final prediction [71,72]. The algorithm offers advantages such as high accuracy and model interpretability, but it also has limitations due to its sensitivity to noisy data and tendency to overfit. In studies related to forest fires, it is preferred for its strong predictive capacity and adaptive learning ability [73].
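The re-weighting step described above can be illustrated with one round of the classic discrete AdaBoost update for labels in {−1, +1}. This is a sketch under stated assumptions: the function name and the toy labels are hypothetical, and library implementations (possibly including the one used in this study) may use a different variant of the update.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round: return the learner weight and updated sample weights."""
    total = sum(weights)
    # Weighted error of the weak learner on the current sample weights.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")

    # Learner weight: larger when the weak learner makes fewer weighted errors.
    alpha = 0.5 * math.log((1.0 - err) / err)

    # Misclassified samples (t * p == -1) are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


# Toy example: five samples with uniform weights; the last one is misclassified.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
weights = [0.2] * 5

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After the update, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.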

LightGBM, introduced by Ke et al. [74], is a tree-based gradient-boosting algorithm designed to improve efficiency and scalability. It reduces computational costs by binning continuous feature values into histograms and employs a leaf-wise tree growth strategy. The algorithm incorporates two key techniques: Gradient-Based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). GOSS lowers computation costs by keeping the instances with large gradients and randomly sampling those with small gradients, while EFB bundles mutually exclusive features to reduce the effective number of features. These innovations enable LightGBM to achieve faster training times and higher accuracy. It has been reported to perform approximately 20% more efficiently than alternative boosting methods on large datasets [74,75,76,77].
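To make the GOSS idea concrete, the sketch below keeps the fraction `a` of instances with the largest absolute gradients, randomly samples a fraction `b` of the full dataset from the remainder, and up-weights the sampled instances by (1 − a)/b so that the gradient statistics stay approximately unbiased. The function name, the default values of `a` and `b`, the seed, and the gradient values are illustrative assumptions, not settings taken from this study or from the library's defaults.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Return {instance index: weight} for one Gradient-Based One-Side Sampling step."""
    if not (0.0 < b <= 1.0 and 0.0 <= a < 1.0):
        raise ValueError("require 0 <= a < 1 and 0 < b <= 1")

    n = len(gradients)
    top_n = int(a * n)    # number of large-gradient instances kept
    rand_n = int(b * n)   # number of small-gradient instances sampled

    # Sort instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_idx, rest = order[:top_n], order[top_n:]

    rng = random.Random(seed)
    sampled = rng.sample(rest, min(rand_n, len(rest)))

    # Sampled small-gradient instances are amplified to compensate for the rest.
    factor = (1.0 - a) / b
    weights = {i: 1.0 for i in top_idx}
    weights.update({i: factor for i in sampled})
    return weights


# Hypothetical gradients for 20 training instances.
gradients = [i / 20 for i in range(20)]
selected = goss_sample(gradients)
print(sorted(selected.items()))
```

With 20 instances, `a = 0.2`, and `b = 0.1`, only 6 instances are used to evaluate candidate splits: the 4 with the largest gradients and 2 sampled from the remaining 16.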

The RF algorithm, developed by Breiman [78], is an ensemble learning method that combines bagging and feature selection techniques. Each decision tree in the forest is trained on a distinct bootstrapped subset of the data, reducing overfitting and enhancing generalization. This random sampling increases model diversity and lowers variance. At each node, a random subset of features is selected for splitting, minimizing the correlation between trees and improving robustness against noise. Predictions are aggregated via majority voting (classification) or averaging (regression), reducing individual tree errors and enhancing the overall accuracy. RF is highly effective in predicting wildfire severity by integrating spectral, topographic, and meteorological variables. It classifies burn severity and assesses vegetation recovery using indices like the dNDVI and dNBR. While RF handles noisy data well and performs strongly with diverse variables, its computational efficiency declines with extremely large datasets. Nevertheless, its interpretability and resistance to overfitting make it a preferred choice for ecological and geospatial applications [79,80,81,82,83,84,85].

XGBoost, developed by Chen and Guestrin [86], is a decision tree-based gradient-boosting algorithm that iteratively improves predictions by correcting the errors left by previous trees. The outputs of all trees are weighted and aggregated for the final prediction. To prevent overfitting and enhance efficiency, XGBoost employs row-based and feature-based random subsampling, improving both training speed and model robustness [86]. It outperforms traditional methods in accuracy, speed, and scalability, making it a preferred choice for classification and regression tasks in scientific research [87,88,89]. XGBoost’s high performance has led to widespread adoption in data science platforms. However, it requires meticulous hyperparameter tuning and longer training times for large datasets [86].

2.3.4. Hyperparameter Tuning

Hyperparameters are user-defined settings that shape the training process of an ML model. These values remain constant throughout training and are not directly involved in the learning process. Hyperparameter optimization refers to the process of improving an ML model’s performance by fine-tuning these parameters. Properly optimized hyperparameters can lead to higher accuracy compared to default settings, positively contributing to the model’s efficiency [90,91]. Although hyperparameters can be manually adjusted, this approach becomes unsustainable for models dealing with large datasets due to the high computational cost and the inefficiency of trial-and-error tuning.

In this study, hyperparameter tuning was performed using Optuna, an automated framework known for its efficiency, flexibility, and advanced search capabilities. Tuning was conducted exclusively on the training set, employing an 80/20 split with five-fold cross-validation to optimize model performance while keeping the test set strictly reserved for final evaluation. Unlike traditional grid search or random search methods, which require extensive computational resources and often explore the hyperparameter space inefficiently, Optuna employs Bayesian optimization with the Tree-structured Parzen Estimator algorithm. This approach significantly reduces the number of trials needed to reach optimal hyperparameters by focusing the search on promising regions of the parameter space. One of the key advantages of Optuna is its ability to implement early pruning mechanisms, dynamically stopping underperforming trials before full completion. This feature is particularly beneficial for complex ML models, as it prevents unnecessary computational overheads and accelerates convergence. Additionally, Optuna allows for automated hyperparameter tuning with trial-based learning, adapting the search strategy based on previous iterations, which enhances efficiency and precision in parameter selection. Thus, the selection of Optuna was driven by its ability to balance computational efficiency with high-performance optimization, making it an ideal choice for refining ML models in wildfire classification tasks [92,93]. The optimized hyperparameters for different models in this study are as follows:

AdaBoost: {n_estimators: 461}, {min_samples_split: 7}, {min_samples_leaf: 7}, {eta (learning_rate): 0.02787};

LightGBM: {n_estimators: 352}, {eta (learning_rate): 0.01166}, {max_depth: 10}, {subsample: 0.7};

RF: {n_estimators: 556}, {min_samples_split: 3}, {max_depth: 12}, {min_samples_leaf: 3};

XGBoost: {n_estimators: 650}, {eta (learning_rate): 0.01019}, {max_depth: 15}, {subsample: 0.7}.
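The tuning protocol described in this section, an 80/20 split with five-fold cross-validation confined to the training portion, can be sketched with index bookkeeping alone. The helper names, the seed, and the plain (non-stratified) shuffling are assumptions of this sketch; the study's actual splitting code is not given in the text. In a tuning loop, each trial's score would be the validation score averaged over the five folds, and the held-out test indices would not be touched until the final evaluation.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and hold out a test fraction (non-stratified sketch)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


# 4200 sample points, as reported for this study.
train_idx, test_idx = train_test_split_indices(4200)
folds = list(k_fold(train_idx, k=5))

print(len(train_idx), len(test_idx))
print([(len(tr), len(va)) for tr, va in folds])
```

Because the folds are drawn only from `train_idx`, no test sample can influence the choice of hyperparameters.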

2.3.5. SHapley Additive exPlanations

ML methods have achieved the capability to produce robust and accurate predictions for diverse problems. However, they often fail to provide transparent and interpretable insights into the mechanisms underlying these predictions. This limitation has driven researchers to investigate the explainability of ML models. In this context, SHAP emerges as a prominent framework for ensuring fair and consistent explanations of model outputs. SHAP analysis evaluates the impact of variables in ML models by leveraging the principles of Shapley values from game theory. Simply put, Shapley values quantify the relative contribution of each input variable to the model output, thereby identifying the influence of individual features [94].

Shapley values originate from cooperative game theory and aim to fairly measure the contribution of each “player” (i.e., variables) to the collective outcome. Initially proposed by Lloyd Shapley in 1953, this method was developed to address scenarios where a group of players collaboratively achieves a reward, ensuring an equitable distribution of the reward among participants [95]. Adapted to ML, the SHAP framework distributes the impact of features on model outputs according to the equitable distribution principle, thereby attributing “fair shares” of influence to each variable [56].

For linear models affected by multicollinearity, Shapley regression values serve as feature importances. This approach requires retraining the model on all feature subsets S ⊆ F, where F is the set of all available features. It evaluates the significance of each feature by quantifying its effect on the model prediction when that feature is included. Two models are trained: one, f_{S \cup \{i\}}, includes the feature, and the other, f_{S}, does not. The predictions of the two models are compared on the current input through the difference f_{S \cup \{i\}} (x_{S \cup \{i\}}) - f_{S} (x_{S}), where x_{S} represents the values of the input features in the set S. This difference is computed for all possible subsets S ⊆ F \ {i}, that is, all subsets of F that exclude the feature i. The Shapley values are then calculated as the weighted average of all these differences and applied as feature attributions [94]. The Shapley value is defined by the following formulation:

φ_{i} = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! (|F| - |S| - 1)!}{|F|!} [f_{S \cup \{i\}} (x_{S \cup \{i\}}) - f_{S} (x_{S})]

(6)

Here, F denotes the set of all features, S represents any subset of features that excludes feature i, and |S| indicates the number of features within subset S. The term inside the square brackets, [f_{S \cup \{i\}} (x_{S \cup \{i\}}) - f_{S} (x_{S})], corresponds to the marginal contribution of feature i, which is defined as the difference between the model output when S is combined with feature i and the output obtained using only the subset S [96].
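The weighted sum in Equation (6) can be made concrete with a brute-force sketch. The code below is an illustration under stated assumptions, not the SHAP implementation used in the study: `value` is a hypothetical callable standing in for the model output obtained with a given feature subset, and the toy numbers are invented for demonstration.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset S of F that excludes i.

    `value(S)` stands in for the model output obtained with the feature
    subset S (a frozenset); it is a hypothetical callable.
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):  # |S| ranges over 0 .. |F| - 1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i given subset S.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Toy value function (made-up numbers, not results from the study):
# two features contribute additively, plus an interaction when both are present.
def toy_value(subset):
    out = 0.0
    if "dNBR" in subset:
        out += 0.5
    if "dNDVI" in subset:
        out += 0.25
    if {"dNBR", "dNDVI"} <= subset:
        out += 0.125
    return out


phi = shapley_values(["dNBR", "dNDVI", "B6"], toy_value)
```

In this toy case the interaction term is shared equally between the two interacting features, the unused feature receives zero, and the attributions sum to the value of the full feature set. Enumeration grows exponentially with the number of features, so tree ensembles are usually explained with faster algorithms such as TreeSHAP rather than with this direct form.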

2.3.6. Accuracy Assessment

Accuracy analysis is a crucial step in studies utilizing RS techniques, ensuring the reliability of methodologies and results. Rigorous accuracy assessments are essential to validate findings. In this study, a confusion matrix was employed to evaluate the accuracy of both ML-based wildfire detection and spectral index-based classification. The confusion matrix consists of four key components: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN), which serve as the foundation for various performance metrics. To comprehensively assess the performance of the ML models and the effectiveness of dNBR and dNDVI in burned area classification, multiple validation metrics were employed. These include overall accuracy (OA), Kappa coefficient (κ), precision (PCC), recall (RC), balanced accuracy (BA), false alarm (FA), specificity (SP), missed detection (MD), and F1 score (FS) (Table 3). OA measures how often a model’s classifications are correct, calculated as the proportion of instances correctly classified across all classes. κ adjusts the observed agreement for the agreement expected by chance, yielding a measure of the correspondence between predicted and actual class labels. PCC, or positive predictive value, and RC, or sensitivity, were evaluated at the class level. PCC assesses the accuracy of the predicted wildfires by comparing the number of correctly identified fires to the total number of fires an algorithm has predicted, whereas RC evaluates a model’s capacity to identify actual wildfires by calculating the proportion of successfully detected fires out of all real wildfires. SP, which measures the accurate identification of non-wildfire instances, and BA, the arithmetic mean of RC and SP, were used to counteract the biases resulting from class imbalance.
FS, calculated as the harmonic mean of precision and recall, offered a reliable measure for evaluating performance on datasets with an imbalance in their classes. The FA rate and MD rate were key factors in assessing operational risks, with the FA rate signifying mistaken wildfire warnings and the MD rate indicating undetected actual wildfires. Additionally, the ROC (Receiver Operating Characteristic) curve and AUC (Area Under the Curve) were utilized to evaluate classification performance, particularly for imbalanced datasets and binary classification tasks. The ROC curve illustrates the model’s ability to differentiate between classes at varying thresholds, while the AUC condenses this performance into a single numerical score [97,98]. These metrics collectively enabled a comprehensive assessment of the model’s predictive accuracy, classification reliability, and real-world applicability in wildfire detection. However, while the ROC curve and AUC reveal the overall discriminative power of a model, additional analyses are required to determine whether the differences in classification performance between the two models are statistically significant. For this purpose, the McNemar test has been applied. The McNemar test is a non-parametric test specifically used for comparing dependent observations in binary classification problems. It is used to assess whether the differences in performance between two different ML models working on the same dataset are due to randomness. The test is calculated using a 2 × 2 contingency table, evaluating whether there are asymmetric changes. The fundamental assumption of this test is that the observations are dependent; in other words, both models must be evaluated on the same samples. The computation is based on the values of the discordant cells in the contingency table, which represent the changes between previous and subsequent states. 
The McNemar test statistic is assessed for significance using the chi-square distribution and is calculated using the following formula [95,99]:

\chi^{2} = \frac{(b - c)^{2}}{b + c}

(7)

In the equation given above, b represents the number of samples that were correctly classified in the first measurement but incorrectly classified in the second measurement, while c denotes the number of samples that were misclassified in the first measurement but correctly classified in the second measurement. If the values of b and c are very similar, it may be inferred that the observed changes are random, leading to the conclusion that there is no statistically significant difference. However, if the obtained p-value is below the predetermined significance level (typically 0.05), it is accepted that there is a statistically significant change between the two conditions [100].
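As a worked illustration of Equation (7), the statistic and its p-value can be computed with the standard library alone. The sketch assumes the uncorrected form of the test shown above (no continuity correction) and uses the closed-form survival function of the chi-square distribution with one degree of freedom; the function name `mcnemar_test` and the discordant counts are hypothetical.

```python
from math import erfc, sqrt


def mcnemar_test(b, c):
    """Return (chi-square statistic, p-value) for McNemar's test, uncorrected."""
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical discordant counts, for illustration only.
statistic, p_value = mcnemar_test(b=10, c=30)
significant = p_value < 0.05
```

With these illustrative counts the statistic is 10.0, which exceeds the 3.84 critical value at the 0.05 level, whereas equal discordant counts give a statistic of zero and a p-value of one.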

OA = \frac{TP + TN}{N}

(8)

PCC = \frac{TP}{TP + FP}

(9)

FS = \frac{2 \times TP}{2 \times TP + FP + FN}

(10)

FA = \frac{FP}{TN + FP}

(11)

MD = \frac{FN}{TP + FN}

(12)

RC = \frac{TP}{TP + FN}

(13)

SP = \frac{TN}{TN + FP}

(14)

BA = \frac{RC + SP}{2}

(15)

κ = \frac{ρ_{0} - ρ_{e}}{1 - ρ_{e}}

(16)
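The metrics of this section can be computed directly from the four confusion-matrix counts. The sketch below is illustrative, not the authors' code: it uses the standard definitions (precision as TP / (TP + FP), balanced accuracy as the mean of recall and specificity, and chance agreement from the row and column totals), and the function name `classification_metrics` is an assumption. The counts are the dNBR figures reported in Section 3.1 (1914 of 2100 fire samples and 1722 of 2100 non-fire samples correctly classified).

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute confusion-matrix metrics for a binary burned/unburned classification."""
    n = tp + fp + tn + fn
    oa = (tp + tn) / n                      # overall accuracy
    pcc = tp / (tp + fp)                    # precision
    rc = tp / (tp + fn)                     # recall (sensitivity)
    sp = tn / (tn + fp)                     # specificity
    fa = fp / (tn + fp)                     # false alarm rate
    md = fn / (tp + fn)                     # missed detection rate
    ba = (rc + sp) / 2                      # balanced accuracy
    fs = 2 * tp / (2 * tp + fp + fn)        # F1 score
    # Expected chance agreement from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (oa - p_e) / (1 - p_e)
    return {"OA": oa, "PCC": pcc, "RC": rc, "SP": sp, "FA": fa,
            "MD": md, "BA": ba, "FS": fs, "kappa": kappa}


# dNBR counts taken from the confusion matrix described in Section 3.1.
metrics = classification_metrics(tp=1914, fp=378, tn=1722, fn=186)
```

With these counts the sketch gives an OA of about 0.866 and a κ of about 0.731, matching the values reported for the dNBR index.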

3. Results

3.1. Index-Based Results

In this study, the spread and severity of forest fires in the study area were mapped using the dNBR and dNDVI indices. Burn severity classification was performed based on spectral threshold values widely accepted in the literature (dNBR: >0.66 high, dNDVI: >0.45 very high), and the detailed distribution of classes is presented in Table 2. A morphological processing technique utilizing focal mode was applied to refine the generated maps and reduce the “salt-and-pepper” effect; after this stage, we obtained the results presented in Figure 3.
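The focal mode operation mentioned above replaces each pixel with the most frequent class in its neighbourhood. A minimal pure-Python sketch is given below; the 3 × 3 window, the tie-breaking rule, and the function name `focal_mode` are assumptions made for illustration, since the window size used in the study is not stated.

```python
from collections import Counter


def focal_mode(grid, radius=1):
    """Replace each cell with the most frequent class in its (2r+1) x (2r+1) window."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            # Collect in-bounds neighbours (the window is clipped at the edges).
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            # Keep the original class on ties; otherwise take the modal class.
            if counts[grid[r][c]] < best:
                out[r][c] = min(k for k, v in counts.items() if v == best)
    return out


# Toy 5 x 5 map (1 = burned, 0 = unburned) with one isolated "burned" pixel.
noisy = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
smoothed = focal_mode(noisy)
```

In the toy map the isolated pixel is reassigned to the surrounding class while the core of the larger burned patch is retained, which is the behaviour that suppresses the "salt-and-pepper" effect.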

Figure 3 reveals the differences in post-fire classifications of the dNBR and dNDVI indices. The dNBR index identifies high-severity burn areas and their surrounding transition zones more distinctly, while the dNDVI shows greater uncertainty and spatial distribution in classifying low-severity and moderate-severity burn areas. According to the dNBR index, the percentage of unburned areas is approximately 75.90%, whereas in the dNDVI index, this percentage is 82.70%. The total percentage of burned areas is about 24.10% in the dNBR, while it is 17.30% in the dNDVI. These rates indicate that the dNDVI classifies some low- and moderate-severity burn areas as unburned, leading to a smaller total burned area estimation compared to the dNBR. Regarding the fire severity classification, the dNBR identifies a total of 13.95% of the area in the low-moderate and moderate-high burn severity categories, whereas the dNDVI detects 8.63% in the low- and moderate-severity categories. When examining the generated maps spatially, the dNBR appears to identify moderate- and high-severity burn areas with a broader distribution, while dNDVI detects a more limited extent in these categories. To conduct a thorough evaluation of these differences, both indices were assessed based on the criteria outlined in Section 2.3.6, with the findings illustrated in Figure 4 and Figure 5.

The confusion matrices presented in Figure 4a,b enabled the derivation of metrics such as OA, κ, PCC, BA, FS, FA, MD, RC, and SP, which are summarized in Figure 4c. Additionally, Figure 5 illustrates the ROC curves and AUC values. The dNBR index’s OA of 0.866 and κ of 0.731 indicate that it produces reliable outcomes regardless of random classification. Furthermore, FA: 17.4% and MD: 9.5% are at relatively low levels, demonstrating that the dNBR minimizes both false positives and false negatives, ensuring a more precise classification. In contrast, the dNDVI shows a lower performance, with an OA of 0.817 and a κ of 0.634. Particularly, higher FA (17.9%) and MD (18.7%) rates compared to the dNBR reveal that the dNDVI exhibits more inaccuracies in distinguishing the fire severity levels. Furthermore, the AUC values in Figure 5 indicate that the dNBR (AUC = 0.865) has greater separation ability than the dNDVI (AUC = 0.817). A detailed analysis of the confusion matrices for both indices also shows that the dNBR index performs more effectively in classifying both fire and non-fire samples. Specifically, the dNBR index correctly identifies 1914 out of 2100 fire samples and 1722 out of 2100 non-fire samples, delivering higher performance in both classes (91.14% accuracy for fire and 82.0% for non-fire). Finally, the McNemar test was conducted to assess the statistical significance of the classification differences between the fire severity maps derived from the dNBR and dNDVI indices. The McNemar test yielded a chi-square statistic of approximately 89.9, which is well above the critical threshold of 3.84. This situation indicates that the differences between the dNBR and dNDVI indices are statistically significant (p < 0.0001), and the variations in classification outcomes are unlikely to be due to random chance. When all these findings are evaluated together, they reveal that the dNBR method performs better than the dNDVI in terms of misclassification. 
Specifically, the dNBR proves to be a more reliable index for fire detection, as it demonstrates a lower false positive rate and fewer missed fire cases.

3.2. ML-Based Results

The secondary objective of this study is to classify burned and unburned areas using four different ML algorithms (AdaBoost, LightGBM, RF, and XGBoost) and to analyze their spatial distributions. To improve the spatial coherence of the generated maps and reduce random noise (salt-and-pepper effect), a morphological operation utilizing the focal mode was applied. This process has contributed to making the wildfire spread patterns more distinct, ensuring that the predicted areas align more accurately with field conditions, and the obtained results have been visualized in Figure 6.

Figure 6 compares the fire-spread maps generated using different ML algorithms. Significant differences in the spatial distribution and continuity of predicted burn areas, boundary clarity, and noise effects were observed among the models. The RF model detected burn area boundaries more clearly, while the LightGBM and XGBoost algorithms exhibited gradual transitions in the decision boundaries, resulting in smoother and more continuous classifications. However, AdaBoost produced uncertain boundaries, especially in low- and moderate-severity burn areas. Additionally, small-scale pixelation (also known as the “salt-and-pepper” effect) was more pronounced in the predictions of AdaBoost and XGBoost, leading to higher spatial variability in fire zones of low to moderate severity. On the other hand, LightGBM demonstrated moderate spatial consistency, while RF better preserved spatial continuity, comprehensively mapping large burn areas. The maps also revealed variations in the predicted size of fire-spread regions. RF and LightGBM mapped burn areas with broader spatial coverage, whereas AdaBoost and XGBoost predicted narrower burn zones. Specifically, in high-severity burn areas, RF and LightGBM identified these regions more distinctly, while AdaBoost and XGBoost produced fragmented and limited predictions. When considering topographic features and environmental variables, the models exhibited distinct tendencies. RF and LightGBM predicted the fire spread across a wider range of slopes, whereas AdaBoost and XGBoost predominantly mapped burn areas in low-slope regions. Furthermore, AdaBoost and XGBoost misclassified certain high-vegetation-density and urban areas as burned, whereas RF and LightGBM generated fewer false positives in these regions. To evaluate these differences comprehensively and determine the optimal model, the performance metrics of each model (OA, κ, PCC, BA, etc.) were compared and are presented in Figure 7 and Figure 8.

Figure 7 presents a comparative analysis of the performance of four ML classifiers across various evaluation metrics. The results reveal that RF (OA = 0.980, κ = 0.960) and XGBoost (OA = 0.979, κ = 0.957) achieved the highest scores in terms of the OA and κ, demonstrating superior classification performance. In contrast, AdaBoost exhibited the lowest performance for these metrics (OA = 0.921, κ = 0.843). Similarly, for BA and PCC, RF (BA = 0.980, PCC = 0.980) and XGBoost (BA = 0.979, PCC = 0.979) outperformed other models, while AdaBoost (BA = 0.921, PCC = 0.921) again yielded the lowest values. Regarding the error rates, the RF model showed the lowest FA and MD (FA = 0.021, MD = 0.019), indicating higher reliability and a lower likelihood of erroneous predictions. Conversely, AdaBoost recorded the highest FA and MD (FA = 0.093, MD = 0.064), suggesting greater susceptibility to false positives. For SP and RC, RF (SP = 0.980, RC = 0.979) and XGBoost (SP = 0.979, RC = 0.976) achieved the highest scores, highlighting their strong ability to distinguish between positive and negative classes. While LightGBM performed competitively overall, its error metrics (FA = 0.026, MD = 0.026) were slightly higher than those of RF and XGBoost. An examination of the AUC values in Figure 8 reveals that the RF and XGBoost algorithms achieved the highest AUC value of 0.998, indicating their superior discriminative power. LightGBM followed closely with an AUC of 0.997, demonstrating nearly identical classification performance. In contrast, the AdaBoost algorithm exhibited a slightly lower performance with an AUC value of 0.977, falling behind the other three models. While these results highlight certain performance differences among the classifiers, the statistical significance of these differences was evaluated using the McNemar test, and the findings are summarized in Table 3.

The McNemar test results, presented in Table 3, indicate that the accuracy differences among all algorithms, except for the AdaBoost method, are not statistically significant. Accordingly, it can be concluded that the predictive performances of the RF, LightGBM, and XGBoost models are statistically superior to that of the AdaBoost model. However, the McNemar test did not reveal a statistically significant difference among the RF, LightGBM, and XGBoost models, suggesting that their classification performance is quite similar and that additional factors should be considered when selecting a model. In this context, despite the absence of a statistical difference, when additional criteria such as the OA, error rates, and AUC value are taken into account, the RF model was selected as the most suitable model in this study due to its lowest error rates and highest AUC value. Beyond classification performance, the computational efficiency of each ML model was also evaluated in terms of processing speed and resource usage. Among the tested algorithms, RF and LightGBM exhibited the most efficient training and inference performances, requiring less computational time while maintaining high accuracy. In contrast, XGBoost and AdaBoost demonstrated longer processing durations, particularly during training, due to their iterative boosting mechanisms. Regarding resource utilization, XGBoost and LightGBM were more memory-intensive due to their gradient-boosting architecture, while RF maintained a balance between computational efficiency and memory consumption. AdaBoost, despite requiring more processing time during inference, exhibited relatively lower RAM usage. These findings suggest that RF and LightGBM offer an optimal trade-off between computational efficiency and classification accuracy, making them well-suited for large-scale wildfire detection tasks.

3.3. Analysis of SHAP-Based Feature Importance

Figure 9a presents a detailed SHAP summary plot explaining the decision mechanism of the model based on the spectral bands provided by Sentinel-2 imagery and calculated indices (the dNBR and dNDVI). After training the RF classifier, the most influential features in the prediction process were identified and ranked by importance. In the plot, the X-axis represents SHAP values, while the Y-axis lists the input features. Each point illustrates the impact of a feature on the model’s output, with the blue hues indicating low feature values and the pink hues representing high values.

The detailed analysis of the model outputs, presented in Figure 9a, shows that the dNBR and dNDVI, designed to highlight fire-induced vegetation changes, are among the most effective factors in detecting burned areas. The high positive SHAP values of the dNBR indicate its critical role in distinguishing burned areas. When the dNBR values are high, the model is more likely to classify areas as burned, whereas lower values lead to greater variability in the predictions. Similarly, as the dNDVI values decrease, the SHAP effects transition from negative to positive. These findings suggest that changes in vegetation cover are a decisive factor in the model’s predictions, and lower dNDVI values are more strongly associated with burned areas. Additionally, the Sentinel-2 spectral bands (B6, B7, B8A, B5, and B8) were identified as key inputs for the model. Furthermore, B8A and B5 influence the model in varying ways, with the SHAP contributions changing direction at specific threshold values. This suggests that the NIR bands (B8 and B8A) play a crucial role in detecting post-fire vegetation changes at specific spectral thresholds. Lower B8A values are generally more strongly linked to burned areas, but beyond a certain point, their SHAP contributions show a negative effect. Conversely, bands such as B4, B11, and B2 have a relatively lower impact on model performance. In particular, B4 shows limited sensitivity to post-fire spectral changes and provides a minimal contribution to the model’s predictions. Similarly, B11 and B2 have lower average SHAP values, indicating their lesser role in classification. These findings are supported by Figure 9b, which ranks the variables based on their average SHAP values. The dNBR is clearly the most influential variable, followed by the dNDVI, B6, and B7.
The model’s prediction process highlights the dominance of the dNBR and dNDVI, with spectral bands in the SWIR and NIR regions playing significant roles. While the SHAP summary plots provide valuable insights into the variables influencing the model’s decisions, they do not directly show how individual variable values affect predictions. For this reason, dependency plots were used to examine the effects of the six most influential variables identified in Figure 9 in greater detail. This analysis not only enhances the understanding of the model’s decision-making mechanism but also clarifies how the influence of each variable varies across specific thresholds (Figure 10).

Figure 10 presents the analysis of the SHAP dependency plots for the variables used in the model predictions. These plots provide a detailed view of how each variable influences the prediction outcomes. Figure 10a shows the impact of the dNBR variable on the model predictions. As the dNBR values increase, a noticeable rise in the SHAP values is observed. This indicates that higher dNBR values are strongly associated with burned areas, and the model uses this variable as a distinguishing feature. Specifically, the SHAP contributions are negative at low dNBR values, but once a certain threshold is exceeded, positive effects become more dominant. Figure 10b presents the effect of the dNDVI variable. While low dNDVI values produce a negative SHAP effect, after a certain point, the SHAP values become positive. This situation shows that a decreasing dNDVI is more strongly associated with burned areas. The reduction or degradation of vegetation stands out as a factor that facilitates the model’s fire detection. Figure 10c depicts the impact of the B6 (red-edge) band on the model’s predictions. The scatter plot reveals a distinct nonlinear relationship: at lower B6 values, the SHAP effect is positive, but as B6 increases beyond a certain threshold, the SHAP values become negative. This suggests that lower B6 values contribute positively to the model’s predictions, whereas higher values have a suppressing effect. Figure 10d illustrates the effect of the B7 band on the model. The SHAP contributions appear more variable for low B7 values but exhibit a negative trend beyond a certain threshold. While low B7 values show a stronger association with burned areas, high B7 values are considered less effective by the model. Figure 10e reveals the impact of the B8A variable on the model predictions. Low B8A values have a positive effect on the SHAP contributions, but beyond a certain threshold, the SHAP values decline quickly.
Significant portions of the data are distributed within a specific range where the SHAP effects are positive or neutral. However, at higher B8A values, the negative SHAP effect intensifies, indicating that the model interprets this variable differently depending on specific threshold values. This suggests that beyond a certain spectral intensity, B8A becomes less effective in distinguishing burned areas, as the model assigns it lower importance. Similarly, Figure 10f shows the effect of the B5 variable on the SHAP values. At low B5 values, the SHAP contributions are predominantly positive or neutral, but beyond a specific threshold, the SHAP values shift quickly in a negative direction. This pattern suggests that the model utilizes a distinct threshold for B5 to separate burned from unburned areas. Lower B5 values are more strongly associated with burned regions, while higher B5 values are linked to unburned areas, leading to a negative influence on model predictions. These findings highlight the critical role of B5 in fire severity classification, particularly at specific spectral thresholds. Overall, the analysis underscores that the predictive variables influence the model’s decisions in different ways across various thresholds. Key features such as the dNBR, dNDVI, B6, B7, B8A, and B5 exhibit strong correlations with the burned areas, reinforcing their importance in classification. The SHAP dependency plots provide valuable insights into the model’s sensitivity to each variable, revealing how these factors contribute to distinguishing burned from unburned regions. In particular, specific spectral bands and indices play a crucial role in distinguishing between burned and unburned areas.

4. Discussion

The primary objective of this study was to assess the effectiveness of the dNBR and dNDVI indices in detecting and classifying post-fire burned areas in fire-prone İzmir while examining the contributions of ML-based approaches to this process. Statistical analyses, confusion matrices, and accuracy metrics formed the basis of the evaluation, revealing that the dNBR (OA: 0.866, κ: 0.731) provides higher accuracy than the dNDVI (OA: 0.817, κ: 0.634). The McNemar test results confirmed that this difference is statistically significant. Comparative analyses demonstrated that the dNBR delineates the burn area boundaries more distinctly, providing more reliable results in mapping moderate- and high-severity fire areas, and this finding is consistent with previous studies in the literature [44,101,102]. However, relying solely on spectral indices for burn patch detection does not achieve sufficient accuracy, particularly in classifying heterogeneous and complex burned areas [59,103,104]. Therefore, this study employed ML-based approaches to develop more robust classification models by integrating multiple spectral features alongside spectral indices for burn scar detection. Performance evaluations of the ML algorithms revealed that RF, LightGBM, XGBoost, and AdaBoost all achieved higher accuracies than the index-based methods. Among them, the RF model outperformed all others across the accuracy metrics (OA: 0.980, κ: 0.960, AUC: 0.998), followed closely by XGBoost (OA: 0.979, κ: 0.957, AUC: 0.998) and LightGBM (OA: 0.974, κ: 0.948, AUC: 0.997). In contrast, AdaBoost exhibited a lower accuracy (OA: 0.921, κ: 0.843, AUC: 0.977) and higher error rates. These findings are consistent with studies by Bar et al. [34], Lee et al. [105], and Iban and Aksu [56], which identified RF as one of the most reliable models for post-fire burned area detection.
To determine whether the performance differences among ML algorithms were statistically significant, the McNemar test was applied, showing no significant differences between RF, XGBoost, and LightGBM. This suggests that these models exhibit similar fire pattern identification capabilities. A SHAP analysis was conducted to interpret the model’s decision-making processes, revealing that the dNBR and dNDVI were the most influential variables, with the SWIR and NIR bands playing critical roles. However, the SHAP results also highlighted that the impact of spectral bands varies across thresholds, indicating model sensitivity to spectral data distributions beyond the indices alone. These findings underscore the advantage of ML-based approaches in integrating spectral features for higher accuracy in post-fire mapping.

Therefore, this study offers scientific data on forests, which are one of the fundamental components of the global ecosystem and play a critical role in maintaining climate balance, thereby providing important ecological and socio-economic insights. Given the increasing frequency and intensity of wildfires on a global scale due to global warming, improving post-fire assessment methods in terms of accuracy and reliability is a crucial necessity for both disaster management and land planning processes today. Accurately mapping burn severity with high precision is essential for the effective management of wildfires, ecosystem protection, and post-disaster recovery planning. The high-accuracy ML models used in this study, with their precision and reliability in assessment processes, can help identify priority areas for post-fire ecosystem rehabilitation. In this context, these findings are considered important for post-fire land management, reforestation initiatives, and the sustainability of natural ecosystems. Additionally, environmental factors directly shape post-fire ecosystem dynamics and influence recovery processes. Key parameters such as soil moisture, vegetation regeneration time, and climatic variables play a critical role in the post-fire ecosystem recovery. The spectral analysis techniques and ML models utilized in this study offer powerful tools for comprehensively understanding ecosystem changes before and after wildfires. By assessing factors such as vegetation regrowth potential, soil nutrient depletion, and habitat degradation, these methods contribute to a deeper understanding of wildfire impacts. Furthermore, this study provides a basis for minimizing future fire risks, enhancing emergency response strategies, and evaluating erosion hazards. It also provides insights into climate change, carbon emissions, and fire severity, thereby supporting forest conservation strategies, environmental sustainability, and biodiversity protection. 
In this context, the results of this study function as a regional framework for decision-makers and local authorities in designing long-term plans for post-fire ecosystem management. Moreover, integrating RS and ML techniques into fire prevention and management policies will enable authorities to develop faster and more effective response strategies. These approaches are essential for creating wildfire risk maps, identifying vulnerable areas, and ensuring the long-term protection of ecosystems and biodiversity.

5. Conclusions

This study demonstrates the effectiveness of integrating Sentinel-2 RS data with ML techniques for detecting and classifying post-fire burn areas in İzmir, a region highly susceptible to wildfires. The findings indicate that while traditional spectral indices, such as the dNBR and dNDVI, provide a foundational approach for identifying burned areas, their classification accuracy is limited, particularly in heterogeneous landscapes. To address these limitations, ML models, including RF, XGBoost, LightGBM, and AdaBoost, were employed, significantly improving the accuracy of the burned and unburned area classifications. As part of XAI, a SHAP analysis was utilized to quantitatively assess the contribution of different spectral features to the model’s decision-making process, improving both transparency and interpretability. This analysis identified the key variables that influence burn severity predictions, increasing model reliability and comprehensibility. The results highlight the growing wildfire risks driven by global warming and underscore the necessity of data-driven approaches in post-fire assessment and disaster management. Such methodologies play a critical role in post-fire recovery processes, providing significant advantages in ecosystem restoration, identifying high-risk areas, and designing wildfire protection strategies. However, effective wildfire management requires not only post-fire assessments but also proactive pre-fire risk analyses. ML-based analyses in pre-fire stages facilitate the early detection of fire-prone areas, enabling the implementation of preventative strategies to mitigate wildfire impacts or even prevent their occurrence. Additionally, these data-driven models enhance response planning efficiency, providing decision-makers with the tools to implement proactive wildfire prevention and management strategies.
They offer practical applications for local governments and environmental policymakers, enhancing their ability to develop targeted and effective mitigation measures. Despite its contributions, this study is constrained by the current conditions and the spatial variables used. To improve fire spread and burn severity modeling, incorporating additional environmental factors—such as land surface temperature, soil moisture, wind speed, and direction—could lead to a more comprehensive understanding of wildfire dynamics. Furthermore, integrating higher-resolution satellite imagery and data fusion techniques that combine optical and radar data may enhance classification robustness by reducing the errors caused by spectral similarities between land cover types. The quality of training datasets also plays a critical role in classification performance. Expanding field-validated training datasets can reduce model uncertainties, leading to more reliable and generalizable results. Ultimately, this study provides a robust and innovative framework for wildfire burned area detection by integrating the RS and ML techniques while enhancing the transparency of ML outcomes through SHAP-driven XAI. By enhancing classification techniques and resolving modeling uncertainties, future research can aid in creating more precise and scalable burned area assessment models, thereby supporting wildfire studies and decision-making in fire-prone regions.

Author Contributions

Conceptualization, H.İ.G. and C.G.; methodology, H.İ.G., A.T.T. and C.G.; software, H.İ.G.; validation, H.İ.G. and C.G.; writing—original draft preparation, H.İ.G., A.T.T. and C.G.; writing—review and editing, H.İ.G., A.T.T. and C.G.; visualization, H.İ.G., A.T.T. and C.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors express their sincere gratitude to the academic editors and reviewers for their valuable comments and constructive suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Study area map.

Figure 2. Burn severity classification workflow.

Figure 3. Spatial distribution of burn severity: (a) dNBR threshold and (b) dNDVI threshold.

Figure 4. Confusion matrices of the spectral indices (a) dNBR and (b) dNDVI. (c) Comparison of the accuracy assessment for the burn severity maps derived from the dNBR and dNDVI.

Figure 5. ROC curves.

Figure 6. Burned areas detected using different ML algorithms, masked in red: (a) AdaBoost, (b) LightGBM, (c) RF, and (d) XGBoost.

Figure 7. Performance comparison of ML models.

Figure 8. Comparison of classifier ROC curves.

Figure 9. (a) SHAP summary plot of variables for the RF classifier and (b) feature importance ranking based on absolute SHAP values.

Figure 10. SHAP dependence plots of (a) dNBR, (b) dNDVI, (c) B6, (d) B7, (e) B8A, and (f) B5.

Table 1. The Sentinel-2 bands and specifications used in the study.

Band Name	Spatial Resolution (m)	Central Wavelength (µm)
Band 2—Blue	10	0.490
Band 3—Green	10	0.560
Band 4—Red	10	0.665
Band 5—Red Edge 1	20	0.705
Band 6—Red Edge 2	20	0.740
Band 7—Red Edge 3	20	0.783
Band 8—NIR	10	0.842
Band 8A—NIR Narrow	20	0.865
Band 11—SWIR 1	20	1.610
Band 12—SWIR 2	20	2.190
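
The bands in Table 1 feed the two indices used for burn severity classification. As a minimal sketch, assuming the standard definitions NBR = (NIR − SWIR2)/(NIR + SWIR2) and NDVI = (NIR − Red)/(NIR + Red), and taking each differenced index as the pre-fire value minus the post-fire value, the calculation for a single pixel could look like the following. Which NIR band is used (Band 8 or Band 8A) is left open here, and the reflectance values are hypothetical, chosen only for illustration.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 if the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def nbr(nir: float, swir2: float) -> float:
    """Normalized Burn Ratio from NIR and SWIR 2 (Band 12) reflectance."""
    return normalized_difference(nir, swir2)


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and Red (Band 4)."""
    return normalized_difference(nir, red)


# Hypothetical surface reflectance for one pixel (not values from the study).
pre = {"red": 0.05, "nir": 0.40, "swir2": 0.10}
post = {"red": 0.10, "nir": 0.15, "swir2": 0.25}

# Differenced indices: pre-fire minus post-fire, so burned pixels are positive.
dnbr = nbr(pre["nir"], pre["swir2"]) - nbr(post["nir"], post["swir2"])
dndvi = ndvi(pre["nir"], pre["red"]) - ndvi(post["nir"], post["red"])

print(f"dNBR = {dnbr:.3f}, dNDVI = {dndvi:.3f}")
```

With these made-up values both indices come out strongly positive, which is the pattern expected when vegetation is lost: NIR reflectance drops while SWIR and red reflectance rise.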

Table 2. The dNDVI and dNBR burn severity classes.

Indices	Classes	Threshold Value
dNBR	Unburned	<0.1
	Low	0.1–0.26
	Moderate Low	0.27–0.43
	Moderate High	0.44–0.65
	High	>0.66
dNDVI	Unburned	<0.07
	Very Low	0.08–0.13
	Low	0.13–0.20
	Moderate	0.20–0.33
	High	0.33–0.44
	Very High	>0.45
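
The thresholds in Table 2 can be applied as a simple lookup. The sketch below is one possible reading, not the authors' implementation: it assumes each class starts at its cut point and runs up to the next one, which closes the small gaps between the printed ranges (for example between 0.26 and 0.27) and assigns a value lying exactly on a cut point to the higher class. The function and constant names are hypothetical.

```python
from bisect import bisect_right

# Cut points and class names transcribed from Table 2.
# Assumption: a class covers [cut, next cut), so the gaps between the printed
# ranges are assigned to the lower class.
DNBR_CUTS = [0.10, 0.27, 0.44, 0.66]
DNBR_CLASSES = ["Unburned", "Low", "Moderate Low", "Moderate High", "High"]

DNDVI_CUTS = [0.07, 0.13, 0.20, 0.33, 0.45]
DNDVI_CLASSES = ["Unburned", "Very Low", "Low", "Moderate", "High", "Very High"]


def classify(value: float, cuts: list[float], classes: list[str]) -> str:
    """Return the class whose interval contains `value`."""
    # bisect_right counts how many cut points are <= value,
    # which is exactly the index of the matching class.
    return classes[bisect_right(cuts, value)]


def dnbr_class(value: float) -> str:
    return classify(value, DNBR_CUTS, DNBR_CLASSES)


def dndvi_class(value: float) -> str:
    return classify(value, DNDVI_CUTS, DNDVI_CLASSES)


for v in (0.05, 0.30, 0.85):
    print(f"dNBR {v:.2f} -> {dnbr_class(v)}")
for v in (0.10, 0.58):
    print(f"dNDVI {v:.2f} -> {dndvi_class(v)}")
```

Keeping the cut points in sorted lists means a different threshold scheme can be swapped in without touching the lookup logic.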

Table 3. McNemar’s test results for classification performance.

	AdaBoost	LightGBM	RF	XGBoost
AdaBoost	-	-	-	-
LightGBM	30.25	-	-	-
RF	40.69	3.57	-	-
XGBoost	37.16	4.00	0.2	-
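
To make the pairwise values in Table 3 easier to read, the sketch below shows how a McNemar statistic is commonly computed from the two discordant counts and how it compares with the 5% critical value. It assumes the table entries are chi-square statistics with one degree of freedom and uses the form without continuity correction; the table itself states neither, so both are assumptions. The counts `b` and `c` and all function names are hypothetical.

```python
import math

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_05 = 3.841


def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar statistic without continuity correction.

    b: samples the first classifier labels correctly and the second does not
    c: samples the second classifier labels correctly and the first does not
    """
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)


def p_value(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Statistics transcribed from Table 3 (row model vs column model).
table3 = {
    ("LightGBM", "AdaBoost"): 30.25,
    ("RF", "AdaBoost"): 40.69,
    ("XGBoost", "AdaBoost"): 37.16,
    ("RF", "LightGBM"): 3.57,
    ("XGBoost", "LightGBM"): 4.00,
    ("XGBoost", "RF"): 0.2,
}

# Pairs whose statistic exceeds the critical value under the stated assumptions.
significant = {pair for pair, chi2 in table3.items() if chi2 > CHI2_CRITICAL_05}

for (first, second), chi2 in table3.items():
    verdict = "significant" if (first, second) in significant else "not significant"
    print(f"{first} vs {second}: chi2 = {chi2:.2f}, p = {p_value(chi2):.4f} ({verdict})")
```

Read this way, LightGBM, RF, and XGBoost each differ significantly from AdaBoost, XGBoost differs marginally from LightGBM, and RF cannot be distinguished from LightGBM or XGBoost at the 5% level.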

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
