Machine Learning and Deep Learning for Wildfire Prediction: A Systematic and Bibliometric Review of Methods, Data Practices, and Reproducibility (2020–2025)

Galván Lara, Kevin Manuel; Miquelajauregui, Yosune; Enriquez Ocaña, Luis Fernando; Meling-López, Alf Enrique; Neger, Christoph; Abatzoglou, John; Galicia, Leopoldo; Hinojo, César; Jiménez-Guzmán, Graciela; Rodríguez Alcantar, Edelmira

doi:10.3390/fire9050204

Open Access, Systematic Review


by Kevin Manuel Galván Lara ¹, Yosune Miquelajauregui ²,*, Luis Fernando Enriquez Ocaña ³, Alf Enrique Meling-López ³, Christoph Neger ⁴, John Abatzoglou ⁵, Leopoldo Galicia ⁴, César Hinojo ³, Graciela Jiménez-Guzmán ⁶ and Edelmira Rodríguez Alcantar ⁷

¹ Posgrado en Biociencias, Departamento de Investigaciones Científicas y Tecnológicas, Universidad de Sonora, Blvd. Luis Encinas y Rosales S/N, Colonia Centro, Hermosillo 83000, Sonora, Mexico

² Laboratorio Nacional de Ciencias de la Sostenibilidad, Instituto de Ecología, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico

³ Departamento de Investigaciones Científicas y Tecnológicas, Universidad de Sonora, Blvd. Luis Encinas y Rosales S/N, Col. Centro, Hermosillo 83000, Sonora, Mexico

⁴ Instituto de Geografía, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico

⁵ Management of Complex Systems (MCS), University of California, Merced, 5200 Lake Rd, Merced, CA 95343, USA

⁶ Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico

⁷ Departamento de Matemáticas, Universidad de Sonora, Blvd. Luis Encinas y Rosales S/N, Colonia Centro, Hermosillo 83000, Sonora, Mexico

* Author to whom correspondence should be addressed.

Fire 2026, 9(5), 204; https://doi.org/10.3390/fire9050204

Submission received: 25 February 2026 / Revised: 6 April 2026 / Accepted: 7 April 2026 / Published: 15 May 2026

(This article belongs to the Special Issue Machine Learning (ML) and Deep Learning (DL) Applications in Wildfire Science: Principles, Progress and Prospects (2nd Edition))


Abstract

Wildfire prediction using machine learning (ML) and deep learning (DL) has expanded rapidly, yet synthesis regarding algorithmic configurations, data practices, and transparency remains limited. This systematic review characterizes ML/DL applications in wildfire prediction (2020–2025) using a PRISMA-EcoEvo framework across 341 peer-reviewed studies, with detailed analysis of 110 articles from 2024. Publication output increased steadily, concentrated geographically in China and the United States. Methodologically, ensemble tree-based methods (26.7%) and deep learning architectures (59.4%) coexist, reflecting adaptation to diverse data modalities. Input data are dominated by vegetation/fuel characteristics (44.7%) and historical fire labels (41.2%), while socioeconomic variables remain marginal (1.2%). Evaluation practices distinguish classification and regression tasks, yet metric heterogeneity constrains cross-study comparability. Critically, only 7.7% of studies provided publicly accessible code, with a significant association between algorithm family and code availability (χ² = 78, p = 0.0012). Collectively, wildfire ML/DL research demonstrates technical advancement but remains geographically concentrated and constrained by limited transparency. Strengthening reporting standards, metric-task alignment, dataset documentation, and open-code practices is essential to translate computational innovation into globally robust, reproducible wildfire decision-support systems.

Keywords:

wildfire prediction; machine learning; deep learning; systematic review; algorithm taxonomy; evaluation metrics; reproducibility

1. Introduction

Forest fires have become an escalating global concern, now affecting approximately one-third of the world’s forests, equivalent to about 36% of all forested land and nearly a third of Earth’s terrestrial surface [1,2]. On average, between 3.3 and 6.3 million km² burn each year [3], and this annual burn rate is expected to increase due to climate change [4]. Beyond the widespread loss of ecosystem cover, forest fires also impose severe human and economic costs. The scale and intensity of wildfire impacts underscore the urgent need for comprehensive fire management and mitigation strategies, supported by data-driven methodologies and robust fire prediction and management decision-support tools. These systems must not only identify the underlying drivers of fire activity, including biophysical and anthropogenic ignition sources, but also account for uncertainty in fire behavior and response outcomes, supporting the prediction, assessment, and evaluation of management strategies through informed, data-driven decision-making.

The dynamics of wildfire are influenced by a complex interplay of climate, topography, vegetation, and human activity, operating across multiple temporal and spatial scales [5,6,7]. Among these drivers, climatic variables play a critical role [2,8]. Elevated temperatures, low humidity, reduced precipitation, strong winds, and increased solar radiation influence ignition probability, spread rate, and fire intensity. Topography further modulates fire behavior and risk [6]. Higher elevations and proximity to water bodies generally reduce fire risk due to cooler microclimates and higher humidity levels, whereas steep slopes and aspects facing south facilitate rapid and intense fire ignition and spread [9]. Vegetation characteristics also play a pivotal role, with ecosystems featuring high canopy bulk densities, closed canopies, and abundant dry biomass being more prone to frequent and intense fires [10,11]. Additionally, human activities, including population density, road networks, and agricultural burning, significantly elevate ignition risk, while socioeconomic factors, such as regional development and the availability of fire prevention and suppression infrastructure, influence the capacity to manage and mitigate wildfire impacts [11,12].

Historically, fire regime characterization and behavior prediction have relied on physical models based on combustion thermodynamics [13] and empirical meteorological indices. While effective for identifying broad correlations, these traditional methods often assume linearity and variable independence [13], oversimplifying the complex, nonlinear interactions inherent in wildfire processes and lacking the capacity to integrate high-dimensional socioeconomic factors [14]. To address these gaps, several modern approaches have emerged. Multi-Criteria Decision Making (MCDM) methods such as the Analytic Hierarchy Process (AHP) and fuzzy logic integrate expert judgment with quantitative data across criteria such as topography and climate to produce vulnerability maps [15]. Machine learning (ML), described as the search for useful representations and rules over an input dataset performed within a predefined space and guided by a feedback signal, and deep learning (DL), a specific subfield emphasizing the learning of successive layers of increasingly meaningful representations, support resource allocation ([16]; e.g., the Wildfire Assessment Model [17]) and early warning through models such as IOFireNet and Long Short-Term Memory (LSTM) networks [18]. Simulation metamodels, often using artificial neural networks, emulate costly physics-based simulations to rapidly evaluate management scenarios [19]. Explainable AI (XAI) tools such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) enhance trust by clarifying model predictions [20]. Finally, operational Decision Support Systems (DSS) integrate these components for real-time tactical response [21]. Despite these advances, structural incompatibilities persist: physical models require precise combustion parameters, whereas socioeconomic variables, such as Gross Domestic Product (GDP) or road distance, are indirect ignition proxies that cannot be directly ingested [22,23,24,25].

To address the limitations of physical and empirical models, wildfire management is increasingly adopting data-driven architectures and decision support tools capable of integrating multiscale environmental and human-related data to enhance prevention, detection, and response, especially under the uncertainties posed by global change. Geographic Information Systems (GIS) integrated with Multi-Criteria Decision Analysis (MCDA) support dynamic fire risk mapping by synthesizing diverse environmental and socioeconomic variables, facilitating the strategic allocation of firefighting resources to areas of highest risk [21]. Machine learning (ML) and deep learning (DL) models have demonstrated substantial potential to improve wildfire forecasting and enable more dynamic, data-driven resource allocation by capturing complex, nonlinear relationships among environmental, anthropogenic, and physical variables. Trained on historical, environmental, and fuel-related datasets, often derived from satellite imagery and sensor networks, these models can be applied to previously unseen regions, generalize across diverse and non-uniform input conditions, and improve predictive accuracy over time through continuous learning from new data and past forecasting errors [26]. Defined as analytical and computational frameworks, decision-making tools enable authorities to manage the multifaceted uncertainties typical of wildfire scenarios [27,28]. However, the development and validation of such decision-support systems are not evenly distributed across regions.

A significant geographic bias persists in wildfire research, with scientific output heavily concentrated in data-rich, mostly high-income nations, particularly the United States, China, and France, despite these regions representing only a small fraction of global burned area [29]. Empirical evidence supports this imbalance. For instance, about 15% of wildfire publications focus on the western United States, which accounts for just 0.5% of global burned area [30]. Conversely, high-burn regions such as Africa and Siberia remain severely underrepresented in predictive modeling, even though Africa and South America together contribute over 70% of the world’s burned area [31,32,33]. This imbalance stems from limited technological infrastructure, scarce localized high-quality datasets, and a lack of AI expertise in the Global South, leading to models trained on Mediterranean or boreal ecosystems that may not generalize to tropical or savanna fire regimes [29].

Although ML and DL approaches are increasingly applied in wildfire research, there remains a limited synthesis of how methodological configurations, data practices, and transparency standards are distributed across the literature. Addressing this gap is essential for understanding the structure of the current research landscape and its implications for reproducibility and methodological transferability.

This study systematically characterizes recent (2020–2025) applications of Machine Learning (ML) and Deep Learning (DL) in wildfire prediction through a structured literature review following the PRISMA-EcoEvo protocol. Specifically, the review aims to:

(1) characterize the temporal and geographic evolution of ML/DL-based wildfire prediction research between 2020 and 2025; (2) examine how predictive tasks, algorithm families, input data types, and evaluation metrics are configured across the literature; and (3) evaluate transparency and reproducibility by examining reporting completeness and the availability of open-source materials.

2. Materials and Methods

This study employed a systematic literature review following the PRISMA-EcoEvo protocol [34] (Figure 1) to identify peer-reviewed studies applying Machine Learning (ML) or Deep Learning (DL) techniques to wildfire prediction. Searches were conducted in the Scopus and Web of Science (WoS) Core Collection databases using the Boolean query (“WILDFIRE” OR “FOREST FIRE”) AND (“MACHINE LEARNING” OR “DEEP LEARNING”) AND (“PREDICTION” OR “FORECAST”) AND (“DECISION MAKING” OR “DECISION SUPPORT”). The search was limited to English-language publications dated between 1 January 2020 and 31 December 2025.

This systematic review followed transparent and reproducible research practices and was prospectively registered in the Open Science Framework (OSF). The study protocol, including the search strategy, eligibility criteria, and data extraction framework, is publicly available to ensure methodological transparency and facilitate reproducibility. The registration can be accessed at the following link: https://osf.io/tc9s2/overview, accessed on 26 March 2026. This registration documents the analytical workflow and supports the traceability of decisions made during the review process.

Studies were included if they (i) explicitly reported the ML/DL algorithm(s) used, (ii) incorporated geospatial or remote sensing input data, and (iii) presented quantitative evaluation metrics (e.g., accuracy, AUC, F1 score) for predicting wildfire occurrence, behavior, or risk in natural or Wildland–Urban Interface (WUI) environments. Reviews, editorials, abstracts, operational tools without a core predictive modeling component, urban structural fire studies, and studies relying solely on non-spatial theoretical or mathematical formulations were excluded. Screening involved duplicate removal followed by title, abstract, and full-text assessment according to predefined criteria. The complete selection process is documented in Figure 1.
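The eligibility rules above can be read as a simple filter over bibliographic records. The following is a minimal sketch under stated assumptions: the `Record` fields, the category labels, and the title-based duplicate check are hypothetical illustrations and do not reproduce the screening scripts used in this review.

```python
from dataclasses import dataclass, field

# Hypothetical record structure; field names and labels are illustrative only.
@dataclass
class Record:
    title: str
    doc_type: str = "article"          # e.g. "article", "review", "editorial"
    algorithms: list = field(default_factory=list)
    uses_geospatial_data: bool = False
    metrics: list = field(default_factory=list)
    setting: str = "wildland"          # "wildland", "wui", or "urban_structural"

EXCLUDED_TYPES = {"review", "editorial", "abstract"}
ELIGIBLE_SETTINGS = {"wildland", "wui"}

def deduplicate(records):
    """Keep the first record for each whitespace- and case-normalized title."""
    seen, unique = set(), []
    for rec in records:
        key = " ".join(rec.title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def is_eligible(rec):
    """Apply inclusion criteria (i)-(iii) plus the document-type and setting exclusions."""
    if rec.doc_type in EXCLUDED_TYPES or rec.setting not in ELIGIBLE_SETTINGS:
        return False
    return bool(rec.algorithms) and rec.uses_geospatial_data and bool(rec.metrics)

records = [
    Record("Fire risk mapping with RF", algorithms=["Random Forest"],
           uses_geospatial_data=True, metrics=["AUC"]),
    Record("Fire  Risk Mapping with RF", algorithms=["Random Forest"],
           uses_geospatial_data=True, metrics=["AUC"]),   # duplicate title
    Record("A survey of fire models", doc_type="review"),
]
included = [r for r in deduplicate(records) if is_eligible(r)]
```

In practice, duplicate detection across Scopus and WoS would more likely rely on DOIs than on titles; the title key is used here only to keep the sketch self-contained.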

From the 1730 records initially retrieved through database searches, 1335 were excluded during title and abstract screening because they did not meet the inclusion criteria of this review. Excluded studies comprised articles unrelated to wildfire phenomena, studies addressing wildfire detection or monitoring without predictive modeling components, works focused on non-ML/DL approaches, methodological contributions lacking application to wildfire prediction tasks, and publications outside the thematic scope defined by the review protocol. A second abstract-level relevance screening was then conducted to ensure consistency with the study objectives, leading to the removal of additional records that did not explicitly address wildfire occurrence or risk prediction using data-driven approaches. The full screening procedure is summarized in the PRISMA flow diagram (Figure 1), and the inclusion and exclusion criteria are those described above in this section.

The resulting corpus of 341 studies served as the foundation for multiple analytical approaches: bibliometric analyses of temporal trends and geographic distribution, word clouds for textual pattern representation created using the WordCloud library, and an examination of relationships between frequently used algorithms and the primary country of study to explore potential regional patterns. Building on this comprehensive analysis, a subset of 110 articles published in 2024 (approximately 32% of the total corpus) was examined in greater methodological detail to assess algorithm classification, input data domains, evaluation metric alignment, and code availability practices.

To standardize algorithm reporting and reduce nomenclatural heterogeneity, predictive models were categorized using a structured taxonomy comprising nine families: (1) Tree-Based and Ensemble Methods; (2) Deep Learning—Convolutional Neural Networks (CNNs); (3) Deep Learning—Specialized, Hybrid, or Novel Architectures; (4) Classical or Statistical Models; (5) Deep Learning—Feedforward Networks; (6) Deep Learning—Recurrent or Temporal Models; (7) Deep Learning—Generative Models; (8) Deep Learning—Transformers and Vision Transformers; and (9) Support Vector Machines and Related Methods. Hybrid approaches were classified according to their dominant architectural component (Table 1).
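A taxonomy of this kind can be applied to free-text algorithm names with a keyword lookup. The sketch below is illustrative only: the family names follow the taxonomy above, but the keyword lists are assumptions and are not the mapping provided in the Supplementary Material. Names that match no keyword are returned as `None` so that they can be classified manually, for example as specialized or hybrid architectures.

```python
import re

# Family names follow the nine-family taxonomy; keyword lists are assumed examples.
# Order matters: the first family with a matching keyword wins.
FAMILY_KEYWORDS = [
    ("Deep Learning - Transformers and Vision Transformers", ["transformer", "vit"]),
    ("Deep Learning - Generative Models", ["gan", "autoencoder"]),
    ("Deep Learning - Recurrent or Temporal Models", ["lstm", "gru", "rnn"]),
    ("Deep Learning - Convolutional Neural Networks (CNNs)", ["cnn", "u-net", "resnet"]),
    ("Deep Learning - Feedforward Networks", ["mlp", "ann", "feedforward"]),
    ("Tree-Based and Ensemble Methods",
     ["random forest", "xgboost", "lightgbm", "decision tree"]),
    ("Support Vector Machines and Related Methods", ["svm", "support vector"]),
    ("Classical or Statistical Models", ["logistic regression", "naive bayes"]),
]

def classify(algorithm_name):
    """Return the taxonomy family for a reported algorithm name, or None if unmatched."""
    text = algorithm_name.lower()
    for family, keywords in FAMILY_KEYWORDS:
        # Word boundaries avoid false hits such as "ann" inside "channel".
        if any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords):
            return family
    return None  # left for manual review (e.g. hybrid or novel architectures)
```

Whole-word matching is a deliberate choice here: short acronyms such as "ANN" or "GAN" would otherwise match unrelated words and inflate family counts.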

To identify and quantify the primary categories of input data used for model training, predictor variables were grouped into standardized thematic domains: meteorological, topographical, remote sensing, and socioeconomic. Meteorological sources were further distinguished between global reanalysis products (e.g., ERA5) and local station data. Each data source was additionally flagged as public (open access) or private/local to contextualize accessibility and reproducibility potential.

To examine reported evaluation metrics and their alignment with predictive task types, metrics were harmonized to consolidate synonymous terminology (e.g., Sensitivity, Recall, and True Positive Rate grouped under “Recall”) and categorized according to task type. Classification metrics (e.g., AUC, F1 Score) were distinguished from regression metrics (e.g., RMSE, MAE) to avoid analytical confounding between occurrence/risk and spread/burned area models.
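The harmonization step can be sketched as a synonym table followed by a task-type lookup. Only the grouping of Sensitivity, Recall, and True Positive Rate under "Recall" and the classification/regression split of AUC, F1 Score, RMSE, and MAE come from the text; the remaining synonyms and the "unmapped" label are assumptions added for illustration.

```python
# Synonym table: lowercase reported name -> canonical metric name.
SYNONYMS = {
    "sensitivity": "Recall",
    "recall": "Recall",
    "true positive rate": "Recall",
    "auc": "AUC",
    "f1": "F1 Score",          # assumed spelling variants
    "f1 score": "F1 Score",
    "f1-score": "F1 Score",
    "rmse": "RMSE",
    "mae": "MAE",
}

# Canonical metric -> predictive task type.
TASK_TYPE = {
    "Recall": "classification",
    "AUC": "classification",
    "F1 Score": "classification",
    "RMSE": "regression",
    "MAE": "regression",
}

def harmonize(reported_name):
    """Return (canonical_name, task_type); unknown names are flagged as 'unmapped'."""
    key = " ".join(reported_name.strip().lower().split())
    canonical = SYNONYMS.get(key)
    if canonical is None:
        return reported_name.strip(), "unmapped"
    return canonical, TASK_TYPE[canonical]
```

Flagging unknown names instead of guessing a category keeps classification and regression metrics from being mixed by accident.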

Complete mappings of algorithm families, input data domains, and evaluation metrics are provided in the Supplementary Material to ensure methodological transparency and facilitate replication.

To evaluate transparency and reproducibility practices, each article in the methodological subset was manually screened for the presence of accessible open-source code repositories (e.g., GitHub, institutional repositories). Code availability was recorded as a binary variable (“Yes”/“No”), and a study was considered reproducible only if a functional link to executable code was provided in the manuscript or Supplementary Materials. Only publicly accessible and verifiable repositories were considered valid for classification. A χ² test was conducted to evaluate the association between code sharing and algorithm type.
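For readers unfamiliar with the test, the χ² statistic for a contingency table of algorithm category by code availability can be computed as in the sketch below. The counts are hypothetical and are not the data analyzed in this review; the function returns only the statistic and the degrees of freedom. A p-value could be obtained with a statistics library such as SciPy (`scipy.stats.chi2_contingency`), which is not used here so that the example has no dependencies.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a list-of-rows table.

    Assumes every row and column total is positive; otherwise an expected
    count is zero and the statistic is undefined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows are algorithm categories, columns are (code: Yes, code: No).
hypothetical_table = [
    [1, 29],
    [6, 24],
    [2, 38],
]
statistic, dof = chi_square_independence(hypothetical_table)
```

With sparse tables, where many expected counts are small, the χ² approximation becomes unreliable, so an exact or permutation-based test may be worth considering.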

The reviewed literature was additionally categorized (Table 2) according to predictive task type based on title and abstract screening. Most studies focused on wildfire occurrence or risk prediction (75.7%), followed by burned-area or severity estimation (10.3%) and detection or monitoring-related predictive workflows (9.1%). Only a small fraction addressed wildfire spread or propagation modeling (1.5%). This distribution confirms that the dominant focus of recent ML/DL wildfire prediction research lies in occurrence and susceptibility modeling rather than fire dynamics simulation.

All analyses were conducted using Python (v3.11.12) within the Google Colab environment. Data processing, bibliometric analyses, statistical tests, and reproducibility assessments were implemented using reproducible scripts. Visualizations were generated with established libraries, including Matplotlib 3.10.8, Altair (Vega-Altair) 6.0.0, and Plotly 6.0.0. Geospatial analyses and choropleth maps were produced using GeoPandas 1.1.3 to process vector geometries and overlay country-level boundaries.

To ensure full methodological transparency, the complete codebase, processed datasets, and visualization scripts are publicly available in a Zenodo repository (bevins93/bibliometric_global_wf: Machine and Deep Learning in Wildfire Prediction: A Systematic and Bibliometric Analysis of Methods, Data, and Reproducibility (2020–2025)), enabling replication of the bibliometric and geospatial workflows presented in this study.

3. Results

3.1. Temporal Trends and Geographic Distribution of Research Output (2020–2025)

A total of 341 studies published between 2020 and 2025 were included in the final corpus. Annual publication counts increased from 2 studies in 2020 to 5 in 2021, 20 in 2022, 24 in 2023, 33 in 2024, and 57 in 2025 (Figure 2), indicating a steady upward trend over the study period.
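The year-on-year change behind this trend can be checked directly from the counts quoted above. The short sketch below uses only those six values as stated in the text; the variable names are illustrative.

```python
# Annual publication counts as reported in the text (Figure 2).
counts = {2020: 2, 2021: 5, 2022: 20, 2023: 24, 2024: 33, 2025: 57}

years = sorted(counts)

# Percentage change relative to the previous year.
growth_pct = {
    year: (counts[year] - counts[prev]) / counts[prev] * 100
    for prev, year in zip(years, years[1:])
}

# True when every year has more publications than the year before.
strictly_increasing = all(counts[a] < counts[b] for a, b in zip(years, years[1:]))

for year, pct in growth_pct.items():
    print(f"{year}: {pct:+.1f}% vs. previous year")
```

Counts rise in every year, while the relative growth rate varies widely, from 20% (2022 to 2023) to 300% (2021 to 2022), which is expected when the early-year baselines are very small.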

Research output was geographically concentrated in a limited number of countries. China accounted for more than 30 studies, followed by the United States with approximately 25–28 publications. Australia and Brazil each contributed between 10 and 15 studies. Several European countries, including Germany, France, and the United Kingdom, as well as India and South Korea, reported between 5 and 10 studies, whereas most other countries contributed fewer than five publications (Figure 3).

3.2. Algorithm Families and Methodological Adoption Patterns

Algorithms were classified into structured families to enable standardized comparison across studies (Figure 4). Tree-based and ensemble methods (e.g., Random Forest, XGBoost) accounted for 26.7% of the analyzed studies. Deep learning architectures collectively represented 59.4% of implementations. Within this category, Convolutional Neural Networks (CNNs) accounted for 15.1%, followed by specialized, hybrid, or novel deep learning architectures (12.8%) and feedforward neural networks (10.5%). Classical statistical models (e.g., Support Vector Machines, logistic regression) represented 11.6% of studies, while recurrent neural networks (e.g., LSTM) and generative architectures accounted for 9.3% and 7.0%, respectively.

3.3. Geographic Variation in Algorithm Family Adoption

To explore geographic variation in methodological adoption, algorithm families were cross-tabulated with country-level publication counts. Figure 5 displays the relative distribution of algorithm families across countries within the analyzed subset. Studies affiliated with China reported higher frequencies of tree-based and ensemble methods, particularly Random Forest and XGBoost. In contrast, studies affiliated with the United States reported greater use of deep learning architectures, including Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks.
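A cross-tabulation of this kind can be sketched with the standard library alone. The (country, family) pairs below are hypothetical placeholders, not records from the reviewed corpus; the function returns, for each country, the share of its studies falling in each algorithm family.

```python
from collections import Counter, defaultdict

def crosstab_shares(pairs):
    """Map country -> {family: share of that country's studies}."""
    counts = defaultdict(Counter)
    for country, family in pairs:
        counts[country][family] += 1
    shares = {}
    for country, family_counts in counts.items():
        total = sum(family_counts.values())
        shares[country] = {fam: n / total for fam, n in family_counts.items()}
    return shares

# Hypothetical input: one (country, algorithm family) pair per study.
pairs = [
    ("China", "Tree-Based and Ensemble Methods"),
    ("China", "Tree-Based and Ensemble Methods"),
    ("China", "Deep Learning - CNNs"),
    ("United States", "Deep Learning - CNNs"),
]
shares = crosstab_shares(pairs)
```

Normalizing within each country, as done here, makes countries with very different publication volumes comparable; raw counts would mainly reflect overall output.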

This contrast between the two most represented countries in the reviewed literature is consistent with differences in data modality. The emphasis on Random Forest and XGBoost in studies affiliated with China reflects structured predictor integration and tabular environmental datasets, whereas the emphasis on CNNs and LSTM networks in studies affiliated with the United States is consistent with workflows leveraging spatiotemporal remote-sensing inputs and sequential environmental predictors. This geographic divergence suggests complementary methodological tendencies rather than competing paradigms, highlighting how national research infrastructures, data availability, and application contexts shape algorithm selection within the global wildfire prediction research landscape. These patterns are consistent with the global distribution of research output shown in Figure 3, where China and the United States together account for the largest share of publications in ML/DL wildfire prediction between 2020 and 2025.

3.4. Input Data Domains Used for Model Training

Predictor variables were grouped into standardized thematic categories to enable comparison across studies (Figure 6). Vegetation and fuel-related variables accounted for 44.7% of reported input domains, including vegetation indices, fuel moisture content, and biomass-related measures. Fire history and labeled datasets represented 41.2% of inputs, encompassing historical fire perimeters, ignition records, and temporal occurrence data. Climatic and meteorological variables comprised 7.1% of reported inputs, while remote sensing-derived variables accounted for 5.9%. Human and socioeconomic variables represented 1.2% of input domains within the analyzed subset.

3.5. Evaluation of Metrics and Alignment with Predictive Task Types

Reported evaluation metrics were categorized according to predictive task type (Figure 7). For classification tasks (e.g., fire occurrence or risk prediction), metrics related to precision, recall, and F1-score accounted for 26.7% of reported evaluations. Regression tasks (e.g., burned area or fire spread prediction) most frequently employed regression error metrics, representing 25% of evaluations, with RMSE and MAE commonly reported. Computational performance metrics accounted for 21.7% of reported measures, while segmentation or spatial overlap metrics represented 11.7%. Accuracy-based metrics comprised 10%, and threshold-independent metrics accounted for 5%.

3.6. Transparency and Reproducibility Practices

Among the 110 studies analyzed in detail, 7.7% provided publicly accessible open-source code repositories for their algorithmic implementations, whereas 92.3% did not report accessible code (Figure 8).

A chi-square (χ²) test was conducted to examine the association between algorithm family and code availability. The test indicated a statistically significant association (χ² = 78.0, df = 44, p = 0.0012), suggesting that the distribution of code availability varied across algorithm categories.
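As a reminder of how the reported quantities relate, the Pearson statistic and its degrees of freedom for an r × c contingency table are given below, where O_ij and E_ij are observed and expected counts, R_i and C_j are row and column totals, and N is the total number of observations. The symbols are introduced here for illustration. If code availability enters the test as the binary variable described in Section 2 (c = 2), the reported df = 44 would correspond to r = 45 algorithm categories; the number of categories used in the test is not stated in the text, so this is an inference.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\mathrm{df} = (r - 1)(c - 1)
```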

Code-sharing frequencies differed among studies employing tree-based and ensemble methods (26.7% of implementations) and deep learning architectures (59.4% of implementations), as illustrated in Figure 8. These results describe the observed patterns of code reporting within the analyzed subset.

4. Discussion

We conducted a systematic review to characterize recent (2020–2025) applications of Machine Learning (ML) and Deep Learning (DL) in wildfire prediction, examining temporal growth, geographic concentration, methodological configurations, evaluation practices, and transparency patterns.

4.1. Temporal Trends and Geographic Distribution

This study highlights the rapid growth in scientific production related to ML and DL applications in wildfire analysis over the 2020–2025 period (Figure 2). Rather than representing a simple increase in publication volume, this acceleration signals the consolidation of wildfire prediction as a computational research domain within applied environmental sciences. This trend aligns with broader developments in the literature, as the use of ML techniques [13,32,45,46,47,48] and DL approaches [20,38,39,49] has steadily increased.

Several factors may contribute to this rise, including technological advances in DL architectures [20,28,38,39], improvements in computational capabilities that enable large-scale data processing [19], and the increasing availability of high-resolution environmental datasets [21,42]. The expanded accessibility of meteorological and satellite data has further facilitated data-driven wildfire prediction modeling. Importantly, this convergence of computational power and environmental data availability reduces barriers to entry for model development but does not necessarily ensure comparability or methodological rigor across studies.

The geographic concentration of wildfire prediction research in a limited set of countries, such as China, the United States, France, Italy, India, Portugal, Greece, and Canada, reflects disparities in research infrastructure, data availability, and computational resources. In addition, this concentration has epistemological implications: models developed predominantly in temperate or boreal systems may implicitly encode region-specific fire regimes, potentially limiting transferability to tropical, savanna, or understudied ecosystems. However, it is important to acknowledge that this pattern is partially shaped by linguistic bias, both in Scopus and WoS, which favor English-language publications, and in the restriction of the search to English-language records. Consequently, research from Latin America (e.g., Mexico, Chile) or Francophone Africa published in local languages may be underrepresented, despite high fire incidence in those regions.

Beyond reflecting disparities in research infrastructure, this geographic concentration has important implications for the representativeness and operational relevance of wildfire prediction models at the global scale. Empirical evidence indicates that approximately 15% of wildfire-related predictive studies focus on the western United States alone, despite this region accounting for only about 0.5% of the global burned area, while regions such as Africa and large portions of South America, which together contribute more than 70% of global burned area, remain substantially underrepresented in predictive modeling efforts. This imbalance suggests that current methodological advances are disproportionately calibrated using temperate and Mediterranean-type fire regimes, potentially limiting their transferability to tropical and savanna-dominated ecosystems where ignition dynamics, vegetation structure, and human–fire interactions differ substantially. As a consequence, geographic bias does not only affect publication distribution but may also shape the implicit assumptions embedded within model architectures, training datasets, and evaluation strategies. In practice, this can reinforce structural gaps in decision-support capacity across fire-prone regions of the Global South, where predictive tools are most urgently needed but remain least represented in the literature. Addressing this imbalance, therefore, represents not only a question of scientific equity but also a prerequisite for improving the robustness and global applicability of ML/DL-based wildfire prediction systems.

4.2. Algorithm Families and Methodological Patterns

From a methodological perspective, Random Forest (RF) remains widely used, particularly in countries such as China and Australia, suggesting both the robustness of this algorithm and its suitability for predominantly tabular environmental datasets [21,35,36,50]. Its continued prominence may be attributed to computational efficiency, interpretability, and strong performance with structured regional datasets [36,43].

Its relative ease of implementation and solid performance with smaller or moderately sized datasets also make it a practical choice for many research groups [36,44].

In contrast, although deep learning (DL) methods, such as Convolutional Neural Networks (CNNs), Artificial Neural Networks (ANNs), and Long Short-Term Memory (LSTM) networks, are increasingly represented, their adoption remains less dominant. These techniques are typically employed in more specialized contexts, such as satellite image classification (e.g., CNN, U-Net) and time series prediction (e.g., LSTM, ConvLSTM), where their strengths in capturing spatial or temporal patterns are most advantageous [17]. However, their higher data and computational requirements, as well as interpretability challenges, may partly explain their more limited and targeted application.

The geographic stratification observed in algorithmic adoption further complicates evaluation practices. China’s stronger reliance on ensemble methods, often associated with classification-oriented risk modeling, and the United States’ greater emphasis on deep learning architectures, frequently applied to spatial or regression-based tasks, reflect distinct regional data ecosystems and operational priorities. As a result, metric selection is not only task-dependent but also shaped by local research infrastructures and data availability. This fragmentation does not necessarily call for uniform standardization; rather, it reinforces the need for explicitly articulated, context-sensitive evaluation frameworks. However, when divergent metric practices are combined with the near absence of open-source code in leading research hubs, comparability and validation become severely constrained. In particular, the limited adoption of metrics tailored to imbalanced datasets (e.g., rare high-severity fires) and spatially explicit predictions, despite their documented importance [31,40], amplifies reproducibility gaps and restricts the transferability of models across regions. Strengthening the field, therefore, requires not homogenization but transparent reporting standards and reproducible implementations that clearly link evaluation criteria to operational goals while enabling cross-context validation.

Our hierarchical analysis reveals that while Deep Learning (DL) architectures like CNNs dominate image-based tasks (e.g., active fire detection from satellite imagery), ensemble tree-based methods (Random Forest, XGBoost) remain the state of the art for tabular risk prediction, a resilience driven by their computational efficiency and robustness on structured, sample-limited regional fire records. Critically, this methodological duality is not merely technical but geographically stratified. The heatmap in Figure 5 exposes a stark bifurcation between research hubs: China’s overwhelming adoption of ensemble methods, with RF and XGBoost usage intensity exceeding 3.5 on a scale from 0 to 4.0, contrasts sharply with the United States’ stronger emphasis on DL frameworks (CNNs, LSTMs). This contrast reflects divergent data ecosystems: China’s reliance on historical tabular datasets versus the U.S.’s access to high-resolution satellite and spatial data.

4.3. Input Data Domains

The dominance of vegetation/fuel characteristics and historical fire labels indicates a prioritization of biophysical predictors. This emphasis reflects both ecological theory, in which fuel load and ignition history are central drivers, and practical data availability, as such datasets are more consistently archived than socioeconomic variables.

Climatic and meteorological variables, although categorized separately, often function as auxiliary predictors integrated within broader environmental datasets. Remote sensing data is particularly central to DL-based studies, reinforcing the coupling between data modality and architecture.

The limited incorporation of socioeconomic variables does not necessarily indicate conceptual neglect; rather, it may reflect the fragmentation of wildfire research between biophysical spread modeling and risk-oriented WUI frameworks. However, the underrepresentation of socioeconomic data constrains the integration of vulnerability and exposure dimensions into predictive modeling.

A key implication is that current wildfire ML/DL research remains predominantly hazard-centric rather than fully risk-integrated, which may limit its alignment with decision-support systems that require socio-environmental synthesis.

4.4. Evaluation Metrics and Task Alignment

Performance evaluation in wildfire prediction research exhibits a nuanced distribution of metrics. Precision, Recall, and F1 metrics dominate at 26.7% (Figure 7), reflecting the field’s strong emphasis on binary risk prediction and early warning systems, where minimizing false negatives is critical for disaster response. Notably, regression error metrics (e.g., RMSE, MAE) constitute a substantial 25% of evaluations, challenging the notion that continuous variables such as burned area and spread dynamics receive “less attention”. This comparable prevalence of regression and classification metrics suggests that wildfire prediction research increasingly integrates both discrete risk detection and continuous spatial modeling, reflecting the dual operational demands of early warning and resource allocation. Computational performance metrics (21.7%) and segmentation or spatial overlap measures (11.7%) further highlight emerging priorities for model efficiency and geospatial accuracy, while accuracy-based metrics and threshold-independent approaches play supplementary roles. Overall, performance evaluation exhibits clear differentiation between classification and regression tasks, reflecting fundamentally distinct predictive objectives within the field: Precision, Recall, and F1 metrics dominate classification-oriented studies focused on fire occurrence and risk prediction, whereas regression error metrics remain central to modeling continuous outcomes such as burned area and spread dynamics.
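As a purely illustrative aid, and not a reproduction of any reviewed study’s code, the regression error and spatial overlap metrics named above can be sketched in a few lines of Python. The function names, the toy burned-area values, and the toy masks are assumptions introduced here only for illustration.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (RMSE, MAE) for two equal-length sequences of numbers."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


def iou(mask_true, mask_pred):
    """Intersection over union for two binary masks given as flat 0/1 lists."""
    intersection = sum(1 for t, p in zip(mask_true, mask_pred) if t and p)
    union = sum(1 for t, p in zip(mask_true, mask_pred) if t or p)
    # Two empty masks agree perfectly, so the overlap is defined as 1.0 here.
    return intersection / union if union else 1.0


# Hypothetical burned areas in hectares (made-up values).
observed_area = [10.0, 20.0, 30.0, 40.0]
predicted_area = [12.0, 18.0, 33.0, 37.0]
rmse, mae = regression_errors(observed_area, predicted_area)

# Hypothetical burn-scar masks flattened to one dimension (made-up values).
observed_mask = [1, 1, 1, 0, 0, 0]
predicted_mask = [0, 1, 1, 1, 0, 0]
overlap = iou(observed_mask, predicted_mask)

print(f"RMSE={rmse:.3f} MAE={mae:.3f} IoU={overlap:.2f}")
```

RMSE penalizes large errors more heavily than MAE, while IoU ignores cells that both masks label as unburned, which is why the two families answer different operational questions.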

Critically, the underrepresentation of metrics tailored for imbalanced datasets (e.g., rare high-severity fires) and spatially explicit predictions, despite their documented importance [31,40], exacerbates reproducibility challenges, particularly when paired with the 92.3% code non-sharing rate observed in high-impact studies. When metric selection is not explicitly aligned with predictive task type or decision-making context, reported model performance may be difficult to interpret operationally or compare across studies. To advance the field, community-wide adoption of context-aware evaluation standards, prioritizing metrics like spatial overlap measures for geospatial tasks and recall-sensitive frameworks for risk prediction, is essential to align methodological rigor with the dual imperatives of scientific validation and real-world operational utility.
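The sensitivity of metric choice to class imbalance can be illustrated with a minimal Python sketch. The 100-cell landscape with five fire cells and both sets of predictions are hypothetical values chosen for illustration, not data from the reviewed studies.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fire)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical landscape: 5 fire cells among 100 (a rare-event setting).
y_true = [1] * 5 + [0] * 95

# Model A never predicts fire; model B finds 4 of 5 fires with 6 false alarms.
always_negative = [0] * 100
detector = [1] * 4 + [0] * 1 + [1] * 6 + [0] * 89

baseline = binary_metrics(y_true, always_negative)
candidate = binary_metrics(y_true, detector)
print("never predicts fire:", baseline)
print("imperfect detector: ", candidate)
```

In this toy setting the model that never predicts fire obtains the higher accuracy (0.95 versus 0.93) while missing every fire, whereas recall and F1 expose the difference. This is the pattern that motivates recall-sensitive reporting for rare, high-severity events.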

4.5. Transparency and Reproducibility

Our analysis reveals a field at a critical inflection point. Wildfire prediction research has matured methodologically, with ensemble methods (26.7%) and deep learning architectures (59.4%) driving task-specific advances, and evaluation practices have diversified to address both classification (Precision, Recall, and F1 metrics, 26.7%) and regression (error metrics, 25.0%) challenges. Yet this technical sophistication exists in stark tension with a profound reproducibility crisis. Only 7.7% of studies shared code, while 92.3% operated as closed “black boxes” (Figure 8), a disparity that intensifies precisely where innovation concentrates. The geographic stratification of algorithmic adoption (China’s ensemble dominance versus U.S. deep learning specialization) compounds this challenge: regional methodological preferences, though contextually rational, fragment validation pathways when paired with near-universal code non-sharing. This gap represents not merely a reporting omission, but a structural constraint on cumulative knowledge building in computational science, where reproducibility depends on executable artifacts rather than narrative descriptions alone.

When methodological specialization coexists with limited transparency, cross-regional validation becomes constrained, potentially reinforcing regional silos in model development.

This geographic fragmentation, in turn, amplifies the reproducibility crisis previously identified: 92.3% of studies withhold code (Figure 8), with leading nations like China and the U.S., where complex, context-specific models dominate, contributing disproportionately to this gap. The absence of open-source practices in these high-impact regions creates a paradox where methodological innovation, such as China’s ensemble-driven precision in risk prediction or the U.S.’s DL-powered spatial modeling, coexists with systemic barriers to validation and scalability. This underscores the urgent need for context-aware standardization: rather than a universal “best practice”, the field requires evaluation frameworks that account for regional data constraints, environmental heterogeneity, and task-specific priorities [29,35,41,51]. Future progress may hinge on hybrid architectures that merge the interpretability of ensembles with DL’s spatial–temporal modeling capabilities. Such integration, however, demands not only technical innovation but also community-wide commitments to transparency, so that the geographic and methodological pluralism driving wildfire prediction research becomes a catalyst for robust, globally applicable solutions rather than a source of fragmentation.

Critically, the chi-square association between algorithm type and code availability (χ² = 78.0, df = 44, p = 0.0012) reveals a statistically significant dependency, indicating that code availability is not independent of the algorithm employed. Notably, the algorithms driving recent advances in the field, such as deep learning and hybrid architectures, exhibit distinct patterns of repository disclosure, underscoring structural differences in reproducibility practices across methodological approaches. This variation may reflect differences in implementation complexity, institutional norms, or intellectual property considerations associated with certain architectures.
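For readers who wish to check the reported significance level, the following minimal sketch recomputes the upper-tail probability from the reported statistic and degrees of freedom alone. It does not reconstruct the underlying contingency table, which is not given in the text. The function name is an assumption introduced here, and the closed form used is valid only for even degrees of freedom, which holds for df = 44.

```python
import math


def chi2_sf_even_df(statistic, df):
    """Upper-tail probability P(X >= statistic) for chi-square with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form only holds for positive even df")
    half = statistic / 2.0
    term = 1.0   # the i = 0 term of the series
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Values as reported in the text: chi-square = 78.0 with df = 44.
p_value = chi2_sf_even_df(78.0, 44)
print(f"p = {p_value:.4f}")
```

The result should land close to the reported p = 0.0012, which lets a reader confirm that the statistic, degrees of freedom, and p-value are mutually consistent without access to the original data.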

Beyond code availability alone, transparency in wildfire prediction research should be understood as a structural property of the modeling pipeline rather than a binary reporting decision. Reproducibility depends not only on executable scripts, but also on access to preprocessing workflows, feature engineering logic, training–validation partition strategies, hyperparameter selection procedures, and metadata describing spatial and temporal sampling assumptions. When these components remain implicit, models may appear technically replicable while remaining practically non-transferable across regions or datasets. In this sense, the current transparency gap reflects a broader infrastructural limitation affecting interoperability between research groups rather than a simple absence of repositories. Strengthening structural transparency, therefore, requires moving from isolated code release practices toward fully documented computational pipelines that enable cross-context validation, facilitate benchmark construction, and support operational adoption by agencies responsible for wildfire preparedness and response.
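One lightweight way to make such pipeline components explicit is a machine-readable manifest stored alongside the code. The sketch below is only an illustration under stated assumptions: every field name, the toy dataset bytes, and the example settings are hypothetical and do not correspond to an established standard or to any reviewed study.

```python
import hashlib
import json


def build_manifest(dataset_bytes, split, hyperparameters, sampling):
    """Bundle the pipeline details a reader would need to rerun an experiment."""
    return {
        # A content hash lets others confirm they hold the same input data.
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "split": split,
        "hyperparameters": hyperparameters,
        "sampling": sampling,
    }


manifest = build_manifest(
    # Stand-in for the bytes of a real input file (hypothetical content).
    dataset_bytes=b"cell_id,ndvi,burned\n1,0.42,0\n2,0.31,1\n",
    split={"strategy": "spatial_block", "test_fraction": 0.2, "seed": 42},
    hyperparameters={"model": "random_forest", "n_estimators": 500},
    sampling={"spatial_resolution_m": 500, "period": "2015-2022"},
)

# Sorted keys keep the serialized manifest stable across runs.
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Recording the partition strategy and the sampling assumptions next to the hyperparameters targets the components that most often remain implicit when only scripts are released.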

This paradox undermines the field’s dual mandate: scientific rigor and operational utility. Without transparent implementations, even metrics-rich evaluations, such as spatial overlap measures and recall-sensitive frameworks, become unverifiable assertions rather than actionable benchmarks. In high-stakes domains such as wildfire management, reproducibility is not merely an academic norm but a prerequisite for reliability. Models deployed without transparent validation pathways risk undermining stakeholder trust and limiting operational adoption.

Moving forward, reproducibility must transition from an optional virtue to a structural requirement. Journals and funding agencies should mandate code deposition for publication, while the community must develop context-aware open benchmarks that respect regional data constraints yet enable cross-validation. Only through such integration, where methodological pluralism coexists with uncompromising transparency, can wildfire prediction evolve from fragmented academic exercises into globally interoperable, lifesaving decision support systems. The path forward demands not less innovation, but innovation anchored in verifiability: because in wildfire management, unvalidated predictions are not merely scientifically incomplete, they are operationally dangerous.

4.6. Implications and Limitations

It is important to acknowledge that the patterns identified in this study may be partially influenced by search bias and methodological constraints. Although the Scopus and Web of Science (WoS) databases provide broad coverage of the peer-reviewed literature, they may omit relevant region-specific publications or studies published in languages other than English. As such, certain geographic regions or local approaches may be underrepresented. Database selection and language restrictions may therefore shape the observed geographic distribution and methodological patterns identified in this review.

Furthermore, while we conducted a subset analysis to explore specific objectives, such as open-source code availability, this approach may introduce additional bias by emphasizing studies that met our selection criteria more narrowly. The detailed coding of 110 articles, covering algorithm usage, performance evaluation, and code availability, enabled fine-grained methodological analysis but necessarily narrows interpretive scope relative to the full corpus. Despite these limitations, the study offers robust and valuable insights, strengthened by the rigorous application of the PRISMA-EcoEvo protocol for systematic literature reviews. The transparent reporting of search criteria, inclusion thresholds, and coding procedures enhances the reproducibility and interpretability of the synthesis itself and provides a solid empirical foundation for our conclusions.

Our findings carry important implications for various stakeholders involved in wildfire prediction research and its application. For researchers, the results highlight the urgent need to incorporate a broader range of data sources, particularly by including underrepresented socioeconomic and spatial factors. Additionally, the observed heterogeneity in evaluation metrics underscores the necessity of standardizing or more explicitly justifying metric selection to enhance comparability and reproducibility across studies. Addressing these methodological gaps will strengthen the reliability and impact of future research.

For practitioners and decision makers, this review offers a comprehensive overview of the predominant tools, algorithms, and data types currently employed in wildfire prediction. Understanding the relationship between algorithm families and data modalities (e.g., tabular vs. spatial inputs) is essential for selecting models that are context-appropriate rather than technically novel but operationally misaligned. This synthesis can inform the selection of models tailored to specific contexts and operational needs, ultimately improving the effectiveness of wildfire risk management and mitigation strategies.

For research funders and policymakers, the geographic and thematic disparities revealed in our analysis point to a clear need for targeted investment aimed at promoting research equity and fostering innovation in underrepresented regions. Moreover, funding agencies and policymakers should consider implementing incentives that encourage transparency, reproducibility, and the adoption of open science practices. Such measures would facilitate broader collaboration, accelerate scientific progress, and enhance the overall robustness of wildfire prediction research. Advancing wildfire prediction is therefore not solely a technical challenge: it requires systemic alignment between data ecosystems, evaluation standards, and transparency norms across regions, together with concerted efforts across research, practice, and policy domains to build inclusive, transparent, and effective scientific frameworks.

For future research, we recommend shifting focus from purely algorithmic novelty to operational validity. True decision-making tools require not just high accuracy, but also interpretability, uncertainty quantification, and, crucially, reproducibility. Future reviews should explicitly assess the operational readiness of these models, for example through technology readiness levels (TRL), and evaluate whether methodological advances translate into deployable decision-support systems.

5. Conclusions

The application of Machine Learning (ML) and Deep Learning (DL) to wildfire prediction expanded substantially between 2020 and 2025, with research output concentrated in a limited number of high-income countries, particularly China, the United States, and Australia. Methodologically, ensemble tree-based methods (e.g., Random Forest) and deep learning architectures (e.g., CNNs, ANNs) coexist, reflecting adaptation to distinct data modalities and modeling objectives rather than a simple replacement of traditional ML approaches.

In terms of data practices, studies predominantly rely on meteorological and topographical variables, with increasing but still limited integration of remote sensing inputs, while socioeconomic variables remain comparatively underrepresented. This distribution suggests that wildfire ML/DL research continues to prioritize biophysical predictors over human-system variables, potentially constraining applicability across diverse socio-environmental contexts.

Evaluation practices demonstrate differentiation between classification-oriented tasks (e.g., occurrence and risk prediction) and regression-based modeling (e.g., burned area and spread), yet metric heterogeneity limits cross-study comparability. Finally, the limited availability of open-source code represents a structural constraint on reproducibility and cumulative knowledge building.

Collectively, these findings indicate that wildfire ML/DL research is advancing technically but remains geographically concentrated and structurally constrained by transparency limitations. Strengthening reporting standards, dataset documentation, metric justification, and open code practices will be essential for translating methodological innovation into robust, globally applicable wildfire decision-support systems.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/fire9050204/s1, Table S1: List of studies included in the bibliometric review; Table S2: Algorithm abbreviation dictionary used in the wildfire prediction literature and in Figure 5.

Author Contributions

Conceptualization, K.M.G.L. and Y.M.; methodology, K.M.G.L. and Y.M.; software, K.M.G.L.; validation, Y.M. and L.F.E.O.; formal analysis, K.M.G.L.; investigation, K.M.G.L. and J.A.; resources, Y.M.; data curation, K.M.G.L. and J.A.; writing—original draft preparation, K.M.G.L. and Y.M.; writing—review and editing, L.F.E.O., A.E.M.-L., C.N., L.G. and C.H.; visualization, K.M.G.L.; supervision, Y.M.; project administration, Y.M.; funding acquisition, Y.M. G.J.-G. and E.R.A. contributed to the development of this research topic. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by DGAPA/PAPIIT, grant number IN208724, “Evaluación de la regeneración post-incendio bajo regímenes de incendios variables en bosques templados de Jalisco, México: una herramienta analítica para diseñar estrategias que fortalezcan la resiliencia socioambiental forestal ante el cambio climático”, awarded to Yosune Miquelajauregui. Furthermore, the first author gratefully acknowledges the scholarship support provided by SECIHTI (Secretaría de Educación, Ciencia, Tecnología e Innovación). The APC was funded by the corresponding author’s institutional affiliation.

Data Availability Statement

To ensure full methodological transparency, the complete codebase, processed datasets, and visualization scripts used in this study are publicly available in the following GitHub repository: https://github.com/bevins93/bibliometric_global_wf, accessed on 20 March 2026. This repository includes all the data and notebooks required to replicate the bibliometric and geospatial workflows presented in this research.

Acknowledgments

This work constitutes partial fulfillment of the academic requirements for Kevin Manuel Galván-Lara within the Posgrado en Biociencias at the Universidad de Sonora. The authors gratefully acknowledge the administrative and technical support provided by the Maestría en Ciencia de Datos of the Universidad de Sonora and the Laboratorio Nacional de Ciencias de la Sostenibilidad (LANSCO) at the Institute of Ecology, UNAM. Special thanks are extended to the Alianza para Promover el Desarrollo de Capacidades Digitales en México and the Universidad Nacional Autónoma de México (UNAM) for their support in fostering technological innovation projects with an emphasis on Artificial Intelligence (AI).

Conflicts of Interest

The authors declare no conflicts of interest. The authors identify no personal circumstances or interests that may be perceived as inappropriately influencing the representation or interpretation of the reported research results. Furthermore, the funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Bowman, D.M.J.S.; Balch, J.K.; Artaxo, P.; Bond, W.J.; Carlson, J.M.; Cochrane, M.A.; D’Antonio, C.M.; DeFries, R.S.; Doyle, J.C.; Harrison, S.P.; et al. Fire in the Earth System. Science 2009, 324, 481–484.
2. Archibald, S.; Nickless, A.; Govender, N.; Scholes, R.J.; Lehsten, V. Climate and the Inter-annual Variability of Fire in Southern Africa: A Meta-analysis Using Long-term Field Data and Satellite-derived Burnt Area Data. Glob. Ecol. Biogeogr. 2010, 19, 794–809.
3. Tansey, K.; Grégoire, J.; Defourny, P.; Leigh, R.; Pekel, J.; van Bogaert, E.; Bartholomé, E. A New, Global, Multi-annual (2000–2007) Burnt Area Product at 1 Km Resolution. Geophys. Res. Lett. 2008, 35, L01401.
4. Pechony, O.; Shindell, D.T. Driving Forces of Global Wildfires over the Past Millennium and the Forthcoming Century. Proc. Natl. Acad. Sci. USA 2010, 107, 19167–19170.
5. Van Wagner, C.E. Conditions for the Start and Spread of Crown Fire. Can. J. For. Res. 1977, 7, 23–34.
6. Whelan, R.J. The Ecology of Fire; Cambridge University Press: Cambridge, UK, 1995.
7. Ryan, K. Dynamic Interactions between Forest Structure and Fire Behavior in Boreal Ecosystems. Silva Fenn. 2002, 36, 13–39.
8. Johnson, E.A.; Miyanishi, K.; Weir, J.M.H. Wildfires in the Western Canadian Boreal Forest: Landscape Patterns and Ecosystem Management. J. Veg. Sci. 1998, 9, 603–610.
9. Holsinger, L.; Parks, S.A.; Miller, C. Weather, Fuels, and Topography Impede Wildland Fire Spread in Western US Landscapes. For. Ecol. Manag. 2016, 380, 59–69.
10. Miquelajauregui, Y.; Cumming, S.G.; Gauthier, S. Modelling Variable Fire Severity in Boreal Forests: Effects of Fire Intensity and Stand Structure. PLoS ONE 2016, 11, e0150073.
11. Andela, N.; Morton, D.C.; Giglio, L.; Chen, Y.; van der Werf, G.R.; Kasibhatla, P.S.; DeFries, R.S.; Collatz, G.J.; Hantson, S.; Kloster, S.; et al. A Human-Driven Decline in Global Burned Area. Science 2017, 356, 1356–1362.
12. Montoya, L.E.; Corona-Núñez, R.O.; Campo, J.E. Fires and Their Key Drivers in Mexico. Int. J. Wildland Fire 2023, 32, 651–664.
13. Liu, Z.; Zhou, K.; Yao, Q.; Reszka, P. An Interpretable Machine Learning Model for Predicting Forest Fire Danger Based on Bayesian Optimization. Emerg. Manag. Sci. Technol. 2024, 4, e025.
14. Choi, S.; Son, M.; Kim, C.; Kim, B. A Forest Fire Prediction Model Based on Meteorological Factors and the Multi-Model Ensemble Method. Forests 2024, 15, 1981.
15. Tahri, M.; Badr, S.; Mohammadi, Z.; Kašpar, J.; Berčák, R.; Holuša, J.; Surový, P.; Marušák, R.; Yousfi, N. New Forest Fire Assessment Model Based on Artificial Neural Network and Analytic Hierarchy Process or Fuzzy-Analytic Hierarchy Process Methodology for Fire Vulnerability Map. Eng. Appl. Artif. Intell. 2024, 138, 109399.
16. Chollet, F.; Watson, M. Deep Learning with Python, 3rd ed.; Simon and Schuster: New York, NY, USA, 2025.
17. Liz-López, H.; Huertas-Tato, J.; Pérez-Aracil, J.; Casanova-Mateo, C.; Sanz-Justo, J.; Camacho, D. Spain on Fire: A Novel Wildfire Risk Assessment Model Based on Image Satellite Processing and Atmospheric Information. Knowl.-Based Syst. 2024, 283, 111198.
18. Sengottaiyan, N.; Ananthi, J.; Rajesh Sharma, R.; Hamsanandhini, S.; Akey, S.; Chinnaiyan, R.; Gemeda, K.A. Heuristic Forest Fire Detection Using the Deep Learning Model with Optimized Cluster Head Selection Technique. J. Comput. Netw. Commun. 2024, 2024, 6569596.
19. Radford, D.A.G.; Maier, H.R.; van Delden, H.; Zecchin, A.C.; Jeanneau, A. Predicting Burn Probability: Dimensionality Reduction Strategies Enable Accurate and Computationally Efficient Metamodeling. J. Environ. Manag. 2024, 371, 123086.
20. Al-Bashiti, M.K.; Nguyen, D.; Naser, M.Z.; Kaye, N.B. Predicting Wildfire Ember Hot-Spots on Gable Roofs via Deep Learning. Fire 2024, 7, 153.
21. Alawode, G.L.; Gelabert, P.J.; Rodrigues, M. A Spatially Explicit Containment Modelling Approach for Escaped Wildfires in a Mediterranean Climate Using Machine Learning. Geomat. Nat. Hazards Risk 2025, 16, 2447514.
22. Li, X.; Xue, Y.G.; Kong, F.; Li, Z.; Li, G. Analysis and Prediction of the River Levee Settlement Derived from Shield Tunneling Considering the Excavation Face Stability. Acta Geotech. 2024, 19, 3161–3184.
23. Guo, Y.; Hai, Q.; Bayarsaikhan, S. Utilizing Deep Learning and Spatial Analysis for Accurate Forest Fire Occurrence Forecasting in the Central Region of China. Forests 2024, 15, 1380.
24. Wang, N.; Zhao, S.; Wang, S. A Novel Clustering-Based Resampling with Cost-Sensitive Boosting Method to Model and Map Wildfire Susceptibility. Reliab. Eng. Syst. Saf. 2024, 242, 109742.
25. Hai, T.; Sayed, B.T.; Majdi, A.; Zhou, J.; Sagban, R.; Band, S.S.; Mosavi, A. An Integrated GIS-Based Multivariate Adaptive Regression Splines-Cat Swarm Optimization for Improving the Accuracy of Wildfire Susceptibility Mapping. Geocarto Int. 2023, 38, 2167005.
26. Roslin, A.H.; Muhammad, N.; Kadir, E.A.; Maharani, W.; Daud, H. Forecasting Locations of Forest Fires in Indonesia Through Nonparametric Predictive Inference with Parametric Copula: A Case Study. J. Q. Meas. Anal. 2025, 21, 237–251.
27. Freitas, K.M.; Juvanhol, R.S.; Pinheiro, C.J.G.; de Moura Meneses, A.A. Prediction of Forest Fire Susceptibility Using Machine Learning Tools in the Triunfo Do Xingu Environmental Protection Area, Amazon, Brazil. J. S. Am. Earth Sci. 2025, 153, 105366.
28. Sun, X.; Li, N.; Chen, D.; Chen, G.; Sun, C.; Shi, M.; Gao, X.; Wang, K.; Hezam, I.M. A Forest Fire Prediction Model Based on Cellular Automata and Machine Learning. IEEE Access 2024, 12, 55389–55403.
29. Mambile, C.; Kaijage, S.; Leo, J. Application of Deep Learning in Forest Fire Prediction: A Systematic Review. IEEE Access 2024, 12, 190554–190581.
30. Yu, Y.; Liu, L.; Chang, Z.; Li, Y.; Shi, K. Detecting Forest Fires in Southwest China From Remote Sensing Nighttime Lights Using the Random Forest Classification Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10759–10769.
31. Mambile, C.; Kaijage, S.; Leo, J. Deep Learning Models for Enhanced Forest-Fire Prediction at Mount Kilimanjaro, Tanzania: Integrating Satellite Images, Weather Data and Human Activities Data. Nat. Hazards Res. 2024, 5, 335–347.
32. Yu, S.; Singh, M. Deep Learning-Based Remote Sensing Image Analysis for Wildfire Risk Evaluation and Monitoring. Fire 2025, 8, 19.
33. Ji, Y.; Wang, D.; Li, Q.; Liu, T.; Bai, Y. Global Wildfire Danger Predictions Based on Deep Learning Taking into Account Static and Dynamic Variables. Forests 2024, 15, 216.
34. O’Dea, R.E.; Lagisz, M.; Jennions, M.D.; Koricheva, J.; Noble, D.W.A.; Parker, T.H.; Gurevitch, J.; Page, M.J.; Stewart, G.; Moher, D.; et al. Preferred Reporting Items for Systematic Reviews and Meta-analyses in Ecology and Evolutionary Biology: A PRISMA Extension. Biol. Rev. 2021, 96, 1695–1722.
35. Bian, R.; Chen, K.; Li, G.; Wang, Z.; Qiu, Y.; Bai, H.; Kong, W. Evaluation of Three Algorithms and Forest Fire Risk Prediction in Zhejiang Province of China. Forests 2024, 15, 2146.
36. Thi Hang, H.; Mallick, J.; Alqadhi, S.; Bindajam, A.A.; Abdo, H.G. Exploring Forest Fire Susceptibility and Management Strategies in Western Himalaya: Integrating Ensemble Machine Learning and Explainable AI for Accurate Prediction and Comprehensive Analysis. Environ. Technol. Innov. 2024, 35, 103655.
37. Li, J.; Tang, H.; Li, X.; Dou, H.; Li, R. LEF-YOLO: A Lightweight Method for Intelligent Detection of Four Extreme Wildfires Based on the YOLO Framework. Int. J. Wildland Fire 2023, 33, WF23044.
38. Li, J.; Wan, J.; Sun, L.; Hu, T.; Li, X.; Zheng, H. Intelligent Segmentation of Wildfire Region and Interpretation of Fire Front in Visible Light Images from the Viewpoint of an Unmanned Aerial Vehicle (UAV). ISPRS J. Photogramm. Remote Sens. 2025, 220, 473–489.
39. Son, R.; Stacke, T.; Gayler, V.; Nabel, J.E.M.S.; Schnur, R.; Alonso, L.; Requena-Mesa, C.; Winkler, A.J.; Hantson, S.; Zaehle, S.; et al. Integration of a Deep-Learning-Based Fire Model Into a Global Land Surface Model. J. Adv. Model. Earth Syst. 2024, 16, e2023MS003710.
40. Mofokeng, O.D.; Adelabu, S.A.; Jackson, C.M. An Integrated Grassland Fire-Danger-Assessment System for a Mountainous National Park Using Geospatial Modelling Techniques. Fire 2024, 7, 61.
41. Peng, Y.; Su, H.; Sun, M.; Li, M. Reconstructing Historical Forest Fire Risk in the Non-Satellite Era Using the Improved Forest Fire Danger Index and Long Short-Term Memory Deep Learning—A Case Study in Sichuan Province, Southwestern China. For. Ecosyst. 2024, 11, 100170.
42. Sykas, D.; Zografakis, D.; Demestichas, K. Deep Learning Approaches for Wildfire Severity Prediction: A Comparative Study of Image Segmentation Networks and Visual Transformers on the EO4WildFires Dataset. Fire 2024, 7, 374.
43. Yang, M. A Study on Factors Influencing Cost Management in Green Building Construction. J. Build. Mater. Sci. 2025, 7, 153–174.
44. Yang, J.; Jiang, H.; Wang, S.; Ma, X. A Multi-Scale Deep Learning Algorithm for Enhanced Forest Fire Danger Prediction Using Remote Sensing Images. Forests 2024, 15, 1581.
45. Ahajjam, A.; Allgaier, M.; Chance, R.; Chukwuemeka, E.; Putkonen, J.; Pasch, T. Enhancing Prediction of Wildfire Occurrence and Behavior in Alaska Using Spatio-Temporal Clustering and Ensemble Machine Learning. Ecol. Inform. 2025, 85, 102963.
46. Ahn, H.K.; Jung, H.; Lim, C.H. Can Ensemble Techniques and Large-Scale Fire Datasets Improve Predictions of Forest Fire Probability Due to Climate Change?—A Case Study from the Republic of Korea. Forests 2024, 15, 503.
47. Aziz, N.F.A.; Ya’acob, N.; Yusof, A.L.; Kassim, M. A Review of Wildfire Studies Using Machine Learning Applications. J. Adv. Res. Appl. Mech. 2024, 114, 13–32.
48. Wang, X.; Yao, W. GNSS-R-Based Wildfire Detection: A Novel and Accurate Method. Eur. J. Remote Sens. 2024, 57, 2413993.
49. Dong, Z.; Zheng, C.; Zhao, F.; Wang, G.; Tian, Y.; Li, H. A Deep Learning Framework: Predicting Fire Radiative Power from the Combination of Polar-Orbiting and Geostationary Satellite Data during Wildfire Spread. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 10827–10841.
50. Masoudian, E.; Mirzaei, A.; Bagheri, H. Assessing Wildfire Susceptibility in Iran: Leveraging Machine Learning for Geospatial Analysis of Climatic and Anthropogenic Factors. Trees For. People 2025, 19, 100774.
51. Tian, Y.; Wu, Z.; Cui, S.; Hong, W.; Wang, B.; Li, M. Assessing Wildfire Susceptibility and Spatial Patterns in Diverse Forest Ecosystems across China: An Integrated Geospatial Analysis. J. Clean. Prod. 2025, 490, 144800.

Figure 1. PRISMA flow diagram of the study selection process. From 1753 records identified, 23 duplicates were removed. After screening 1730 records by title and abstract, 1391 were excluded. A total of 341 studies met the inclusion criteria and were included in the final review. A complete list of reviewed studies is available in Table S1.

Figure 2. Annual publication counts of ML/DL-based wildfire prediction studies (2020–2025). Data represents the number of eligible peer-reviewed articles identified through systematic searches in Scopus and Web of Science following PRISMA-EcoEvo guidelines.

Figure 3. Geographic distribution of ML/DL wildfire prediction studies (2020–2025). The map displays the number of eligible peer-reviewed publications by country based on the primary affiliation of the first or corresponding author. Color intensity indicates publication counts within the final corpus (n = 341).

Figure 4. Distribution of algorithm families used in wildfire ML/DL studies (subset analysis, 2024). Percentages represent the proportion of studies within the methodological subset (n = 110) classified according to the structured taxonomy of algorithm families. Hybrid approaches were assigned based on their dominant architectural component.

Figure 5. Country-level distribution of algorithm families in wildfire ML/DL studies. Algorithm families were cross-tabulated with country affiliations and normalized to allow comparison across countries with different total publication counts. Color intensity represents relative frequency scores on a 0–4 scale. Data correspond to the methodological subset analyzed in detail (n = 110). A complete dictionary of the algorithm abbreviations used throughout the analysis is provided in Table S2 (Supplementary Material) to ensure methodological transparency and facilitate cross-study comparison.

Figure 6. Distribution of input data domains used for model training in wildfire ML/DL studies (subset analysis, 2024). Bars represent the percentage of studies within the methodological subset (n = 110) reporting each standardized input data category. Predictor variables were grouped into thematic domains to enable cross-study comparison.

Figure 7. Relative frequency of standardized evaluation metric categories in wildfire ML/DL research. Reported metrics were consolidated into predefined categories to reduce terminological heterogeneity. Values indicate the percentage of studies in the analyzed subset (n = 110) reporting each metric type.

Figure 8. Code availability among analyzed wildfire ML/DL studies. Studies were classified according to whether a publicly accessible and verifiable open-source repository was provided. Values indicate the percentage of studies in the methodological subset (n = 110) reporting accessible code.

Table 1. Algorithm classification framework used to standardize methodological reporting across wildfire ML/DL studies.

Algorithm Family	Algorithm	Core Characteristics	Typical Application Context	References
Tree-Based & Ensemble Methods	Random Forest (RF), XGBoost, Gradient Boosting, Decision Trees	Non-parametric models combining multiple decision trees; robust to multicollinearity; strong performance on tabular data.	Risk prediction, tabular environmental datasets.	[27,35,36]
Deep Learning—Convolutional Neural Networks (CNNs)	CNN, U-Net, ResNet, YOLO	Convolutional layers extracting spatial features; effective for image and raster data.	Satellite imagery, spatial fire detection, smoke recognition.	[32]
Deep Learning—Specialized/Hybrid Architectures	Hybrid CNN-LSTM, CNN-ASPP, custom architectures	Architecture combining multiple DL components or novel model designs (e.g., spatial + temporal).	Multi-modal or spatiotemporal modeling.	[14,22,33,37,38,39]
Classical/Statistical Models	Logistic Regression, Generalized Linear Models (GLM), MaxEnt	Parametric, interpretable statistical models; often used as baseline comparisons.	Binary occurrence prediction, susceptibility mapping.	[35,40]
Deep Learning—Feedforward Networks	ANN, MLP	Fully connected neural networks; no temporal or spatial convolution.	Structured/tabular datasets, global fire danger assessment.	[14,15,22,37,38,39]
Deep Learning—Recurrent/Temporal Models	LSTM, GRU, ConvLSTM	Capture temporal dependencies through recurrent units.	Time-series fire spread prediction, historical reconstruction.	[14,22,29,31,37,38,39,41]
Deep Learning—Generative Models	GAN, Autoencoders, CVAE	Generate or reconstruct data distributions.	Synthetic data generation, anomaly detection, spread simulation.	[17]
Deep Learning—Transformers & Vision Transformers	Transformer, ViT, Swin	Attention-based architecture capturing long-range dependencies.	Advanced spatial or sequence modeling, sub-seasonal forecasting.	[42,43,44]
Support Vector Machines & Related	SVM, SVR, TSVM	Margin-based classifiers/regressors; effective in high-dimensional spaces.	Classification/regression with limited data.	[25,28,35,36]

Table 2. Distribution of reviewed studies according to wildfire prediction task type (2020–2025).

Task	Number of Studies	%
Occurrence/Risk prediction	258	75.7%
Burned area/Severity prediction	35	10.3%
Detection/Monitoring	31	9.1%
Spread/Propagation modeling	5	1.5%
Other/Unclear	12	3.5%
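
The percentages in Table 2 follow directly from the study counts. As a minimal sketch, assuming each share is the count divided by the column total and rounded to one decimal place (the names `task_counts` and `shares` are illustrative), the values can be reproduced as follows:

```python
# Study counts per task type, copied from Table 2.
task_counts = {
    "Occurrence/Risk prediction": 258,
    "Burned area/Severity prediction": 35,
    "Detection/Monitoring": 31,
    "Spread/Propagation modeling": 5,
    "Other/Unclear": 12,
}

# Total number of reviewed studies.
total = sum(task_counts.values())

# Share of each task type as a percentage, rounded to one decimal.
shares = {task: round(100 * n / total, 1) for task, n in task_counts.items()}

for task, pct in shares.items():
    print(f"{task}: {task_counts[task]} of {total} ({pct}%)")
```

The counts sum to 341, which matches the corpus size reported in Figure 3. The rounded shares add up to 100.1% because each one is rounded independently.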
