1. Introduction
Forest fires have become an escalating global concern, now affecting approximately one-third of the world’s forests, equivalent to about 36% of all forested land and nearly a third of Earth’s terrestrial surface [
1,
2]. On average, between 3.3 and 6.3 million km² burn each year [
3], and this annual burn rate is expected to increase due to climate change [
4]. Beyond the widespread loss of ecosystem cover, forest fires also impose severe human and economic costs. The scale and intensity of wildfire impacts underscore the urgent need for comprehensive fire management and mitigation strategies, supported by data-driven methodologies and robust fire prediction and management decision-support tools. These systems must not only identify the underlying drivers of fire activity, including biophysical and anthropogenic ignition sources, but also account for uncertainty in fire behavior and response outcomes, supporting the prediction, assessment, and evaluation of management strategies through informed, data-driven decision-making.
The dynamics of wildfire are influenced by a complex interplay of climate, topography, vegetation, and human activity, operating across multiple temporal and spatial scales [
5,
6,
7]. Among these drivers, climatic variables play a critical role [
2,
8]. Elevated temperatures, low humidity, reduced precipitation, strong winds, and increased solar radiation influence ignition probability, spread rate, and fire intensity. Topography further modulates fire behavior and risk [
6]. Higher elevations and proximity to water bodies generally reduce fire risk due to cooler microclimates and higher humidity levels, whereas steep slopes and south-facing aspects (in the Northern Hemisphere) facilitate rapid ignition and intense fire spread [
9]. Vegetation characteristics also play a pivotal role, with ecosystems featuring high canopy bulk densities, closed canopies, and abundant dry biomass being more prone to frequent and intense fires [
10,
11]. Additionally, human activities, including population density, road networks, and agricultural burning, significantly elevate ignition risk, while socioeconomic factors, such as regional development and the availability of fire prevention and suppression infrastructure, influence the capacity to manage and mitigate wildfire impacts [
11,
12].
Historically, fire regime characterization and behavior prediction have relied on physical models based on combustion thermodynamics [
13] and empirical meteorological indices. While effective for identifying broad correlations, these traditional methods often assume linearity and variable independence [
13], oversimplifying the complex, nonlinear interactions inherent in wildfire processes and lacking the capacity to integrate high-dimensional socioeconomic factors [
14]. To address these gaps, several modern approaches have emerged. Multi-Criteria Decision Making (MCDM) methods, such as the Analytic Hierarchy Process (AHP) and fuzzy logic, integrate expert judgment with quantitative data across criteria such as topography and climate to produce vulnerability maps [
15]. Machine learning (ML), described as the search for useful representations and rules over an input dataset performed within a predefined space and guided by a feedback signal, and deep learning (DL), a specific subfield emphasizing the learning of successive layers of increasingly meaningful representations, support resource allocation [
16] (e.g., the Wildfire Assessment Model [
17]) and early warning through models like IOFireNet and Long Short-Term Memory (LSTM) networks [
18]. Simulation metamodels, often using artificial neural networks, emulate costly physics-based simulations to rapidly evaluate management scenarios [
19]. Explainable AI (XAI) tools like SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) enhance trust by clarifying model predictions [
20]. Finally, operational Decision Support Systems (DSS) integrate these components for real-time tactical response [
21]. Despite these advances, structural incompatibilities persist, as physical models require precise combustion parameters, while socioeconomic variables, such as Gross Domestic Product (GDP) or road distance, are indirect ignition proxies that cannot be directly ingested [
22,
23,
24,
25].
To address the limitations of physical and empirical models, wildfire management is increasingly adopting data-driven architectures and decision support tools capable of integrating multiscale environmental and human-related data to enhance prevention, detection, and response, especially under the uncertainties posed by global change. Geographic Information Systems (GIS) integrated with Multi-Criteria Decision Analysis (MCDA) support dynamic fire risk mapping by synthesizing diverse environmental and socioeconomic variables, facilitating the strategic allocation of firefighting resources to areas of highest risk [
21]. Machine learning (ML) and deep learning (DL) models have demonstrated substantial potential to improve wildfire forecasting and enable more dynamic, data-driven resource allocation by capturing complex, nonlinear relationships among environmental, anthropogenic, and physical variables. Trained on historical, environmental, and fuel-related datasets, often derived from satellite imagery and sensor networks, these models can be applied to previously unseen regions, generalize across diverse and non-uniform input conditions, and improve predictive accuracy over time through continuous learning from new data and past forecasting errors [
26]. Decision-making tools, defined as analytical and computational frameworks, enable authorities to manage the multifaceted uncertainties typical of wildfire scenarios [27,28]. However, the development and validation of such decision-support systems are not evenly distributed across regions.
A significant geographic bias persists in wildfire research, with scientific output heavily concentrated in data-rich, predominantly high-income nations, particularly the United States, China, and France, despite these regions representing only a small fraction of global burned area [
29]. Empirical evidence supports this imbalance. For instance, about 15% of wildfire publications focus on the western United States, which accounts for just 0.5% of global burned area [
30]. Conversely, high-burn regions such as Africa and Siberia remain severely underrepresented in predictive modeling, even though Africa and South America together contribute over 70% of the world’s burned area [
31,
32,
33]. This imbalance stems from limited technological infrastructure, scarce localized high-quality datasets, and a lack of AI expertise in the Global South, leading to models trained on Mediterranean or boreal ecosystems that may not generalize to tropical or savanna fire regimes [
29].
Although ML and DL approaches are increasingly applied in wildfire research, there remains a limited synthesis of how methodological configurations, data practices, and transparency standards are distributed across the literature. Addressing this gap is essential for understanding the structure of the current research landscape and its implications for reproducibility and methodological transferability.
This study systematically characterizes recent (2020–2025) applications of Machine Learning (ML) and Deep Learning (DL) in wildfire prediction through a structured literature review following the PRISMA-EcoEvo protocol. Specifically, the review aims to:
(1) characterize the temporal and geographic evolution of ML/DL-based wildfire prediction research between 2020 and 2025; (2) examine how predictive tasks, algorithm families, input data types, and evaluation metrics are configured across the literature; and (3) evaluate transparency and reproducibility by examining reporting completeness and the availability of open-source materials.
2. Materials and Methods
This study employed a systematic literature review following the PRISMA-EcoEvo protocol [34] (Figure 1) to identify peer-reviewed studies applying Machine Learning (ML) or Deep Learning (DL) techniques to wildfire prediction. Searches were conducted in the Scopus and Web of Science (WoS) Core Collection databases using the Boolean query (“WILDFIRE” OR “FOREST FIRE”) AND (“MACHINE LEARNING” OR “DEEP LEARNING”) AND (“PREDICTION” OR “FORECAST”) AND (“DECISION MAKING” OR “DECISION SUPPORT”). The search was limited to English-language publications dated between 1 January 2020 and 31 December 2025.
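As an illustration, the Boolean query above can be assembled from its four term groups, which makes the search string easier to audit and reuse. This is a minimal sketch: the helper names `build_query` and `in_review_window` are hypothetical, straight quotes replace the typographic ones, and database-specific field tags (which differ between Scopus and WoS) are deliberately left out.

```python
# Term groups copied from the Boolean query reported in the text.
TERM_GROUPS = [
    ["WILDFIRE", "FOREST FIRE"],
    ["MACHINE LEARNING", "DEEP LEARNING"],
    ["PREDICTION", "FORECAST"],
    ["DECISION MAKING", "DECISION SUPPORT"],
]


def build_query(groups):
    """OR the quoted terms inside each group, then AND the groups together."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


def in_review_window(year, start=2020, end=2025):
    """Publication-year filter matching the 2020-2025 search window."""
    return start <= year <= end


query = build_query(TERM_GROUPS)
print(query)
```

Keeping the term groups as data rather than as one long string makes it straightforward to document later changes to the search strategy.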
This systematic review followed transparent and reproducible research practices and was prospectively registered in the Open Science Framework (OSF). The study protocol, including the search strategy, eligibility criteria, and data extraction framework, is publicly available to ensure methodological transparency and facilitate reproducibility. The registration can be accessed at the following link:
https://osf.io/tc9s2/overview, accessed on 26 March 2026. This registration documents the analytical workflow and supports the traceability of decisions made during the review process.
Studies were included if they (i) explicitly reported the ML/DL algorithm(s) used, (ii) incorporated geospatial or remote sensing input data, and (iii) presented quantitative evaluation metrics (e.g., accuracy, AUC, F1 score) for predicting wildfire occurrence, behavior, or risk in natural or Wildland–Urban Interface (WUI) environments. Reviews, editorials, abstracts, operational tools without a core predictive modeling component, urban structural fire studies, and studies relying solely on non-spatial theoretical or mathematical formulations were excluded. Screening involved duplicate removal followed by title, abstract, and full-text assessment according to predefined criteria. The complete selection process is documented in
Figure 1.
From the 1730 records initially retrieved through database searches, 1335 were excluded during title and abstract screening because they did not meet the inclusion criteria of this review. Excluded studies comprised articles unrelated to wildfire phenomena, studies addressing wildfire detection or monitoring without predictive modeling components, works focused on non-ML/DL approaches, methodological contributions lacking application to wildfire prediction tasks, and publications outside the thematic scope defined by the review protocol. A second abstract-level relevance screening was then conducted to ensure consistency with the study objectives, leading to the removal of additional records that did not explicitly address wildfire occurrence or risk prediction using data-driven approaches. The full screening procedure is summarized in the PRISMA flow diagram, and the detailed inclusion and exclusion criteria follow the PRISMA-EcoEvo methodological framework described earlier in this section (Section 2).
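The screening counts can be checked with simple arithmetic. The sketch below uses only the three counts reported in this section (1730 retrieved, 1335 excluded, 341 in the final corpus) and assumes they describe sequential stages; the number removed after the first screening is therefore implied by subtraction rather than stated in the text.

```python
# Counts reported in the text.
retrieved = 1730            # records retrieved from Scopus and WoS
excluded_title_abs = 1335   # excluded at title/abstract screening
final_corpus = 341          # studies in the final corpus

# Derived values (assumption: the stages are strictly sequential).
after_first_screen = retrieved - excluded_title_abs
removed_later = after_first_screen - final_corpus
retention_pct = round(100 * final_corpus / retrieved, 1)

print(after_first_screen, removed_later, retention_pct)
```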
The resulting corpus of 341 studies served as the foundation for multiple analytical approaches: bibliometric analyses of temporal trends and geographic distribution, word clouds for textual pattern representation created using the WordCloud library, and an examination of relationships between frequently used algorithms and the primary country of study to explore potential regional patterns. Building on this comprehensive analysis, a subset of 110 articles published in 2024 (approximately 30% of the total corpus) was examined in greater methodological detail to assess algorithm classification, input data domains, evaluation metric alignment, and code availability practices.
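A word cloud is a visual rendering of a term-frequency table. The sketch below shows how such a table could be built with the Python standard library only; the stop-word list and the sample titles are made up for illustration and are not taken from the reviewed corpus.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "using", "with"}


def term_frequencies(titles):
    """Count lower-cased word tokens across titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z]+", title.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts


# Made-up titles for illustration only.
sample_titles = [
    "Wildfire risk prediction using random forest",
    "Deep learning for wildfire spread forecasting",
    "Forest fire susceptibility mapping with machine learning",
]
freqs = term_frequencies(sample_titles)
print(freqs.most_common(3))
```

A mapping of this kind can then be handed to the WordCloud library, for example through its `generate_from_frequencies` method, to produce the figure.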
To standardize algorithm reporting and reduce nomenclatural heterogeneity, predictive models were categorized using a structured taxonomy comprising nine families: (1) Tree-Based and Ensemble Methods; (2) Deep Learning—Convolutional Neural Networks (CNNs); (3) Deep Learning—Specialized, Hybrid, or Novel Architectures; (4) Classical or Statistical Models; (5) Deep Learning—Feedforward Networks; (6) Deep Learning—Recurrent or Temporal Models; (7) Deep Learning—Generative Models; (8) Deep Learning—Transformers and Vision Transformers; and (9) Support Vector Machines and Related Methods. Hybrid approaches were classified according to their dominant architectural component (
Table 1).
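One way to apply such a taxonomy consistently is a lookup table from reported algorithm names to families. In the sketch below, the nine family names follow the taxonomy above (lightly abbreviated), whereas the specific algorithm assignments, the aliases, and the function name `classify_algorithm` are illustrative assumptions rather than the mapping used in this review.

```python
# Family names follow the nine-family taxonomy in the text; the
# algorithm-to-family assignments below are illustrative assumptions.
FAMILY_BY_ALGORITHM = {
    "random forest": "Tree-Based and Ensemble Methods",
    "xgboost": "Tree-Based and Ensemble Methods",
    "cnn": "Deep Learning: Convolutional Neural Networks",
    "u-net": "Deep Learning: Convolutional Neural Networks",
    "convlstm": "Deep Learning: Specialized, Hybrid, or Novel Architectures",
    "logistic regression": "Classical or Statistical Models",
    "multilayer perceptron": "Deep Learning: Feedforward Networks",
    "lstm": "Deep Learning: Recurrent or Temporal Models",
    "gan": "Deep Learning: Generative Models",
    "vision transformer": "Deep Learning: Transformers and Vision Transformers",
    "svm": "Support Vector Machines and Related Methods",
}

# Hypothetical aliases that reduce nomenclatural heterogeneity.
ALIASES = {
    "rf": "random forest",
    "mlp": "multilayer perceptron",
    "vit": "vision transformer",
    "support vector machine": "svm",
}


def classify_algorithm(reported_name):
    """Normalize a reported algorithm name and return its family."""
    key = reported_name.strip().lower()
    key = ALIASES.get(key, key)
    return FAMILY_BY_ALGORITHM.get(key, "Unclassified")


print(classify_algorithm(" RF "))
print(classify_algorithm("LSTM"))
```

Returning "Unclassified" for unknown names, instead of raising an error, keeps unmapped labels visible for manual review.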
To identify and quantify the primary categories of input data used for model training, predictor variables were grouped into standardized thematic domains: meteorological, topographical, remote sensing, and socioeconomic. Meteorological sources were further distinguished between global reanalysis products (e.g., ERA5) and local station data. Each data source was additionally flagged as public (open access) or private/local to contextualize accessibility and reproducibility potential.
To examine reported evaluation metrics and their alignment with predictive task types, metrics were harmonized to consolidate synonymous terminology (e.g., Sensitivity, Recall, and True Positive Rate grouped under “Recall”) and categorized according to task type. Classification metrics (e.g., AUC, F1 Score) were distinguished from regression metrics (e.g., RMSE, MAE) to avoid analytical confounding between occurrence/risk and spread/burned area models.
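The harmonization step can be sketched as two lookups: one that collapses synonymous metric names into a canonical label, and one that assigns each canonical label to a task type. The "Recall" grouping follows the example given above; the remaining entries and the function name `harmonize_metric` are assumptions for illustration.

```python
# Synonym table: the "Recall" grouping follows the text; other entries
# are illustrative assumptions.
CANONICAL_METRIC = {
    "sensitivity": "Recall",
    "recall": "Recall",
    "true positive rate": "Recall",
    "f1": "F1 Score",
    "f1 score": "F1 Score",
    "f1-score": "F1 Score",
    "auc": "AUC",
    "rmse": "RMSE",
    "root mean squared error": "RMSE",
    "mae": "MAE",
    "mean absolute error": "MAE",
}

# Task type per canonical metric, keeping classification and regression apart.
TASK_BY_METRIC = {
    "Recall": "classification",
    "F1 Score": "classification",
    "AUC": "classification",
    "RMSE": "regression",
    "MAE": "regression",
}


def harmonize_metric(reported_name):
    """Return (canonical_name, task_type) for a reported metric name."""
    key = reported_name.strip().lower()
    canonical = CANONICAL_METRIC.get(key)
    if canonical is None:
        return reported_name.strip(), "unmapped"
    return canonical, TASK_BY_METRIC[canonical]


print(harmonize_metric("Sensitivity"))
print(harmonize_metric("Root Mean Squared Error"))
```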
Complete mappings of algorithm families, input data domains, and evaluation metrics are provided in the
Supplementary Material to ensure methodological transparency and facilitate replication.
To evaluate transparency and reproducibility practices, each article in the methodological subset was manually screened for the presence of accessible open-source code repositories (e.g., GitHub, institutional repositories). Code availability was recorded as a binary variable (“Yes”/“No”), and a study was considered reproducible only if a functional link to executable code was provided in the manuscript or
Supplementary Materials. Only publicly accessible and verifiable repositories were considered valid for classification. A χ² test was conducted to evaluate the association between code sharing and algorithm type.
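For reference, the Pearson χ² statistic for a contingency table of algorithm type by code availability can be computed as follows. This is a minimal standard-library sketch with hypothetical counts, not the data or script used in this study; in practice a library routine such as `scipy.stats.chi2_contingency` would typically be used.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are algorithm types, columns are
# (code shared, code not shared).
counts = [
    [2, 18],
    [5, 15],
    [1, 19],
]
stat, dof = chi_square_independence(counts)
print(stat, dof)
```

With a binary code-availability variable, the degrees of freedom equal the number of algorithm types minus one.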
The reviewed literature was additionally categorized (
Table 2) according to predictive task type based on title and abstract screening. Most studies focused on wildfire occurrence or risk prediction (75.7%), followed by burned-area or severity estimation (10.3%) and detection or monitoring-related predictive workflows (9.1%). Only a small fraction addressed wildfire spread or propagation modeling (1.5%). This distribution confirms that the dominant focus of recent ML/DL wildfire prediction research lies in occurrence and susceptibility modeling rather than fire dynamics simulation.
All analyses were conducted using Python (v3.11.12) within the Google Colab environment. Data processing, bibliometric analyses, statistical tests, and reproducibility assessments were implemented using reproducible scripts. Visualizations were generated with established libraries, including Matplotlib 3.10.8, Altair (Vega-Altair) 6.0.0, and Plotly 6.0.0. Geospatial analyses and choropleth maps were produced using GeoPandas 1.1.3 to process vector geometries and overlay country-level boundaries.
To ensure full methodological transparency, the complete codebase, processed datasets, and visualization scripts are publicly available in a Zenodo repository (bevins93/bibliometric_global_wf: Machine and Deep Learning in Wildfire Prediction: A Systematic and Bibliometric Analysis of Methods, Data, and Reproducibility (2020–2025)), enabling replication of the bibliometric and geospatial workflows presented in this study.
4. Discussion
We conducted a systematic review to characterize recent (2020–2025) applications of Machine Learning (ML) and Deep Learning (DL) in wildfire prediction, examining temporal growth, geographic concentration, methodological configurations, evaluation practices, and transparency patterns.
4.1. Temporal Trends and Geographic Distribution
This study highlights the rapid growth in scientific production related to ML and DL applications in wildfire analysis over the past five years (
Figure 2). Rather than representing a simple increase in publication volume, this acceleration signals the consolidation of wildfire prediction as a computational research domain within applied environmental sciences. This trend aligns with broader developments in the literature, as the use of ML techniques [
13,
32,
45,
46,
47,
48] and DL approaches [
20,
38,
39,
49] has steadily increased.
Several factors may contribute to this rise, including technological advances in DL architectures [
20,
28,
38,
39], improvements in computational capabilities that enable large-scale data processing [
19] and increasing availability of high-resolution environmental datasets [
21,
42]. The expanded accessibility of meteorological and satellite data has further facilitated data-driven wildfire prediction modeling. Importantly, this convergence of computational power and environmental data availability reduces barriers to entry for model development but does not necessarily ensure comparability or methodological rigor across studies.
The concentration of wildfire prediction research in a small set of data-rich, predominantly high-income countries, such as China, the United States, France, Italy, India, Portugal, Greece, and Canada, reflects disparities in research infrastructure, data availability, and computational resources. This concentration also has epistemological implications: models developed predominantly in temperate or boreal systems may implicitly encode region-specific fire regimes, potentially limiting transferability to tropical, savanna, or understudied ecosystems. However, this pattern is partially shaped by linguistic bias in Scopus and WoS, which favor English-language publications. Consequently, research from Latin America (e.g., Mexico, Chile) or Francophone Africa produced in local languages may be underrepresented, despite high fire incidence in those regions.
Beyond reflecting disparities in research infrastructure, this geographic concentration has important implications for the representativeness and operational relevance of wildfire prediction models at the global scale. Empirical evidence indicates that approximately 15% of wildfire-related predictive studies focus on the western United States alone, despite this region accounting for only about 0.5% of the global burned area, while regions such as Africa and large portions of South America, which together contribute more than 70% of global burned area, remain substantially underrepresented in predictive modeling efforts. This imbalance suggests that current methodological advances are disproportionately calibrated using temperate and Mediterranean-type fire regimes, potentially limiting their transferability to tropical and savanna-dominated ecosystems where ignition dynamics, vegetation structure, and human–fire interactions differ substantially. As a consequence, geographic bias does not only affect publication distribution but may also shape the implicit assumptions embedded within model architectures, training datasets, and evaluation strategies. In practice, this can reinforce structural gaps in decision-support capacity across fire-prone regions of the Global South, where predictive tools are most urgently needed but remain least represented in the literature. Addressing this imbalance, therefore, represents not only a question of scientific equity but also a prerequisite for improving the robustness and global applicability of ML/DL-based wildfire prediction systems.
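The scale of this mismatch can be expressed as a representation ratio, that is, a region's share of publications divided by its share of global burned area. The sketch below uses the two percentages quoted above for the western United States; the function name and the ratio itself are illustrative devices, not quantities reported in the cited sources.

```python
def representation_ratio(publication_share_pct, burned_area_share_pct):
    """Ratio above 1 means over-representation relative to burned area."""
    return publication_share_pct / burned_area_share_pct


# Percentages quoted in the text for the western United States.
western_us_ratio = representation_ratio(15.0, 0.5)
print(western_us_ratio)
```

A ratio of 30 means the region attracts roughly thirty times more research attention than its share of burned area alone would suggest.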
4.2. Algorithm Families and Methodological Patterns
From a methodological perspective, Random Forest (RF) remains widely used, particularly in countries such as China and Australia, suggesting both the robustness of this algorithm and its suitability for predominantly tabular environmental datasets [
21,
35,
36,
50]. Its continued prominence may be attributed to computational efficiency, interpretability, relative ease of implementation, and strong performance with structured regional datasets of small to moderate size, making it a practical choice for many research groups [36,43,44].
In contrast, although deep learning (DL) methods, such as Convolutional Neural Networks (CNNs), Artificial Neural Networks (ANNs), and Long Short-Term Memory (LSTM) networks, are increasingly represented, their adoption remains less dominant. These techniques are typically employed in more specialized contexts, such as satellite image classification (e.g., CNN, U-Net) and time series prediction (e.g., LSTM, ConvLSTM), where their strengths in capturing spatial or temporal patterns are most advantageous [
17]. However, their higher data and computational requirements, as well as interpretability challenges, may partly explain their more limited and targeted application.
The geographic stratification observed in algorithmic adoption further complicates evaluation practices. China’s stronger reliance on ensemble methods, often associated with classification-oriented risk modeling, and the United States’ greater emphasis on deep learning architectures, frequently applied to spatial or regression-based tasks, reflect distinct regional data ecosystems and operational priorities. As a result, metric selection is not only task-dependent but also shaped by local research infrastructures and data availability. This fragmentation does not necessarily call for uniform standardization; rather, it reinforces the need for explicitly articulated, context-sensitive evaluation frameworks. However, when divergent metric practices are combined with the near absence of open-source code in leading research hubs, comparability and validation become severely constrained. In particular, the limited adoption of metrics tailored to imbalanced datasets (e.g., rare high-severity fires) and spatially explicit predictions, despite their documented importance [
31,
40], amplifies reproducibility gaps and restricts the transferability of models across regions. Strengthening the field, therefore, requires not homogenization, but transparent reporting standards and reproducible implementations that clearly link evaluation criteria to operational goals while enabling cross-context validation.
Our hierarchical analysis reveals that while Deep Learning (DL) architectures like CNNs dominate image-based tasks (e.g., active fire detection from satellite imagery), Ensemble Tree-based methods (Random Forest, XGBoost) remain the state of the art for tabular risk prediction, a resilience driven by their computational efficiency and robustness on structured, sample-limited regional fire records. Critically, this methodological duality is not merely technical but geographically stratified, as evidenced by the heatmap in
Figure 5, which exposes a stark bifurcation between research hubs: China’s overwhelming adoption of ensemble methods, with RF and XGBoost usage intensity exceeding 3.5 on a scale from 0 to 4.0, contrasts sharply with the United States’ stronger emphasis on DL frameworks (CNNs, LSTMs). This contrast reflects divergent data ecosystems: China’s reliance on historical tabular datasets versus the U.S.’s access to high-resolution satellite and spatial data.
4.3. Input Data Domains
The dominance of vegetation/fuel characteristics and historical fire labels indicates a prioritization of biophysical predictors. This emphasis reflects both ecological theory, in which fuel load and ignition history are central drivers, and practical data availability, as such datasets are more consistently archived than socioeconomic variables.
Climatic and meteorological variables, although categorized separately, often function as auxiliary predictors integrated within broader environmental datasets. Remote sensing data are particularly central to DL-based studies, reinforcing the coupling between data modality and architecture.
The limited incorporation of socioeconomic variables does not necessarily indicate conceptual neglect; rather, it may reflect the fragmentation of wildfire research between biophysical spread modeling and risk-oriented WUI frameworks. However, the underrepresentation of socioeconomic data constrains the integration of vulnerability and exposure dimensions into predictive modeling.
A key implication is that current wildfire ML/DL research remains predominantly hazard-centric rather than fully risk-integrated, which may limit its alignment with decision-support systems that require socio-environmental synthesis.
4.4. Evaluation Metrics and Task Alignment
Performance evaluation in wildfire prediction research exhibits a nuanced distribution of metrics, with Precision, Recall, and F1 metrics dominating at 26.7% (
Figure 7), reflecting the field’s strong emphasis on binary risk prediction and early warning systems, where minimizing false negatives is critical for disaster response. Notably, regression error metrics (e.g., RMSE, MAE) constitute a substantial 25% of evaluations, challenging the notion of “less attention” to continuous variables such as burned areas and spread dynamics. This comparable prevalence of regression and classification metrics suggests that wildfire prediction research increasingly integrates both discrete risk detection and continuous spatial modeling, reflecting the dual operational demands of early warning and resource allocation. Computational performance metrics (21.7%) and segmentation or spatial overlap measures (11.7%) further highlight emerging priorities for model efficiency and geospatial accuracy, while accuracy-based metrics and threshold-independent approaches play supplementary roles. Overall, performance evaluation exhibits clear differentiation between classification and regression tasks, reflecting fundamentally distinct predictive objectives within the field. Precision, Recall, and F1 metrics dominate classification-oriented studies, consistent with applications focused on fire occurrence and risk prediction, whereas regression error metrics remain central to modeling continuous outcomes such as burned area and spread dynamics.
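To make the regression side of this distinction concrete, the sketch below computes MAE and RMSE for a small set of hypothetical burned-area values (in hectares). Both functions follow the standard definitions; the data are invented and chosen so that one large fire dominates the error.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical burned areas in hectares (three small fires, one large).
observed = [10.0, 20.0, 30.0, 400.0]
predicted = [12.0, 18.0, 33.0, 300.0]

mae_value = mae(observed, predicted)
rmse_value = rmse(observed, predicted)
print(mae_value, round(rmse_value, 2))
```

Because RMSE squares each error before averaging, a single badly predicted large fire inflates it far more than MAE, which is one reason the two metrics are often reported together.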
Critically, the underrepresentation of metrics tailored for imbalanced datasets (e.g., rare high-severity fires) and spatially explicit predictions, despite their documented importance [
31,
40], exacerbates reproducibility challenges, particularly when paired with the 92.3% code non-sharing rate observed in high-impact studies. When metric selection is not explicitly aligned with predictive task type or decision-making context, reported model performance may be difficult to interpret operationally or compare across studies. To advance the field, community-wide adoption of context-aware evaluation standards, prioritizing metrics like spatial overlap measures for geospatial tasks and recall-sensitive frameworks for risk prediction, is essential to align methodological rigor with the dual imperatives of scientific validation and real-world operational utility.
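The following sketch illustrates why accuracy alone can mislead on imbalanced fire data and how a spatial overlap measure (intersection over union, IoU) is computed. The grid of 100 cells with 5 fire cells and both sets of predictions are hypothetical; the metric definitions are the standard ones.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1, with 0.0 for undefined ratios."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def iou(mask_true, mask_pred):
    """Intersection over union of two binary masks (flattened)."""
    inter = sum(1 for t, p in zip(mask_true, mask_pred) if t and p)
    union = sum(1 for t, p in zip(mask_true, mask_pred) if t or p)
    return inter / union if union else 1.0


# Hypothetical grid: 5 fire cells out of 100.
truth = [1] * 5 + [0] * 95
never_fire = [0] * 100                          # trivial baseline
model = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89    # 4 hits, 6 false alarms

baseline_scores = classification_scores(truth, never_fire)
model_scores = classification_scores(truth, model)
model_iou = iou(truth, model)
print(baseline_scores["accuracy"], baseline_scores["recall"])
print(model_scores["accuracy"], model_scores["recall"])
```

In this toy case the baseline that never predicts fire attains higher accuracy than the model that detects four of the five fire cells, while its recall is zero; recall, F1, and IoU expose the difference that accuracy hides.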
4.5. Transparency and Reproducibility
Our analysis reveals a field at a critical inflection point. Wildfire prediction research has matured methodologically, with ensemble methods (26.7%) and deep learning architectures (59.4%) driving task-specific advances, and evaluation practices have diversified to address both classification (Precision, Recall, and F1 metrics; 26.7%) and regression (error metrics; 25.0%) challenges. Yet this technical sophistication exists in stark tension with a profound reproducibility crisis. Only 7.7% of studies shared code, while 92.3% operated as closed “black boxes” (
Figure 8), a disparity that intensifies precisely where innovation concentrates. The geographic stratification of algorithmic adoption (China’s ensemble dominance versus U.S. deep learning specialization) compounds this challenge: regional methodological preferences, though contextually rational, fragment validation pathways when paired with near-universal code non-sharing. This gap represents not merely a reporting omission, but a structural constraint on cumulative knowledge building in computational science, where reproducibility depends on executable artifacts rather than narrative descriptions alone.
When methodological specialization coexists with limited transparency, cross-regional validation becomes constrained, potentially reinforcing regional silos in model development.
This geographic fragmentation, however, amplifies the reproducibility crisis previously identified: 92.3% of studies withhold code (
Figure 5), with leading nations like China and the U.S., where complex, context-specific models dominate, contributing disproportionately to this gap. The absence of open-source practices in these high-impact regions creates a paradox where methodological innovation, such as China’s ensemble-driven precision in risk prediction or the U.S.’s DL-powered spatial modeling, coexists with systemic barriers to validation and scalability. This underscores the urgent need for context-aware standardization: rather than a universal “best practice”, the field requires evaluation frameworks that account for regional data constraints, environmental heterogeneity, and task-specific priorities [
29,
35,
41,
51]. Future progress may hinge on hybrid architectures that merge the interpretability of ensembles with DL’s spatial–temporal modeling capabilities, but such integration demands not only technical innovation but also community-wide commitments to transparency, ensuring that the geographic and methodological pluralism driving wildfire prediction research becomes a catalyst for robust, globally applicable solutions rather than a source of fragmentation.
Critically, the chi-square test of association between algorithm type and code availability (χ² = 78.0, df = 44, p = 0.0012) reveals a statistically significant dependency, indicating that code availability is not independent of the algorithm employed. Notably, the algorithms driving recent advances in the field, such as deep learning and hybrid architectures, exhibit distinct patterns of repository disclosure, underscoring structural differences in reproducibility practices across methodological approaches. This variation may reflect differences in implementation complexity, institutional norms, or intellectual property considerations associated with certain architectures.
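The reported p-value can be checked from the test statistic and degrees of freedom alone. For an even number of degrees of freedom, the upper-tail probability of the χ² distribution has a closed form, which the standard-library sketch below evaluates; the function name is hypothetical, and this is a consistency check rather than the script used in the study.

```python
import math


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability P(X > x) for a chi-square variable, even dof.

    For dof = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires a positive even dof")
    half = x / 2.0
    term = 1.0    # the i = 0 term of the sum
    total = 1.0
    for i in range(1, dof // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Statistic and degrees of freedom as reported in the text.
p_value = chi2_sf_even_dof(78.0, 44)
print(round(p_value, 4))
```

The result is approximately 0.0012, consistent with the value reported above.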
Beyond code availability alone, transparency in wildfire prediction research should be understood as a structural property of the modeling pipeline rather than a binary reporting decision. Reproducibility depends not only on executable scripts, but also on access to preprocessing workflows, feature engineering logic, training–validation partition strategies, hyperparameter selection procedures, and metadata describing spatial and temporal sampling assumptions. When these components remain implicit, models may appear technically replicable while remaining practically non-transferable across regions or datasets. In this sense, the current transparency gap reflects a broader infrastructural limitation affecting interoperability between research groups rather than a simple absence of repositories. Strengthening structural transparency, therefore, requires moving from isolated code release practices toward fully documented computational pipelines that enable cross-context validation, facilitate benchmark construction, and support operational adoption by agencies responsible for wildfire preparedness and response.
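One possible way to make these pipeline components explicit is a machine-readable manifest published alongside a study. The sketch below is purely illustrative: the class `PipelineManifest`, its field names, and the example values are hypothetical and do not correspond to an existing reporting standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PipelineManifest:
    """Hypothetical record of the pipeline components named in the text."""
    preprocessing_steps: list
    feature_engineering: list
    split_strategy: str
    hyperparameter_search: str
    spatial_sampling: str
    temporal_sampling: str
    random_seed: int
    code_url: str = ""   # left empty when no public repository exists

    def missing_fields(self):
        """Names of fields left empty, i.e. undocumented components."""
        return [name for name, value in asdict(self).items()
                if value in ("", [], None)]


# Example values are invented for illustration.
manifest = PipelineManifest(
    preprocessing_steps=["cloud masking", "resampling to a common grid"],
    feature_engineering=["vegetation index", "slope from elevation model"],
    split_strategy="spatial block cross-validation, 5 folds",
    hyperparameter_search="grid search over tree depth and tree count",
    spatial_sampling="1 km grid cells",
    temporal_sampling="daily, 2015-2020",
    random_seed=42,
)

print(manifest.missing_fields())
print(json.dumps(asdict(manifest), indent=2))
```

Listing the empty fields makes undocumented components visible at a glance, which shifts transparency from a yes/no code flag toward the pipeline-level view described above.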
This paradox undermines the field’s dual mandate: scientific rigor and operational utility. Without transparent implementations, even metrics-rich evaluations, such as spatial overlap measures and recall-sensitive frameworks, become unverifiable assertions rather than actionable benchmarks. In high-stakes domains such as wildfire management, reproducibility is not merely an academic norm but a prerequisite for reliability. Models deployed without transparent validation pathways risk undermining stakeholder trust and limiting operational adoption.
Moving forward, reproducibility must transition from an optional virtue to a structural requirement. Journals and funding agencies should mandate code deposition as a condition of publication, while the community must develop context-aware open benchmarks that respect regional data constraints yet enable cross-validation. Only through such integration, where methodological pluralism coexists with uncompromising transparency, can wildfire prediction evolve from fragmented academic exercises into globally interoperable, lifesaving decision support systems. The path forward demands not less innovation but innovation anchored in verifiability, because in wildfire management unvalidated predictions are not merely scientifically incomplete; they are operationally dangerous.
4.6. Implications and Limitations
It is important to acknowledge that the patterns identified in this study may be partially influenced by search bias and methodological constraints. Although the Scopus and Web of Science (WoS) databases provide broad coverage of the peer-reviewed literature, they may omit relevant region-specific publications or studies published in languages other than English. As such, certain geographic regions or local approaches may be underrepresented. Database selection and language restrictions may therefore shape the observed geographic distribution and methodological patterns identified in this review.
Furthermore, while we conducted a subset analysis to explore specific objectives, such as open-source code availability, this approach may introduce additional bias by emphasizing studies that met our selection criteria more narrowly. The detailed coding of 110 articles, covering algorithm usage, performance evaluation, and code availability, enabled fine-grained methodological analysis but necessarily narrows the interpretive scope relative to the full corpus. Despite these limitations, the study offers robust insights, strengthened by the rigorous application of the PRISMA Eco-Evo protocol for systematic literature reviews. The transparent reporting of search criteria, inclusion thresholds, and coding procedures enhances the reproducibility and interpretability of the synthesis itself and provides a solid empirical foundation for our conclusions.
Our findings carry important implications for various stakeholders involved in wildfire prediction research and its application. For researchers, the results highlight the urgent need to incorporate a broader range of data sources, particularly by including underrepresented socioeconomic and spatial factors. Additionally, the observed heterogeneity in evaluation metrics underscores the necessity of standardizing or more explicitly justifying metric selection to enhance comparability and reproducibility across studies. Addressing these methodological gaps will strengthen the reliability and impact of future research.
For practitioners and decision makers, this review offers a comprehensive overview of the predominant tools, algorithms, and data types currently employed in wildfire prediction. Understanding the relationship between algorithm families and data modalities (e.g., tabular vs. spatial inputs) is essential for selecting models that are context-appropriate rather than technically novel but operationally misaligned. This synthesis can inform the selection of models tailored to specific contexts and operational needs, ultimately improving the effectiveness of wildfire risk management and mitigation strategies.
For funders and policymakers, the geographic and thematic disparities revealed in our analysis point to a clear need for targeted investment aimed at promoting research equity and fostering innovation in underrepresented regions. Moreover, funding agencies and policymakers should consider implementing incentives that encourage transparency, reproducibility, and the adoption of open science practices. Such measures would facilitate broader collaboration, accelerate scientific progress, and enhance the overall robustness of wildfire prediction research. Advancing wildfire prediction is therefore not solely a technical challenge: it requires systemic alignment between data ecosystems, evaluation standards, and transparency norms across regions, together with concerted efforts across research, practice, and policy domains to build inclusive, transparent, and effective scientific frameworks.
For future research, we recommend shifting focus from purely algorithmic novelty to operational validity. True decision-making tools require not only high accuracy but also interpretability, uncertainty quantification, and, crucially, reproducibility. Future reviews should explicitly assess the technology readiness level (TRL) of these models and evaluate whether methodological advances translate into deployable decision-support systems.
5. Conclusions
The application of Machine Learning (ML) and Deep Learning (DL) to wildfire prediction expanded substantially between 2020 and 2025, with research output concentrated in a limited number of high-income countries, particularly China, the United States, and Australia. Methodologically, ensemble tree-based methods (e.g., Random Forest) and deep learning architectures (e.g., CNNs, ANNs) coexist, reflecting adaptation to distinct data modalities and modeling objectives rather than a simple replacement of traditional ML approaches.
In terms of data practices, studies predominantly rely on meteorological and topographical variables, with increasing but still limited integration of remote sensing inputs, while socioeconomic variables remain comparatively underrepresented. This distribution suggests that wildfire ML/DL research continues to prioritize biophysical predictors over human-system variables, potentially constraining applicability across diverse socio-environmental contexts.
Evaluation practices demonstrate differentiation between classification-oriented tasks (e.g., occurrence and risk prediction) and regression-based modeling (e.g., burned area and spread), yet metric heterogeneity limits cross-study comparability. Finally, the limited availability of open-source code represents a structural constraint on reproducibility and cumulative knowledge building.
Collectively, these findings indicate that wildfire ML/DL research is advancing technically but remains geographically concentrated and structurally constrained by transparency limitations. Strengthening reporting standards, dataset documentation, metric justification, and open code practices will be essential for translating methodological innovation into robust, globally applicable wildfire decision-support systems.