2.2. Recurrent Neural Networks
Recurrent neural networks (RNNs) are a class of neural network architectures specific to processing sequence data, where predictors and outcomes are ordered in time. An RNN cell is a basic processing unit where the output at a given time is a function of the input and the output of the cell at previous times ([37], pp. 498–501). The simple RNN cell takes as input the set of predictors at time $t$, denoted as $x_t$, as well as the output of the recurrent connection at the previous time step, which we refer to in this paper as the “recurrent state”. The right-hand side of Figure 3 shows the computational flow in a single recurrent unit. The input $x_t$ and the recurrent state $h_{t-1}$ are combined in a linear combination with fixed weights $w_x$, $w_h$, and bias $b$. This linear combination is modified by an activation function, denoted as $\phi$, to produce the cell output $h_t = \phi(w_x^\top x_t + w_h h_{t-1} + b)$, which in the simple RNN cell is stored as the new hidden state. The weight vector $w_x$ is the same length as the input $x_t$. The weight $w_h$ and the bias $b$ are scalar values. For the simple RNN cell, the output is the same as the new recurrent state.
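As an illustration, a single step of the simple RNN cell described above can be sketched in Python as follows; the weights, bias, and input sequence are random placeholders, and tanh stands in for the generic activation function $\phi$:

```python
import numpy as np

# A minimal sketch of the simple RNN cell update; all values here are
# illustrative placeholders, not the weights used in this study.
rng = np.random.default_rng(0)
n_features, T = 3, 5
x_sequence = rng.normal(size=(T, n_features))  # predictors x_t over time
w_x = rng.normal(size=n_features)              # weight vector, same length as x_t
w_h, b = 0.5, 0.1                              # scalar recurrent weight and bias

h = 0.0                                        # initial recurrent state
for x_t in x_sequence:
    # Linear combination of input and recurrent state, then activation;
    # the output is stored as the new recurrent state (cf. Figure 4).
    h = np.tanh(w_x @ x_t + w_h * h + b)
```

The same weights $w_x$, $w_h$, and $b$ are reused at every iteration of the loop, which is exactly the weight sharing shown in the unrolled view of Figure 4.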
Figure 4 shows the recurrent unit “unrolled” in time, where the same computational unit is applied recursively. Over time, the recurrent state is modified, but the same set of weights and biases is used at each time step.
Multiple recurrent cells are typically stacked into a “recurrent layer”. One or more recurrent layers are used in conjunction with other neural network layers, such as the standard fully connected “dense” layer. We will refer to an “RNN” as any neural network architecture with one or more recurrent layers.
A popular type of recurrent model architecture is Long Short-Term Memory (LSTM) (Figure 5), which was originally developed to address certain issues with the simple RNN cell [38]. Like the simple RNN cell, the LSTM cell maintains a hidden state, which is the output of the cell at the previous time step. In addition to the hidden state, an LSTM cell maintains a “cell state”, denoted as $c_t$, which incorporates information about the output of the cell over more time steps in the past than just the previous time step. At time $t$, the hidden state $h_{t-1}$ and the predictors $x_t$ are used to modify the cell state through a series of computational gates. The resulting cell state $c_t$ can be viewed as a compromise between the old information as represented by the previous cell state $c_{t-1}$ and the new information as represented by $h_{t-1}$ and $x_t$. The final output of the cell at time $t$ is a combination of $c_t$, $h_{t-1}$, and $x_t$, and is stored as the new hidden state $h_t$. LSTM cells are typically stacked into an LSTM layer that is combined with other neural network layers to form a more complicated model architecture. We will refer to an “LSTM” network as any neural network that contains one or more LSTM layers. The computational graph in Figure 4 still applies to the LSTM cell, with the modification that the state passed recursively in time to the unit consists of both the hidden state and the cell state, and the output of the unit is the hidden state. For the LSTM cell, the combination of the hidden state and the cell state is the recurrent state.
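For reference, one standard formulation of the LSTM gate computations [38] is sketched below; the notation ($W$, $U$, $b$ per gate, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication) is illustrative and may differ slightly from the variant implemented in the software used here:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(new cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(new hidden state)}
\end{aligned}
$$

The forget and input gates implement the “compromise” described above: $f_t$ controls how much of the old cell state $c_{t-1}$ is retained, and $i_t$ controls how much new information enters the cell state.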
In this paper, an RNN is trained to map time series of weather variables to time series of FMC, processing one time step at a time. The weather inputs at time $t$ not only produce a predicted FMC at the same time $t$ but also modify the recurrent state and thus affect the predicted FMC at future times $t' > t$. Since numerical weather forecasts are used as inputs, the weather is treated as known in the future.
Neural networks contain many parameters that are fitted on a training set by minimizing a loss function comparing model output to the observed data. With RNNs, the loss can be computed over time, so the model can learn to minimize the prediction error over a forecast window. During training, a fixed number of time steps, called the sequence length, is used to calculate the loss.
We use the mean squared error (MSE) as a loss function, which is standard in machine learning for a continuous response. Suppose that an input sequence is of length $T$. The predicted FMC at time $t$ is denoted as $\hat{y}_t$, and the observed FMC at time $t$ is $y_t$. The loss for a single sequence is

$$\mathrm{Loss} = \frac{1}{T} \sum_{t=1}^{T} \left( \hat{y}_t - y_t \right)^2 \tag{3}$$
Loss is calculated by comparing the predicted time series of FMC to the observed values (
Figure 6).
2.3. RNN Training, Prediction, and Tuning
We trained a recurrent neural network (RNN) with gradient descent using the Adam optimizer. Inputs combine time-varying meteorological variables with static spatial features (longitude, latitude, elevation). The target variable is the 10 h dead fuel moisture content (FMC) measured at Remote Automated Weather Station (RAWS) sites.
Training data are arranged as a tensor of shape
(batch size, sequence length, features). Here, features is the number of unique inputs. Following Zhang et al. ([39], Section 9.7), we use truncated backpropagation through time (BPTT). A single sample consists of a tensor of shape
(sequence length, features). At the start of processing a sample, the recurrent state is reset. At each time step of the sample, a one-dimensional tensor of features is input to the model, the recurrent state is updated, and a single output is generated. This process is repeated for the entire sample. The output of the network from an entire sample is a time series of FMC predictions of the same sequence length as the input sample, which requires setting the function argument
return_sequences to true [
40]. The model loss for the sample is the MSE, as seen in Equation (
3), averaged over the entire sequence length. The gradient of the loss with respect to all model parameters is then calculated via BPTT. Gradients are calculated for each sample sequence in a batch and then averaged across samples [
41]. Model parameters are updated after each batch. A single epoch of training is completed when all batches of samples have been processed. Early stopping halts training when validation performance no longer improves.
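As an illustration, this training arrangement might be expressed in TensorFlow/Keras as in the following sketch; the shapes, the stand-in model, and all values are placeholders rather than the exact configuration used in this study (the study's architecture is given in Section 2.4):

```python
import numpy as np
import tensorflow as tf

# Illustrative data arrangement: (batch size, sequence length, features)
# for inputs, with one FMC target per time step. Values are placeholders.
n_samples, seq_len, n_features = 256, 48, 8
X = np.random.rand(n_samples, seq_len, n_features).astype("float32")
y = np.random.rand(n_samples, seq_len, 1).astype("float32")

# A compiled stand-in model returning the full output sequence.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, return_sequences=True,
                         input_shape=(seq_len, n_features)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# Gradients are averaged across the sample sequences in each batch,
# parameters are updated after each batch, and early stopping halts
# training when the validation loss stops improving.
model.fit(X, y, batch_size=32, epochs=10, validation_split=0.1,
          callbacks=[tf.keras.callbacks.EarlyStopping(
              monitor="val_loss", patience=3, restore_best_weights=True)])
```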
Static predictors are repeated at each time step. Batches include samples from multiple locations to promote generalization and to learn the dependence on location.
During training, the effective temporal depth of optimization is limited by the sequence length and batch size. Consequently, truncated BPTT is applied only within this finite window, ensuring computational stability ([39], Section 9.7.1.2). In contrast, during prediction, the model operates with an unconstrained sequence length and a batch size equal to the number of unique prediction locations, receiving new weather inputs sequentially and updating its recurrent state indefinitely. The prediction process is therefore equivalent to unrolling the RNN for an arbitrary number of time steps without truncation, allowing the model to integrate meteorological history over potentially long periods. The same network parameters, trained within the finite truncated BPTT horizon, are reused for recursive forecasting by copying the learned weights from the training configuration to the stateful inference model ([37], p. 512).
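A minimal sketch of this weight transfer in Keras, assuming illustrative placeholder sizes (not the exact values from this study), might look as follows:

```python
import tensorflow as tf

# Placeholder sizes for illustration only.
n_units, n_features, seq_len, n_locations = 64, 8, 48, 16

# Training model: fixed sequence length for truncated BPTT, returning
# the full output sequence.
train_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(n_units, return_sequences=True,
                         input_shape=(seq_len, n_features)),
    tf.keras.layers.Dense(1),
])

# Stateful inference model: unconstrained sequence length (None) and a
# batch size equal to the number of prediction locations, so the
# recurrent state persists across successive calls.
pred_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(n_units, return_sequences=True, stateful=True,
                         batch_input_shape=(n_locations, None, n_features)),
    tf.keras.layers.Dense(1),
])

# Reuse the learned weights from the training configuration.
pred_model.set_weights(train_model.get_weights())
```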
Neural networks have many fixed hyperparameters, such as the number of cells to use within a layer, the overall number of layers, the batch size, the learning rate, etc. It is important to keep the data used to tune hyperparameters separate from the data used to estimate forecast accuracy. The metrics calculated on forecasts should represent the accuracy of the model when predicting entirely new data. The hyperparameter tuning used in this project will be described in
Section 2.6.
2.4. Recurrent Neural Network for Forecasting FMC
In this paper, we use the following RNN model architecture. The input is a collection of features consisting of weather variables and the geographic locations of RAWS. These inputs are fed into a single LSTM layer consisting of 64 cells with a hyperbolic tangent activation function. The LSTM layer outputs a sequence of the same length as the input data, and the entire sequence is passed through the rest of the network to the output layer. The output of the LSTM layer is fed into two dense, fully connected layers with Rectified Linear Unit (ReLU) activation and 32 and 16 cells, respectively. The LSTM layer processes the sequence recurrently, while the dense layers are applied to each time step in the sequence independently. Finally, the output of the model is a single neuron with linear activation. Linear activation is chosen for the final output layer since we are mapping the inputs to a continuous real number. A single neuron is used as the output layer since model predictions are generated by deploying the model point-wise at each location, so the FMC at a given time is a single scalar value. The final model output is a two-dimensional array, where the first dimension is the number of unique locations and the second is the number of time steps (48 h in our case). Other model architectures could be used to generate outputs that correspond to a spatial grid, but we chose this architecture because it is the easiest to implement and the most flexible when predicting across various spatial domains. The left side of
Figure 3 shows a diagram of the core architecture of the RNN model used in this study. Additional hyperparameters for the model used in this paper are shown in
Table 2.
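As a minimal sketch in TensorFlow/Keras (the software used in this study), the core architecture described above could be expressed as follows; n_features is a placeholder for the number of input features, and any hyperparameters not shown follow Table 2 or the TensorFlow defaults:

```python
import tensorflow as tf

def build_model(n_features, seq_len=48):
    """Sketch of the core RNN architecture (see Figure 3 and Table 2)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len, n_features)),
        # Single LSTM layer: 64 cells, tanh activation, returning the
        # full output sequence rather than only the final hidden state.
        tf.keras.layers.LSTM(64, activation="tanh", return_sequences=True),
        # Dense layers applied to each time step independently.
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        # Single output neuron with linear activation: one scalar FMC
        # value per location and time step.
        tf.keras.layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model
```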
The HRRR model is used to provide weather input to the RNN. The HRRR provides hourly forecasts up to a maximum of 48 h. The sequence length used to structure the data input
(batch size, sequence length, features), described in Section 2.3, is set to 48. Each input to the RNN consists of a 48 h sequence of observations, and every output of the model is a 48 h sequence of FMC predictions at the same times as the input. The batch size is a tuned hyperparameter, and the number of features was determined through the theoretical considerations described in
Section 2.1.2.
In this study, we evaluate the forecast accuracy over all of 2024, broken into 48 h periods to align with the sequence length of the training inputs. The initial hidden state and cell state of the LSTM layer are zero by default. Starting at 00:00 UTC on 1 January 2024, we input 48 h of HRRR weather data and geographic inputs at a given set of test locations to generate the first set of forecasts. The hidden state and cell state of the LSTM layer are then reset, and the next 48 h of inputs are processed. This process is repeated until all forecasts for 2024 are generated. The recurrent state of the LSTM layer is thus reset for each 48 h forecast period, and no information is retained across distinct forecasting periods. Resetting the recurrent state in this way is done to estimate the forecast accuracy of the model 48 h into the future. Resetting is not required by this model architecture, and an operational version of the model could retain the recurrent state for arbitrarily long.
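As a sketch, this rolling 48 h evaluation could look like the following, assuming the stateful pred_model from the earlier sketch in Section 2.3 and a hypothetical array inputs_2024 of shape (number of test locations, hours in 2024, features):

```python
import numpy as np

# Illustrative rolling forecast over 2024 in independent 48 h windows;
# inputs_2024 is a hypothetical array of HRRR and geographic inputs.
seq_len = 48
windows = []
for start in range(0, inputs_2024.shape[1], seq_len):
    pred_model.reset_states()   # zero hidden and cell states per window
    window = inputs_2024[:, start:start + seq_len, :]
    windows.append(pred_model.predict(window, verbose=0))
forecasts_2024 = np.concatenate(windows, axis=1)
```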
The hyperparameters for the RNN were selected with a restricted grid search, which evaluates a subset of all possible hyperparameter combinations. The combination of hyperparameters that results in the most accurate predictions on the validation set is chosen.
Table 2 lists the hyperparameters used for the RNN model. All hyperparameters listed were chosen via restricted grid search, except for the activation functions. The LSTM activation function is a hyperbolic tangent, the activation function for the internal dense layers is ReLU, and the activation function for the output is linear. The first two of these choices are viewed as sensible defaults, and the output layer has linear activation since it maps the inputs to a continuous real number. Any hyperparameters not listed in
Table 2 were set to their default values from the TensorFlow software project, which are generally accepted in the literature as reasonable defaults. The Adam optimizer was used for the gradient descent procedure. Additionally, the set of features was chosen based on theoretical considerations and was not subject to hyperparameter tuning, but a sensitivity analysis for the set of predictors used will be presented in
Section 3. The hyperparameter tuning method is discussed in
Section 2.6, and
Appendix A has additional technical details.
2.6. Analysis Design
We seek to forecast FMC at arbitrary locations and at future times. The model maps a time series of weather forecasts to a time series of spatial FMC forecasts based on the FMC and weather data presented to it during training. In a sense, the model interpolates in space and extrapolates the weather-to-FMC mapping in time. We will compare the models using the root mean squared error (RMSE) at new locations and at times later than those used for training. The RMSE for the FMC models is interpretable in units of percent. In all reported RMSE values in this paper, the square root is applied after any averaging operations; that is, the square root is always the final operation.
Cross-validation methods estimate the accuracy with which a model predicts unseen data ([
45], Section 7.10). In machine learning, samples are randomly drawn to form independent training, validation, and test sets ([
45], Section 7.2). The training set is used to fit the model, the validation set to tune hyperparameters and monitor overfitting, and the test set, kept separate, to estimate final predictive accuracy. In time-dependent problems, the test data should come from a period after the training data to reflect the real forecast conditions. This requirement conflicts with the common assumption that training and test samples are independent and identically distributed because temporal ordering introduces dependence. The evaluation of time series models must therefore balance statistical independence with causal realism. In spatially dependent problems, the test data needs to be from locations that were not used in training model parameters. If a model is used to predict at locations included in its training data, it has already learned aspects of the data structure at those locations, leading to overly optimistic accuracy estimates [
24].
To estimate the forecast error of FMC models at unobserved locations, we apply a spatiotemporal cross-validation procedure. In the hyperparameter tuning step, FMC observations from 2022 are used to train a set of candidate model architectures. A random sample of 90% of the RAWS is used to generate the training locations and the remaining 10% the testing locations. Then, the forecast accuracy of those models is compared at the testing locations over all of 2023. During the forecast period, HRRR inputs are used to generate predictions, but no FMC data from the testing time period or testing locations are used to inform predictions. The model architecture with the lowest RMSE when forecasting in 2023 is selected.
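As an illustration, the random spatial split might be implemented as in the following sketch, where station_ids is a hypothetical array of RAWS identifiers; the 90/10 proportions follow the text above:

```python
import numpy as np

# Illustrative 90/10 spatial split of RAWS into training and testing
# locations; station_ids is a hypothetical array of station identifiers.
def split_stations(station_ids, seed):
    rng = np.random.default_rng(seed)
    ids = rng.permutation(station_ids)
    n_test = int(round(0.1 * len(ids)))
    return ids[n_test:], ids[:n_test]   # (train_ids, test_ids)
```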
With the model architecture fixed, the model is trained from scratch on observations from 2023 with a new random split of training and testing locations. The trained model is used to forecast at locations not used in training over all of 2024, broken into 48 h forecast windows. The final accuracy metrics for the model are calculated by comparing the forecasts for 2024 to FMC observations at the testing locations. The training period always precedes the forecast period in time, and the random sampling of locations accounts for spatial uncertainty. A schematic diagram of this cross-validation method is shown in
Figure 7, with the training period of 2023 and the forecast period of 2024.
ODE+KF and the climatology method do not fit directly into the training and testing paradigm used in ML. ODE+KF is run over all of 2024 at the test locations in 48 h increments, each initialized with a spin-up period: it utilizes 24 h of data prior to the start of the forecast as a spin-up (i.e., with data assimilation) and is then run in forecast mode (i.e., without data assimilation) for 48 h. Each iteration of 24 h spin-up plus 48 h forecasting is independent. The climatology method produces forecasts by retrieving historical data for each time in 2024 and each testing location. The climatology method therefore does not utilize a spatial holdout like the ML models, so its accuracy metrics are more optimistic than those for the ML methods.
To account for the variability in the random spatial samples and the randomness due to model initialization, the training in 2023 and forecasting in 2024 were repeated 500 times with different random seeds. Thus, 500 random training/test splits of the RAWS were used, along with 500 different sets of initial weights for the ML models. We refer to these as replications from now on. The replications are used to construct uncertainty bounds on the final accuracy metrics. There were 151 RAWS with valid data in the forecast period of 2024; each replication had 16 RAWS in the test set, and each RAWS was included in the test set on average 53.0 times across replications. Additional sources of uncertainty come from the HRRR weather inputs, but the HRRR model provides only a single deterministic forecast, rather than an ensemble or probabilistic forecast that could be easily incorporated into an estimate of uncertainty. To quantify the uncertainty from the HRRR weather inputs, we would need an estimate of the uncertainty for each weather variable at each location in the study region and across the entire year. That would currently require many strong assumptions about the model, since no such analysis exists to our knowledge. For these reasons, we do not account for the uncertainty in the weather inputs in the error metrics presented in this paper, but any systematic errors or biases from the weather inputs are reflected in these error metrics.
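Schematically, the replication procedure can be sketched as below; split_stations is the hypothetical helper from the earlier sketch, and train_rnn and forecast_2024 are hypothetical stand-ins for model training and windowed forecasting, not functions from this study's code:

```python
# Illustrative replication loop: 500 random spatial splits and 500
# random model initializations; train_rnn and forecast_2024 are
# hypothetical placeholders for the training and forecasting steps.
per_replication_residuals = []
for seed in range(500):
    train_ids, test_ids = split_stations(station_ids, seed=seed)
    model = train_rnn(train_ids, seed=seed)        # fresh initial weights
    residuals = forecast_2024(model, test_ids)     # observed minus predicted
    per_replication_residuals.append(residuals)
```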
Within each replication, the forecast residuals are calculated by subtracting the predicted FMC from the observed FMC for each model at each time. The residual shows whether the forecast was too high or too low, and it can be positive or negative. We then calculate the squared residual, which is always positive. Within a replication for a given model, squared residuals are calculated across the set of test locations and the set of forecast times. The per-replication MSE is calculated by averaging the squared residuals over all test locations and test times for that replication. The overall RMSE is then calculated by averaging the per-replication MSE values and taking the square root. Uncertainty bounds are calculated as the square root of one standard deviation of the per-replication MSE values. The overall model bias is calculated in the same way using the raw residuals rather than the squared residuals. This metric is used to analyze whether the forecasts are systematically too high or too low. Both the RMSE and bias are interpretable in units of percent FMC. Equation (5) shows the mathematical form of the RMSE estimate used to quantify forecast error across the entire study region and all times in 2024, using the mathematical definitions in Table 6. Equation (7) shows the mathematical form of the bias estimate. Equation (6) shows how the standard deviation of the RMSE is calculated, and the standard deviation of the bias estimate is calculated in an analogous way.
We calculate the per-replication MSE by averaging the squared error over all times and locations. With $y_{l,t}$ the observed FMC and $\hat{y}_{l,t,r}$ the predicted FMC at test location $l$ and time $t$ in replication $r$, and $L$, $T$, and $R$ the numbers of test locations, forecast times, and replications (see Table 6):

$$\mathrm{MSE}_r = \frac{1}{LT} \sum_{l=1}^{L} \sum_{t=1}^{T} \left( y_{l,t} - \hat{y}_{l,t,r} \right)^2 \tag{4}$$

The overall RMSE is calculated by averaging the per-replication MSE over all replications and taking the square root:

$$\mathrm{RMSE} = \sqrt{ \frac{1}{R} \sum_{r=1}^{R} \mathrm{MSE}_r } \tag{5}$$

We estimate the uncertainty in the overall RMSE by calculating the square root of the standard deviation of the MSE across all replications:

$$\mathrm{SD}_{\mathrm{RMSE}} = \sqrt{ \sqrt{ \frac{1}{R-1} \sum_{r=1}^{R} \left( \mathrm{MSE}_r - \frac{1}{R} \sum_{r'=1}^{R} \mathrm{MSE}_{r'} \right)^2 } } \tag{6}$$

We calculate the bias of the models by averaging the raw error across all locations, times, and replications, and the associated standard deviation of the bias is calculated in a way analogous to the standard deviation of the MSE:

$$\mathrm{Bias} = \frac{1}{RLT} \sum_{r=1}^{R} \sum_{l=1}^{L} \sum_{t=1}^{T} \left( y_{l,t} - \hat{y}_{l,t,r} \right) \tag{7}$$
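Under the assumption that per_replication_residuals holds one array of raw residuals (observed minus predicted) per replication, as in the earlier sketch, the aggregation in Equations (4)–(7) can be expressed as:

```python
import numpy as np

# Per-replication MSE (Eq. 4), overall RMSE (Eq. 5), its uncertainty
# (Eq. 6), and overall bias (Eq. 7); square roots are applied last.
mse_r = np.array([np.mean(res ** 2) for res in per_replication_residuals])
rmse = np.sqrt(np.mean(mse_r))                  # Eq. (5)
rmse_sd = np.sqrt(np.std(mse_r, ddof=1))        # Eq. (6)
bias = np.mean([np.mean(res) for res in per_replication_residuals])  # Eq. (7)
```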
We further analyze the forecast error by location and by hour of the day. For the RMSE by hour of the day, squared errors are averaged over locations, replications, and days of the year to produce a single RMSE estimate for each hour, which shows how the forecast error changes throughout the diurnal FMC cycle. This also shows how the accuracy changes as the forecast runs longer into the future. To analyze the spatial variability in the forecast error, the RMSE and bias are calculated by averaging over all times and replications in the test set, producing accuracy metrics for each RAWS. Again, the replications are used to construct uncertainty bounds. Within a single replication, we obtain an estimate of the forecast accuracy for 16 of the RAWS. After many replications, we end up with estimated RMSE and bias, with uncertainty bounds, for every RAWS with data availability in 2024. The standard deviations of these RMSE estimates across replications are calculated in a way analogous to Equation (6).
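For instance, assuming hypothetical flattened arrays sq_err of squared residuals and hour of matching hour-of-day labels (flattened over locations, replications, and days), the per-hour RMSE could be computed as:

```python
import numpy as np

# Illustrative RMSE by hour of day: average the squared residuals
# within each hour, then take the square root last.
rmse_by_hour = np.array(
    [np.sqrt(np.mean(sq_err[hour == h])) for h in range(24)]
)
```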