A Physics-Guided Deep Learning Model for 10-h Dead Fuel Moisture Content Estimation

Abstract: Dead fuel moisture content (DFMC) is a key driver of fire occurrence and an important input to many fire simulation models. There are two main approaches to estimating DFMC: empirical and process-based models. The former build statistical relationships between the input drivers (weather, fuel and site characteristics) and observed DFMC. The latter simulate the processes that occur in the fuel using energy and water balance conservation equations. However, empirical models lack explanations for physical processes, and process-based models may provide an incomplete representation of DFMC. To combine the benefits of both approaches, we introduced the Long Short-Term Memory (LSTM) network and combined it with an effective process-based model, the fuel stick moisture model (FSMM), to estimate DFMC. The LSTM network described the temporal dynamics of DFMC well, with a high R² (0.91) and low RMSE (3.24%) and MAE (1.97%). When combined with FSMM, the physics-guided model FSMM-LSTM performed better (R² = 0.96, RMSE = 2.21% and MAE = 1.41%) than the other models. Therefore, combining the physical process with deep learning estimated 10-h DFMC more accurately, which can improve wildfire risk assessment and fire simulation.


Introduction
Wildfires are crucial to water and carbon cycles on earth [1][2][3][4]. On one hand, they emit CO2 into the atmosphere, which may harm the climate [5], air quality [6] and health [7]. On the other hand, human lives, property and infrastructure are vulnerable to large, high-intensity wildfires [8]. Thus, predicting forest wildfire risk is of great importance [9].
Fuels consumed in wildfires are composed of dead and live plant material. Many efforts have focused on live fuels [10][11][12], whose moisture content changes slowly throughout the day [13]. In contrast, Dead Fuel Moisture Content (DFMC) responds rapidly to atmospheric conditions and strongly affects fire behavior, including ignition probability, spread rate and intensity [1,14,15]. Therefore, DFMC estimation is required for quantifying fire danger, and almost all fire models include DFMC as an input variable [16].
DFMC is a function of fuel size and atmospheric conditions [17]. It increases or decreases with changes in climate variables through water vapor sorption or desorption until it eventually reaches a stable moisture content, i.e., the Equilibrium Moisture Content (EMC) [18]. The time a fuel takes to lose or gain about 63% of the difference between its initial moisture content and the EMC is defined as the time-lag [19], which is related to the fuel diameter. Based on the time-lag theory, dead fuels are divided into four categories (1-h, 10-h, 100-h and 1000-h fuels, where the number refers to the time-lag of the fuel) [20]. For instance, 10-h fuel usually refers to fuel with a diameter of 0.64 to 2.54 cm [21]. The 10-h DFMC is a promising predictor since it can be measured automatically in real time at study sites. For dead fuel of a given size, the water content is commonly modeled with meteorological variables such as air temperature, relative humidity, wind speed, rainfall, solar radiation, soil moisture

Materials
The dataset used in this study is available in D.W. van der Kamp et al. (2017) [49] and shared on GitHub (https://github.com/dvdkamp/fsmm, accessed on 20 May 2021). The observations were collected at a forested field site located near Kamloops, British Columbia, Canada, between May and September 2014. Both the DFMC and the other required variables (e.g., air temperature, relative humidity) were measured at the BC1 site over 3156 h. The 10-h DFMC observations were measured by Campbell Scientific CS506 Fuel Moisture Sensors at a standard height of 30.5 cm. This sensor consists of a time-domain reflectometer probe embedded within a standard moisture stick with a length of 50.8 cm and a radius of 0.65 cm. A Rotronic HC-S3 humidity and air temperature sensor (also at a height of 30.5 cm), a Rainwise tipping bucket rain gauge, a Met One anemometer and a Kipp and Zonen CM3 pyranometer were used to record the air relative humidity, air temperature, rainfall, wind speed and solar radiation, respectively. Table 1 shows all the input variables.
Table 1. Input variables.
No. | Variable (unit) | Symbol
2 | Relative humidity (0-100%) | RH
3 | Rainfall (cm) | P
4 | Wind speed (m/s) | Ws
5 | Sun altitude (rad) | S_alt
6 | Sun azimuth (rad) | S_azi
7 | Downwelling direct shortwave radiation (W/m²) | K_dir
8 | Downwelling diffuse shortwave radiation (W/m²) | K_diff
9 | Downwelling longwave radiation (W/m²) | L

Methods
We combined FSMM (process-based model) and LSTM (empirical model) to estimate 10-h DFMC (hereafter 'FSMM-LSTM'). FSMM was selected as the underlying process model because of its good performance across all fuel sizes [49]. FSMM estimates the DFMC at a given time t from the DFMC value at the previous time t−1 [50]. In addition, the temporal dynamic information of DFMC can be mined through hourly iterations [51], and this information is handled well by the LSTM algorithm [52]. Thus, LSTM was selected as the empirical model to estimate DFMC [52]. Figure 1 shows the flowchart of the DFMC estimation algorithms, including LSTM and FSMM-LSTM.

Physics-Guided Model
With hourly measurements of relative humidity, air temperature, precipitation, wind speed, shortwave radiation and longwave radiation, FSMM [49] can estimate the DFMC of fuels of multiple sizes, including the 10-h fuel.
In this model, dead fuel is divided into two zones: an outer layer that responds to atmospheric forcing, and a central core that only exchanges water and energy with the outer layer. The schematic of FSMM is shown in Figure 2, and its directed graph is shown in Figure 3. Starting from an initial value $m_{t-1}$, FSMM estimates the DFMC at time t ($m_t$) from the hourly input variables; $m_t$ then serves as the initial value for estimating the DFMC at t + 1 ($m_{t+1}$). The water contents of the outer layer and the core are denoted $m_o$ and $m_c$, and their temperatures $T_o$ and $T_c$, respectively. The average temperature $T_s$ and moisture content $m_s$ of the fuel are then calculated as:
$$T_s = f\,T_o + (1 - f)\,T_c$$
$$m_s = f\,m_o + (1 - f)\,m_c$$
where $f$ is the fraction of the fuel volume taken up by the outer layer and can be estimated via calibration.
Figure 2. FSMM schematic. $m_{t-1}$ is the DFMC at time t−1 and $m_t$ is the DFMC at time t. E is the moisture flux between the outer layer and the atmosphere (kg/s), K is shortwave radiation (W/m²), L is longwave radiation (W/m²), P is precipitation (kg/s), Q is turbulent heat flux (W/m²), $Q_h$ is sensible heat flux (W/m²), D is diffusion into the core (kg/s) and C is conduction into the core (W).
The water exchange between the air and the outer layer is governed by three processes: absorbed precipitation (P), evaporation/desorption (E) and diffusion (D) into the core. In the corresponding budget, $a_s$ is the surface area of the entire fuel and $a_s - 2\pi r^2$ is the lateral fuel surface. All three terms are in units of kg/s. Evaporation is closely related to the latent heat flux, which links the moisture and energy budgets. The moisture budget of the core contains only diffusion.
The outer layer temperature $T_o$ involves multiple energy exchange processes. $K_{dir}$ and $K_{diff}$ are the absorbed direct and diffuse shortwave radiation, respectively (W), L is the absorbed longwave radiation (W), $L_{emit}$ is the emitted longwave radiation (W/m²), $Q_h$ is the sensible heat flux (W/m²), $Q_e$ is the latent heat flux (W/m²) and C is the conduction into the fuel's core (W). The coefficient $C_s$ is the fuel-specific heat (J/(K·kg)), $\rho_s$ is the stick density (400 kg/m³) [38] and $V_o$ is the volume of the outer layer.
The only energy exchange between the outer layer and the core is conduction (C).
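The two structural ideas in this description, the volume-weighted average of the outer layer and the core and the hour-by-hour recursion from one estimate to the next, can be illustrated with a minimal sketch. It is not the FSMM implementation: `timelag_step` is a hypothetical stand-in that simply relaxes moisture toward an equilibrium value with a 10-h time-lag (the time-lag definition from the Introduction), and all numbers are made up for illustration.

```python
import math


def stick_average(outer, core, f):
    """Volume-weighted average of an outer-layer and a core quantity.

    f is the fraction of the fuel volume taken up by the outer layer.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return f * outer + (1.0 - f) * core


def timelag_step(m_prev, emc, tau_hours=10.0, dt_hours=1.0):
    """Hypothetical stand-in for one hourly update (NOT the FSMM physics):
    exponential relaxation of moisture toward the equilibrium value emc."""
    return emc + (m_prev - emc) * math.exp(-dt_hours / tau_hours)


def iterate_hourly(step_fn, m_initial, hourly_emc):
    """Feed each hourly estimate back in as the next hour's initial value."""
    series = []
    m = m_initial
    for emc in hourly_emc:
        m = step_fn(m, emc)
        series.append(m)
    return series


# Illustrative values only.
m_s = stick_average(outer=0.20, core=0.10, f=0.4)
series = iterate_hourly(timelag_step, m_initial=0.30, hourly_emc=[0.10] * 10)
# Share of the initial gap to equilibrium closed after 10 h (one time-lag).
closed = (0.30 - series[-1]) / (0.30 - 0.10)
print(round(m_s, 3), round(closed, 3))
```

With a constant equilibrium value, the stand-in closes about 63% of the gap after one time-lag, which matches the time-lag definition; the real model replaces `timelag_step` with the coupled water and energy budgets described above.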

LSTM Network
The LSTM network is developed from the standard RNN [53] and performs well in describing the temporal dynamics of time-sequential data such as DFMC. The LSTM model was proposed to solve the vanishing and exploding gradient problems that arise with long-term dependencies in traditional RNNs [45]. The schematic of LSTM is shown in Figure 4. The LSTM network computes a mapping from an input sequence to an output sequence (Figure 4A), where the input $X_t$ consists of the input drivers introduced in Table 1.
Three gate controllers determine what information is retained or forgotten in the LSTM unit: the input ($Z_i$), forget ($Z_f$) and output ($Z_o$) gates. By switching these gates to prevent the gradient from vanishing, the LSTM network keeps its temporal memory. The basic LSTM unit requires the current input vector $X_t$, the previous cell state $C_{t-1}$ and the previous hidden state $h_{t-1}$. The three gates are obtained by applying a nonlinear activation function $\sigma$, usually the sigmoid function. In addition to the three gates, an intermediate state $Z$ is calculated with the tanh activation function.
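One step of an LSTM unit can be sketched in plain Python using the gate names from the text. This follows the standard LSTM formulation rather than any equation recovered from this paper: the weight and bias names (`W_i`, `b_i`, and so on), the concatenation of the input with the previous hidden state, and the nine-input example are assumptions made for illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(W, v, b):
    """Return W @ v + b for a matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One standard LSTM step; p holds hypothetical weights W_* and biases b_*."""
    v = list(x_t) + list(h_prev)                                # [X_t, h_{t-1}]
    z_i = [sigmoid(a) for a in affine(p["W_i"], v, p["b_i"])]   # input gate
    z_f = [sigmoid(a) for a in affine(p["W_f"], v, p["b_f"])]   # forget gate
    z_o = [sigmoid(a) for a in affine(p["W_o"], v, p["b_o"])]   # output gate
    z = [math.tanh(a) for a in affine(p["W_z"], v, p["b_z"])]   # intermediate state
    # Pointwise update of the memory cell and the hidden state.
    c_t = [f * c + i * g for f, c, i, g in zip(z_f, c_prev, z_i, z)]
    h_t = [o * math.tanh(c) for o, c in zip(z_o, c_t)]
    return h_t, c_t


def zero_params(n_in, n_hidden):
    """All-zero parameters, used here only to make the example checkable."""
    p = {}
    for name in ("i", "f", "o", "z"):
        p["W_" + name] = [[0.0] * (n_in + n_hidden) for _ in range(n_hidden)]
        p["b_" + name] = [0.0] * n_hidden
    return p


# Nine inputs (assumed to match the rows of Table 1) and two hidden units.
p = zero_params(n_in=9, n_hidden=2)
h_t, c_t = lstm_step([0.0] * 9, [0.0, 0.0], [1.0, -1.0], p)
print(c_t, h_t)
```

With all-zero parameters every gate equals 0.5 and the intermediate state is 0, so the cell state is simply halved; trained weights would instead let the gates decide, per time step, how much of the previous state to keep.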
Then, the memory cell ($C_t$) and hidden state ($h_t$) of the LSTM are updated from the gates, the intermediate state and the previous cell state, using the pointwise multiplication ($\odot$) of two vectors. The output $y_t$ is then obtained from the updated state.
We implemented the LSTM network with the Keras package [54] and the TensorFlow backend [55]. To avoid over-fitting and improve the convergence speed of training, we tuned the number of epochs, batch size, time step, learning rate, number of neurons, dropout and patience, and applied early stopping (Table 2). The optimal LSTM network was selected by jointly considering the prediction accuracy and stability of the model. For example, a time step of 20 successfully captured the dynamic changes of DFMC in the time series. Early stopping used 10% of all data for validation, with the patience set to 65 and the maximum number of epochs set to 500.
Consider a predictive learning problem in which a variety of input drivers, $X$, are physically related to the target variable of interest, DFMC. One efficient approach is to train a data science model, e.g., a recurrent neural network, $f_{LSTM}: X \rightarrow DFMC$, over a set of training variables. DFMC can then be estimated through this trained model. Alternatively, a physics-based numerical model, e.g., FSMM, $f_{PHY}: X \rightarrow DFMC$, can be used to estimate the target variable through its physical relationships with the input variables. However, a process-based model may provide an incomplete description of the target variable because of simplified or missing physics in $f_{PHY}$ [42], which may lead to inaccurate DFMC estimates and erroneous judgment of fire risk. Therefore, a hybrid model was proposed that combines $f_{PHY}$ and $f_{LSTM}$ to overcome their respective shortcomings and take advantage of the information in both physics and data. The schematic of the hybrid model (FSMM-LSTM) is shown in Figure 5.
This model is composed of two parts: FSMM and LSTM. All input variables are first passed to the FSMM model to estimate a sequence of DFMC. The estimated DFMC then becomes an input of the LSTM model, along with input variables such as air temperature and relative humidity. Finally, the LSTM model outputs the final DFMC estimate.
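The data flow of the hybrid model can be sketched as follows: the FSMM estimate is appended to each hour's driver vector, and the resulting rows are cut into fixed-length windows for the recurrent network. This is a minimal sketch under stated assumptions, not the authors' code. The function names are hypothetical, the window length of 20 is the time step reported for the LSTM network (the FSMM-LSTM value is given in Table 2 and is not reproduced here), and pairing each window with the observation at its last hour is an assumption.

```python
def add_fsmm_feature(drivers, fsmm_dfmc):
    """Append the FSMM estimate for each hour to that hour's driver vector."""
    if len(drivers) != len(fsmm_dfmc):
        raise ValueError("drivers and FSMM estimates must cover the same hours")
    return [list(row) + [m] for row, m in zip(drivers, fsmm_dfmc)]


def make_windows(features, targets, time_step=20):
    """Pair each run of `time_step` consecutive hours with the target
    observed at the last hour of the run (an assumed alignment)."""
    X, y = [], []
    for end in range(time_step, len(features) + 1):
        X.append(features[end - time_step:end])
        y.append(targets[end - 1])
    return X, y


# Made-up example: 25 hours, two drivers per hour, plus the FSMM estimate.
drivers = [[float(h), 50.0] for h in range(25)]
fsmm_dfmc = [0.1 * h for h in range(25)]
observed = [float(h) for h in range(25)]

features = add_fsmm_feature(drivers, fsmm_dfmc)
X, y = make_windows(features, observed, time_step=20)
print(len(X), len(X[0]), len(X[0][0]))
```

Each window has shape (time step, number of drivers + 1), which is the three-dimensional input layout that recurrent layers usually expect once the windows are stacked.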
Because the inputs of the FSMM-LSTM model include not only the input drivers used in the LSTM network but also the output of the FSMM model, the optimal hyperparameters of the LSTM network cannot be reused directly for the FSMM-LSTM model. Therefore, we tuned the hyperparameters again through repeated trials. The training hyperparameters for the FSMM-LSTM algorithm are shown in Table 2.
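The early-stopping rule used during training can be sketched independently of any deep learning library. The defaults below (patience of 65 and at most 500 epochs) are the values reported for the LSTM network; the values used for FSMM-LSTM are those in Table 2 and are not reproduced here. The function name and the made-up loss curve are assumptions, and the stopping criterion mirrors the usual patience-based rule rather than a confirmed detail of the authors' setup.

```python
def early_stopping_epoch(val_losses, patience=65, max_epochs=500):
    """Return (last_epoch_run, best_epoch) for per-epoch validation losses.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break
    return last_epoch, best_epoch


# Made-up curve: the loss improves for 10 epochs, then plateaus.
losses = [1.0 / e for e in range(1, 11)] + [0.5] * 100
stopped_at, best_at = early_stopping_epoch(losses)
print(stopped_at, best_at)
```

In practice the weights from the best epoch are normally restored after stopping, so a long patience mainly costs training time rather than accuracy.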

Model Comparison
In previous studies, a set of models including process-based and empirical methods has been developed for DFMC estimation. Of these, the effective process-based model FSMM has already been introduced in Section 3.1.1. In addition, a simple but efficient method, multiple linear regression (MLR), was used. Machine learning methods such as random forest were selected because of their excellent performance in DFMC estimation [25]. Another machine learning method, the artificial neural network (ANN), has shown better performance than random forest in regression and classification tasks [56,57]. However, to the best of our knowledge, no previous study has applied ANN to estimate the DFMC of dead fuel of any size, including 10-h DFMC. Therefore, ANN was selected here to assess its performance in DFMC estimation.

MLR
Given observations of both variables for a set of samples, simple linear regression estimates the relationship between a response variable and a single explanatory variable [58]. Adding further explanatory variables to a simple linear regression model yields a multiple linear regression model.
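As a minimal sketch of what fitting such a model involves, the coefficients can be obtained by ordinary least squares through the normal equations. The function name and the toy data are hypothetical, and the paper does not state which solver or software was used.

```python
def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... via the normal
    equations (A^T A) b = A^T y, solved by Gauss-Jordan elimination."""
    A = [[1.0] + list(row) for row in X]          # prepend intercept column
    n = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(A, y))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly dependent")
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Toy data generated from y = 1 + 2*x1 - 3*x2 (no noise).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
coefs = fit_mlr(X, y)
print([round(c, 6) for c in coefs])
```

Because the model is linear in the drivers, it cannot represent threshold or saturation effects, which is consistent with the weak MLR performance reported later.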

Random Forest
Random forest, based on the classification and regression tree (CART) [59], is one of the most widely used machine learning methods. It builds an ensemble of many regression trees, each trained on a bootstrapped sample that generally contains about two-thirds of the whole dataset [25]. Each tree is trained on a predefined-size subset of the available variables. The final regression of the random forest aggregates the outputs of the individual trees [60]. As the correlation among the decision trees decreases, the ensemble becomes more reliable [61]. In this study, the number of trees was set to 100 to balance computational cost and reliable performance.
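The "two-thirds" figure follows from sampling with replacement: the expected share of distinct samples in a bootstrap sample of size n is 1 − (1 − 1/n)^n, which approaches 1 − 1/e ≈ 0.632. The sketch below evaluates this for a dataset of 3156 hourly records and shows the aggregation step for regression, where the forest output is commonly the mean of the tree outputs. Both function names are hypothetical, and averaging is the usual convention rather than a detail confirmed by the paper.

```python
def expected_in_bag_fraction(n_samples):
    """Expected share of distinct samples in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n_samples) ** n_samples


def average_trees(tree_predictions):
    """Average per-tree predictions (one list per tree) into one prediction
    per sample, the usual aggregation for regression forests."""
    n_trees = len(tree_predictions)
    return [sum(column) / n_trees for column in zip(*tree_predictions)]


frac = expected_in_bag_fraction(3156)
forest = average_trees([[1.0, 2.0], [3.0, 4.0]])
print(round(frac, 3), forest)
```

The remaining third or so of the records is left out of each tree's training sample and can serve as an out-of-bag check on that tree.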

ANN
ANN is a computing system composed of a collection of connected artificial neurons [62]. Each neuron applies a specific output function, called the activation function. Each connection between two neurons carries a weight for the signal passing through it, which acts as the memory of the ANN [63]. ANNs can handle nonlinear and complex problems well [64]. Because of their powerful ability to learn the relationship between inputs and outputs from training data, they have been widely used in geophysical parameter estimation [56]. In this study, the ANN had two hidden layers with 40 and 50 units, respectively.

Model Evaluation
We have a continuous time series of DFMC data from May to September 2014. Empirical methods are prone to over-fitting the training data, so model performance computed on the training data is not a reliable measure [9]. Therefore, 10-fold cross-validation was adopted to validate all models except the process-based model, FSMM. That is, the whole dataset was divided equally into 10 pieces. In each run, one piece was used for verification and the remaining nine for training; this process was repeated 10 times until every piece had been used for verification. To compare the models fairly, all other models used the same input variables as the LSTM network, except the FSMM-LSTM model, which has an extra input from the output of the FSMM model. The coefficient of determination (R²), the root mean square error (RMSE) and the mean absolute error (MAE) were used to evaluate model performance.
Figure 6 summarizes the performance of the different methods for modeling DFMC at the BC1 site. MLR heavily underestimates DFMC, and its R² of 0.50 is much lower than that of the other models. Compared with MLR, black-box data science models such as random forest and ANN can capture the non-linear relationships between the input variables and DFMC without using the physical process. Random forest and ANN perform much better than MLR but still cannot reach the performance of the process-based model, FSMM. The LSTM model achieved an R² of 0.91, an RMSE of 3.24% and an MAE of 1.97%, which is considerably better than MLR, random forest and ANN. Compared with FSMM, LSTM has a similar R² but lower RMSE and MAE. This suggests that the knowledge gaps of the process model can be narrowed as long as the information in the data is fully mined and used efficiently. When the output of the process-based model is used together with the input variables as input to the FSMM-LSTM model, the results are the best, with an R² of 0.96, an RMSE of 2.21% and an MAE of 1.41%.
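The evaluation protocol can be sketched in a few lines: split the record into 10 folds, then score each held-out fold with R², RMSE and MAE. This is a sketch under assumptions, not the authors' code. The folds here are contiguous blocks in time order (the paper does not say whether the data were shuffled), and R² is computed as 1 − SS_res/SS_tot, whereas some studies report the squared correlation instead.

```python
import math


def r2_score(obs, est):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, est):
    """Root mean square error."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs, est):
    """Mean absolute error."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    bounds = [round(i * n_samples / k) for i in range(k + 1)]
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = list(range(0, bounds[i])) + list(range(bounds[i + 1], n_samples))
        yield train, test


folds = list(k_fold_indices(3156, k=10))      # 3156 hourly records
obs, est = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
print(len(folds), r2_score(obs, est), rmse(obs, est), mae(obs, est))
```

RMSE and MAE carry the units of DFMC (percent moisture content), so they are directly comparable across models, while R² describes the share of observed variance that the estimates explain.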

Results
To compare the six models in more detail, we show their time series in Figure 7. The MLR model is again the worst, both overestimating and underestimating DFMC. The random forest and ANN estimates broadly agree with the measurements in the time series, but there are still many substantial overestimations and underestimations. The process-based FSMM model generates satisfactory results, except for underestimates from May 20 to May 28 and from August 14 to 16. The LSTM model shares a similar but smaller underestimation. The FSMM-LSTM algorithm again achieved the best results: its estimates agree closely with the measured values, with no evident overestimation or underestimation over the entire time series.
In the practical application of a model, computational efficiency is a non-negligible aspect. Thus, we list the calibration time and the test time of all models on our dataset in Table 3. The process-based model FSMM required enormous computing time for both calibration (53.75 h) and test (7.53 h). In contrast, none of the data-driven models, including the LSTM network, took much time to train and test. Of all data-driven models, the LSTM network achieved results comparable with FSMM at much higher efficiency. When the FSMM model and the LSTM network were combined, the total time cost was the same as that of FSMM, but the test time cost was very small.

Discussion
The classical deep learning network LSTM and its combination with the physics model FSMM were introduced to estimate DFMC. For comparison, the effective process-based model FSMM and established empirical methods (MLR, random forest and ANN) were also implemented.
Our results suggest that the MLR model has the worst performance, probably because a linear model cannot characterize the non-linear relationship between the input variables and DFMC. When black-box data science models such as random forest and ANN were used, the results did not become much better. Although random forest and ANN learn the non-linear relationships between the drivers and DFMC, they cannot capture the information in a time series, which is where the LSTM network performs better. The results of the LSTM network showed that the information in the data, if fully mined, may help close the knowledge gaps of the process model. Although the process-based model FSMM performed as well as the LSTM network, it did so at the expense of enormous computing time (about 61.28 h in total), which may be due to the continuous hour-by-hour iteration and the 3600 iterations within each hour. Combining LSTM and FSMM (FSMM-LSTM) achieved even better results. This is because the output of FSMM contains important physical information about the dynamics of DFMC, which, when coupled with a powerful data science framework such as the LSTM network, can result in great improvements in R² and RMSE (Table 3).
The LSTM network is a classic deep learning algorithm that performs well in capturing temporal relationships in data, and it has many variants. For example, in SLSTM [53] the quality variable vector is additionally fed to the intermediate cell and the three gates. In addition, applying residual connections to the LSTM network yields a more efficient deep learning algorithm [65]. Investigating whether variants of the LSTM network would perform better was beyond the scope of our study. We introduced the classic LSTM network only to show the strong performance of a deep learning network in estimating DFMC.
This study focuses on 10-h DFMC, while there are four fuel sizes in total: 1-h, 10-h, 100-h and 1000-h. The results showed that both the LSTM network and the FSMM-LSTM model performed very well in estimating 10-h DFMC. Since LSTM is a data-driven algorithm, it can reasonably be expected to perform well on 1-h, 100-h and 1000-h DFMC, although this was not tested here. The FSMM-LSTM model, a hybrid of the LSTM network and a complex process-based model, performed better than the LSTM network alone. Since the FSMM model has shown outstanding performance for fuels of all sizes [49], the FSMM-LSTM model is likely to be applicable to all types of fuel as well.
In addition, some other methods are effective at estimating DFMC, such as  2010) is a complete process-based model which simulated the processes that occur in the fuel with energy and water balance conservation equations. These two methods are typical examples of the empirical method and the process-based model. However, empirical models lack explanations for physical processes, and process-based models may provide an incomplete representation of DFMC. Our study provides one way to combine them.
Going forward, several directions can be explored as a continuation of this work. First, the LSTM network used in this study is not a state-of-the-art recurrent neural network and can be replaced. Second, given the temporal and spatial nature of DFMC estimation, a promising extension would be to explore both the temporal and the spatial dependencies in DFMC. Third, we present a simple way of constructing a hybrid model of physical process and data information, using the output of the process-based model as an input of the data science model; more complex constructions need to be explored so that the process-based and data science parts are tightly coupled.

Conclusions
In this study, a widely used deep learning method, LSTM, was introduced for DFMC estimation. Furthermore, we proposed a novel approach, FSMM-LSTM, which uses the estimates of a process-based model to guide the learning of the LSTM network. By anchoring the LSTM network with a priori knowledge, the proposed physics-guided method outperformed all the other models evaluated in this study, which may help acquire more accurate DFMC estimates for wildfire risk assessment and fire simulation.

Data Availability Statement:
The data that support the findings of this study are openly available on GitHub at https://github.com/dvdkamp/fsmm (accessed on 20 May 2021).