Machine Learning Techniques for Fine Dead Fuel Load Estimation Using Multi-Source Remote Sensing Data

: Fine dead fuel load is one of the most signiﬁcant components of wildﬁres without which ignition would fail. Several studies have previously investigated 1-h fuel load using standard fuel parameters or site-speciﬁc fuel parameters estimated ad hoc for the landscape. On the one hand, these methods have a large margin of error, while on the other their production times and costs are high. In response to this gap, a set of models was developed combining multi-source remote sensing data, ﬁeld data and machine learning techniques to quantitatively estimate ﬁne dead fuel load and understand its determining factors. Therefore, the objectives of the study were to: (1) estimate 1-h fuel loads using remote sensing predictors and machine learning techniques; (2) evaluate the performance of each machine learning technique compared to traditional linear regression models; (3) assess the importance of each remote sensing predictor; and (4) map the 1-h fuel load in a pilot area of the Apulia region (southern Italy). In pursuit of the above, ﬁne dead fuel load estimation was performed by the integration of ﬁeld inventory data (251 plots), Synthetic Aperture Radar (SAR, Sentinel-1), optical (Sentinel-2), and Light Detection and Ranging (LIDAR) data applying three different algorithms: Multiple Linear regression (MLR), Random Forest (RF), and Support Vector Machine (SVM). Model performances were evaluated using Root Mean Squared Error (RMSE), Mean Squared Error (MSE), the coefﬁcient of determination (R 2 ) and Pearson’s correlation coefﬁcient ( r ). The results showed that RF (RMSE: 0.09; MSE: 0.01; r : 0.71; R 2 : 0.50) had more predictive power compared to the other models, while SVM (RMSE: 0.10; MSE: 0.01; r : 0.63; R 2 : 0.39) and MLR (RMSE: 0.11; MSE: 0.01; r : 0.63; R 2 : 0.40) showed similar performances. LIDAR variables (Canopy Height Model and Canopy cover) were more important in fuel estimation than optical and radar variables. In fact, the results highlighted a positive relationship between 1-h fuel load and the presence of the tree component. Conversely, the geomorphological variables appeared to have lower predictive power. Overall, the 1-h fuel load map developed by the RF model can be a valuable tool to support decision making and can be used in regional wildﬁre risk management.


Introduction
In the last decade, Europe has experienced unprecedented and devastating wildfires across its landscapes [1][2][3]. Climate change and the continuous pressure of human activities are leading to an increase in the intensity and extent of wildfires. In 2019, more than 400,000 ha of natural areas were burned and most of the Natura 2000 sites were devastated by wildfires [4]. While such extreme events are becoming ever more significant, European management policies are keeping the focus on reactive fire suppression instead of properly and proactively addressing prevention [5]. Preventive actions for fuel load management The landscape is dominated by agriculture and characterized by the presence of olive groves and vineyards (29%). Conversely, wooded areas cover only 9.7% of the landscape and are mostly concentrated in hilly and mountainous areas which are difficult to cultivate [39]. Oak woodlands and Mediterranean maquis are the most dominant forest types in the region, while conifers (Pinus pinea L., P. halepensis Mill., and P. pinaster Aiton) are limited to coastal areas where, historically, reforestation has taken place for slope stability. The main tree species are Quercus ilex L., Q. pubescens Willd., Q. cerris L., Q. calliprinos Webb., Phyllirea spp., Ruscus aculeatus L., Pistacia lentiscus L., Asparagus acutifolius, L., Paliurus spina-christi Mill., Cistus monspeliensis L., C. incanus L., and C. salviifolius L.
According to the European Forest Fire Information System (EFFIS), Apulia is one of the Italian regions most frequently crossed by wildfires, although only 9.7% of the total area (189,086 ha) is covered with forests (INFC 2015). During the 2000 to 2019 time period, the average burnt area was slightly under 100,000 ha, with more than 10,000 recorded fire events (Civil Protection Department). The fire season coincides with the summer period. Typically, the number of wildfires slightly increases between May and June, reaching a peak in August; these fires are mainly related to human activity [40,41]. Martina Franca represents one of the main towns in the Itria Valley, located in the central-south part of Apulia. Most of the territory is covered by agricultural lands (olive groves, vineyards, and arable crops), followed second by forests and shrubland areas. In recent years, tourism has caused an expansion of residential patches (i.e., buildings, infrastructures, and educational farms) in wildland areas, which have generated urban interfaces. These areas have subjected the territory of Martina Franca to more frequent wildfire occurrences and spread. For this reason, it has been chosen as a pilot area in this study.

Field Data
The field data were collected during the 2018 and 2019 time period as part of the F.OR.MA. (Fuel models and fORest roads Maps) project in collaboration with the Civil Protection Department of the Apulia region. To assess fine dead fuel load, 251 plots were randomly selected from the Apulian land-cover layer (http://www.sit.puglia.it, accessed The landscape is dominated by agriculture and characterized by the presence of olive groves and vineyards (29%). Conversely, wooded areas cover only 9.7% of the landscape and are mostly concentrated in hilly and mountainous areas which are difficult to cultivate [39]. Oak woodlands and Mediterranean maquis are the most dominant forest types in the region, while conifers (Pinus pinea L., P. halepensis Mill., and P. pinaster Aiton) are limited to coastal areas where, historically, reforestation has taken place for slope stability. The main tree species are Quercus ilex L., Q. pubescens Willd., Q. cerris L., Q. calliprinos Webb., Phyllirea spp., Ruscus aculeatus L., Pistacia lentiscus L., Asparagus acutifolius, L., Paliurus spina-christi Mill., Cistus monspeliensis L., C. incanus L., and C. salviifolius L.
According to the European Forest Fire Information System (EFFIS), Apulia is one of the Italian regions most frequently crossed by wildfires, although only 9.7% of the total area (189,086 ha) is covered with forests (INFC 2015). During the 2000 to 2019 time period, the average burnt area was slightly under 100,000 ha, with more than 10,000 recorded fire events (Civil Protection Department). The fire season coincides with the summer period. Typically, the number of wildfires slightly increases between May and June, reaching a peak in August; these fires are mainly related to human activity [40,41]. Martina Franca represents one of the main towns in the Itria Valley, located in the central-south part of Apulia. Most of the territory is covered by agricultural lands (olive groves, vineyards, and arable crops), followed second by forests and shrubland areas. In recent years, tourism has caused an expansion of residential patches (i.e., buildings, infrastructures, and educational farms) in wildland areas, which have generated urban interfaces. These areas have subjected the territory of Martina Franca to more frequent wildfire occurrences and spread. For this reason, it has been chosen as a pilot area in this study.

Field Data
The field data were collected during the 2018 and 2019 time period as part of the F.OR.MA. (Fuel models and fORest roads Maps) project in collaboration with the Civil Protection Department of the Apulia region. To assess fine dead fuel load, 251 plots were randomly selected from the Apulian land-cover layer (http://www.sit.puglia.it, accessed on 23 February 2021) provided by the regional government. The field sampling protocol is illustrated in Figure 2. on 23 February 2021) provided by the regional government. The field sampling protocol is illustrated in Figure 2. Sampling of the fine dead fuel load was conducted according to Elia et al. [18] and the method developed by Brown et al. [42]. During the survey, circular plots with a radius of 5 m each were identified by geographical coordinates using a GPS receiver. Within each plot, a transect measuring 15 m was determined. If the piece of wood intersected the transect more than once, each intersection was measured. For the purposes of the F.OR.MA. project, dead woody materials were divided into three time-lag classes (1-h, 10-h, and 100h) according to the diameter measured by the caliper at the point of intersection with the transect. In this study, the 1-h fuel load was chosen as the response variable since it represents the primary carrier of wildfire occurrence [43]. The 1-h time-lag fuels consist of needles, leaves, and small twigs with a diameter between 0 and 0.65 cm. The small size of this fuel makes its moisture content more responsive to changes in atmospheric conditions, affecting its ignitability [44,45]. Its rapid flammability makes it prone to fire ignition, especially in the hot and dry season.

Sentinel-1 Data
Sentinel-1 is a space mission of the European Space Agency (ESA) consisting of a constellation of two satellites: S1-A and S1-B. On the satellites, a Synthetic Aperture Radar (SAR) sensor is mounted, which provides C-band (5.405 GHz) images under all weather conditions. For the period from April to September 2019, the "COPERNICUS/S1_GRD" collection was downloaded from the Google Earth Engine (GEE) platform (https://earthengine.google.com/platform/, accessed on 23 February 2021) using five sub-regions of Apulia as extracting masks ( Figure S1, see Supplementary Material). The 164 S1 images were acquired in the Interferometric Wide Swath (IW) mode with two polarizations, VH (Vertical transmit-Horizontal receive) and VV (Vertical transmit-Vertical receive), at 10-m resolution with ascending orbit (Table 1). GEE already provided pre-processed images (https://developers.google.com/earth-engine/guides/sentinel1, accessed on 23 February 2021). Sampling of the fine dead fuel load was conducted according to Elia et al. [18] and the method developed by Brown et al. [42]. During the survey, circular plots with a radius of 5 m each were identified by geographical coordinates using a GPS receiver. Within each plot, a transect measuring 15 m was determined. If the piece of wood intersected the transect more than once, each intersection was measured. For the purposes of the F.OR.MA. project, dead woody materials were divided into three time-lag classes (1-h, 10-h, and 100-h) according to the diameter measured by the caliper at the point of intersection with the transect. In this study, the 1-h fuel load was chosen as the response variable since it represents the primary carrier of wildfire occurrence [43]. The 1-h time-lag fuels consist of needles, leaves, and small twigs with a diameter between 0 and 0.65 cm. The small size of this fuel makes its moisture content more responsive to changes in atmospheric conditions, affecting its ignitability [44,45]. Its rapid flammability makes it prone to fire ignition, especially in the hot and dry season.

Sentinel-1 Data
Sentinel-1 is a space mission of the European Space Agency (ESA) consisting of a constellation of two satellites: S1-A and S1-B. On the satellites, a Synthetic Aperture Radar (SAR) sensor is mounted, which provides C-band (5.405 GHz) images under all weather conditions. For the period from April to September 2019, the "COPERNI-CUS/S1_GRD" collection was downloaded from the Google Earth Engine (GEE) platform (https://earthengine.google.com/platform/, accessed on 23 February 2021) using five sub-regions of Apulia as extracting masks ( Figure S1, see Supplementary Material). The 164 S1 images were acquired in the Interferometric Wide Swath (IW) mode with two polarizations, VH (Vertical transmit-Horizontal receive) and VV (Vertical transmit-Vertical receive), at 10-m resolution with ascending orbit (Table 1). GEE already provided pre-processed images (https://developers.google.com/earth-engine/guides/sentinel1, accessed on 23 February 2021). The pre-processing steps include applying the orbit file, GRD border noise removal, thermal noise removal, radiometric calibration, and terrain correction using Shuttle Radar Topography Mission (SRTM) at 30 m, or the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Digital Elevation Model (DEM) for high latitudes. The backscatter coefficient (σ • ) is converted to decibels (dB) using Equation (1): where σ • is backscatter for a specific pixel, σ • (dB) is the backscattering coefficient or normalized radar cross-section measured in dB units with values ranging from +5 dB to −40 dB. We generated the VH and VV time series by calculating the median values, then extracted the mean value of VH and VV polarizations within the plots.

Sentinel-2 Data
The Sentinel-2 mission, developed by ESA in 2015, is based on a two-satellite constellation (S2-A and S2-B) that orbits in space for Earth observation. Each satellite is equipped with a multispectral sensor that acquires images in 13 bands in the visible, near infrared and short-wave infrared. In GEE, the "COPERNICUS/S2_SR" collection was used to select 218 images acquired from April to September 2019. The images in the collection are radiometrically and atmospherically corrected products representing surface reflectance. The entire dataset was further processed using the QA60 and SLC bands to mask out cirrus, clouds, and shadows. Finally, the Normalized Difference Vegetation Index (NDVI) [46], Normalized Difference Water Index (NDWI) [47], and Normalized Difference Moisture Index (NDMI) [48] were calculated using Equations (2)-(4): where b 3 , b 4 , b 8 , and b 11 are green, red, near-infrared, and short-wave infrared bands centered at approximately 560, 665, 842, and 1610, respectively. We generated the time series of each vegetation index by calculating the median values and subsequently extracted the mean value of each index within each field plot in the Geographic Information System (GIS).

LIDAR Data
The LIDAR (Light Detection and Ranging) sensor was used in the study to measure variables related to the height and degree of canopy coverage within the field plots. LIDAR data were acquired from two aerial survey missions that took place between 2015 and 2019. The average point density was 4 points/m 2 with a relative height accuracy of 10 cm. The accuracy of the point clouds was verified using previously surveyed ground control points. For each aerial survey mission, the point clouds were filtered and classified into ground and non-ground (building and vegetation) using TerraScan software (https: //terrasolid.com/products/terrascan/, accessed on 25 February 2021). Following the method by Giannico et al. [49], a Digital Terrain Model (DTM) (1 × 1 m 2 ) was generated by interpolating the ground points through Inverse Distance Weighting interpolation. The Digital Surface Model (DSM) was obtained by selecting the maximum height of vegetation points for each 1 × 1 m 2 pixel after filtering for outliers. Subsequently, the Canopy Height Model (CHM) was calculated as follows: where CHM is the distance between the ground and the top of the vegetation, DSM is the Digital Surface Model and DTM is the Digital Terrain Model. Moreover, canopy and vegetation cover were subsequently extracted from CHM. We defined canopy cover as the percentage of vegetation cover with a height greater than 2 m, and vegetation cover as the percentage of the entire vegetation. Within each plot, the average and 0.95th percentile of CHM, canopy cover and vegetation cover were extracted.

Geomorphological and Land-Cover Data
Elevation data were obtained from the 8-m DEM of the Apulia region, downloaded from the Territorial Information System platform (http://www.sit.puglia.it, accessed on 23 February 2021). The slope (degree) and easterness (i.e., sin of the aspect) raster were produced from the DEM. The mean value of each geomorphological variable was extracted using field plots in GIS. For each plot, the information concerning the land-cover was derived from the land-use map (2011) provided by the regional government. Nine landcover classes were extracted using the field plots ( Figure 3). A total of 107 field plots were counted in forest areas, 56 of which are in deciduous forests, 39 in coniferous forests, and 12 plots in mixed forests. Shrublands, consisting of sclerophyllous vegetation, bushes, and shrubs, were divided into a total of 71 plots. Lastly, 45 plots were classified as meadow and pasture areas, with or without the presence of trees, while the remaining 28 plots fell into areas classified as natural recolonization and reforestation areas.
where CHM is the distance between the ground and the top of the vegetation, DSM is the Digital Surface Model and DTM is the Digital Terrain Model. Moreover, canopy and vegetation cover were subsequently extracted from CHM. We defined canopy cover as the percentage of vegetation cover with a height greater than 2 m, and vegetation cover as the percentage of the entire vegetation. Within each plot, the average and 0.95th percentile of CHM, canopy cover and vegetation cover were extracted.

Geomorphological and Land-Cover Data
Elevation data were obtained from the 8-m DEM of the Apulia region, downloaded from the Territorial Information System platform (http://www.sit.puglia.it, accessed on 23 February 2021). The slope (degree) and easterness (i.e., sin of the aspect) raster were produced from the DEM. The mean value of each geomorphological variable was extracted using field plots in GIS. For each plot, the information concerning the land-cover was derived from the land-use map (2011) provided by the regional government. Nine landcover classes were extracted using the field plots ( Figure 3). A total of 107 field plots were counted in forest areas, 56 of which are in deciduous forests, 39 in coniferous forests, and 12 plots in mixed forests. Shrublands, consisting of sclerophyllous vegetation, bushes, and shrubs, were divided into a total of 71 plots. Lastly, 45 plots were classified as meadow and pasture areas, with or without the presence of trees, while the remaining 28 plots fell into areas classified as natural recolonization and reforestation areas.

Pre-Processing and Feature Selection
A flowchart providing information about pre-processing, modeling, and model performances is illustrated in Figure 4. Before applying the models, Pearson's coefficients were estimated to determine the correlation among the variables, and data cleaning was performed by identifying and removing outliers. Variables with a Pearson's coefficient

Pre-Processing and Feature Selection
A flowchart providing information about pre-processing, modeling, and model performances is illustrated in Figure 4. Before applying the models, Pearson's coefficients were estimated to determine the correlation among the variables, and data cleaning was performed by identifying and removing outliers. Variables with a Pearson's coefficient greater than ±0.90 were excluded from the analysis. Subsequently, each model was calibrated using the training set (70% of data), while the goodness of fit was evaluated using the test set (30% of data). The training and test sets were both normalized using the caret package [50] in R.
Remote Sens. 2021, 13, 1658 7 of 17 greater than ±0.90 were excluded from the analysis. Subsequently, each model was calibrated using the training set (70% of data), while the goodness of fit was evaluated using the test set (30% of data). The training and test sets were both normalized using the caret package [50] in R.

Machine Learning Models
In this study, three models were adopted to estimate fine dead fuel load: Multiple Linear Regression (MLR), Support Vector Regression (SVM), and Random Forest (RF). In recent years, machine learning applications in wildfire science have increased. In particular, RF and SVM, two supervised machine learning algorithms, have been widely and extensively used for both classification and regression problems [32]. RF is an ensemble model of decision trees, usually trained with the "bagging" method [51]. In this study, the RF model was created using the randomForest() function of the randomForest package [51]. To tune the RF algorithm, the "ntree" and "mtry" parameters have been set. The "mtry" parameter is the number of variables randomly sampled as candidates at each split, while the "ntree" parameter is the number of trees to be grown. For the regression problem, the default value for "mtry" is calculated by p/3, where p is the number of independent variables of the database. In our study, the "mtry" number was set to 2, whereas for "ntree" the default value of 500 was used. SVM is a related kernel-based method that finds the optimal hyperplane for dividing data in space [52]. The SVM model was trained using the svm() function of the e1071 package [53]. The kernel used in training and predicting is radial, and the regression function used in the model is the eps-regression default

Machine Learning Models
In this study, three models were adopted to estimate fine dead fuel load: Multiple Linear Regression (MLR), Support Vector Regression (SVM), and Random Forest (RF). In recent years, machine learning applications in wildfire science have increased. In particular, RF and SVM, two supervised machine learning algorithms, have been widely and extensively used for both classification and regression problems [32]. RF is an ensemble model of decision trees, usually trained with the "bagging" method [51]. In this study, the RF model was created using the randomForest() function of the randomForest package [51]. To tune the RF algorithm, the "ntree" and "mtry" parameters have been set. The "mtry" parameter is the number of variables randomly sampled as candidates at each split, while the "ntree" parameter is the number of trees to be grown. For the regression problem, the default value for "mtry" is calculated by p/3, where p is the number of independent variables of the database. In our study, the "mtry" number was set to 2, whereas for "ntree" the default value of 500 was used. SVM is a related kernel-based method that finds the optimal hyperplane for dividing data in space [52]. The SVM model was trained using the svm() function of the e1071 package [53]. The kernel used in training and predicting is radial, and the regression function used in the model is the eps-regression default function.
To fit the MLR model, the lm() function of a stats package was used [54]. All models have been developed with R software [55].

Model Performances and Assessment of Variable Importance
To compare the performances of the MLR, SVM and RF models, the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Pearson's correlation coefficient (r) and coefficient of determination (R 2 ) were calculated as follows: where P i is the predicted value, A i is the observed value, P i is the mean of predicted values, and A i is the mean of observed values. To estimate and map the fine dead fuel load, the model with the smallest RMSE and highest r square was chosen as the most fitting.
To assess the stability of the best model, 10-fold cross-validation was performed. This procedure consists of running the model 10 times on random subsamples of training and test sets and calculating the metrics of each subsample. Lastly, variable importance was calculated according to the best model, which was applied to a pilot area of the Apulia region to produce a predicted 1-h fuel load map.

Preliminary Results
The correlation coefficients of the independent variables are shown in Figure S2 (see Supplementary Materials). Four variables (VV, NDWI_Gao, NDWI_McFeeters, CHM_mean) were removed from the database before modeling. Table 2 presents 1-h fuel load statistics for the field plots divided by land-cover classes. The 1-h fuel load ranged from a low of 0 to 4.10 t/ha, with an average value of 0.63 t/ha. Deciduous and mixed forest vegetation recorded the highest mean values of fine dead fuel load (1.13 and 0.92 t/ha, respectively), while the lowest values were found in meadows and pastures with (0.29 t/ha) or without (0.14 t/ha) the presence of trees. High standard error (SE) and standard deviation (SD) values within the mixed forest plots indicated a high variability in fine dead fuel load values, probably related to the number of plots in this land-cover category. From a total of 251 plots, 7 plots were removed as outliers.
Remote Sens. 2021, 13, 1658 9 of 17 A total of 171 were randomly selected as a training set while the remaining 73 plots were used as a test set. Table 3 summarizes the performance metrics of each model to estimate the 1-h fuel load (t/ha) in the field plots. Based on the performance metrics, RF showed better performances in fine dead fuel load estimation compared to the MLR and SVM models. No substantial differences in prediction accuracy were observed between the MLR and SVM models. For RF, Pearson's correlation coefficient (r) was 0.71 and the R 2 value was 0.50, while for MLR, both values were 0.63 and 0.40, respectively. The highest RMSE value (0.11) was observed in the MLR model, while the lowest value (0.09) was found in the RF model. The cross-validations confirmed the robustness of the RF model ( Table 4). The results calculated for 10 subsamples of training and test sets showed similar metrics values, demonstrating no overfitting. Thus, the RF model was chosen to predict the 1-h fuel load in a pilot area of Apulia.

Variable Importance
The importance of each variable was assessed using two indicators provided by the RF model: mean decrease accuracy (%IncMSE) and mean decrease in node impurity (IncN-odePurity) [56,57]. The first indicator measures the decrease in accuracy and, consequently, the percentage increase in MSE when a variable is excluded from the model. A variable with high percentage increases in MSE has more predictive power than others, because when it is removed from the model the accuracy in estimating 1-h fuel load decreases. The second indicator measures the decrease in Gini impurity when a variable is chosen to split a node. The higher the value of mean increase in node purity, the greater the importance of the variable in the model ( Figure 5). because when it is removed from the model the accuracy in estimating 1-h fuel load decreases. The second indicator measures the decrease in Gini impurity when a variable is chosen to split a node. The higher the value of mean increase in node purity, the greater the importance of the variable in the model ( Figure 5). The most important variables influencing 1-h fuel load estimation were the 95th percentile of CHM (CHM_maxq) and canopy cover (Canopy_cover) values, increasing the MSE by more than 10% (15.71% and 11.35%, respectively); both variables were randomly permuted ( Figure 5A). NDVI and the other variables related to geomorphological data (Slope and Elevation) showed low importance values (<5%), while VH and Easterness were considered less important for the RF model. The importance of the CHM_maxq and Canopy_cover variables, followed by Vegetation_cover, was also confirmed by the mean decrease in Gini impurity, which indicated that these variables have the highest values of node purity and the highest degree of importance ( Figure 5B).

Fine Dead Fuel Map
According to performance metrics, the best model (lowest RMSE and highest R 2 ) was used to estimate and map 1-h fuel load in a pilot area of the Apulia region. Figure 6 shows the 1-h fuel load (t/ha) map estimated by the RF model in the administrative area of Martina Franca. The 1-h fuel load values ranged from a low of 0.01 t/ha to a high of 0.53 t/ha. The most important variables influencing 1-h fuel load estimation were the 95th percentile of CHM (CHM_maxq) and canopy cover (Canopy_cover) values, increasing the MSE by more than 10% (15.71% and 11.35%, respectively); both variables were randomly permuted ( Figure 5A). NDVI and the other variables related to geomorphological data (Slope and Elevation) showed low importance values (<5%), while VH and Easterness were considered less important for the RF model. The importance of the CHM_maxq and Canopy_cover variables, followed by Vegetation_cover, was also confirmed by the mean decrease in Gini impurity, which indicated that these variables have the highest values of node purity and the highest degree of importance ( Figure 5B).

Fine Dead Fuel Map
According to performance metrics, the best model (lowest RMSE and highest R 2 ) was used to estimate and map 1-h fuel load in a pilot area of the Apulia region. Figure 6 shows the 1-h fuel load (t/ha) map estimated by the RF model in the administrative area of Martina Franca. The 1-h fuel load values ranged from a low of 0.01 t/ha to a high of 0.53 t/ha. The highest 1-h fuel load predicted (0.26-0.53 t/ha) was recorded in the south-west part of the pilot area where deciduous forests can be found. Likewise, the lowest estimated 1-h fuel load (0.01-0.14 t/ha) was found in the south-east portion of the pilot area, mostly covered by meadows and pastures. Table 5 represents the distribution of each land-cover category in terms of surface area (ha) within the 1-h fuel load classes as estimated by the RF model. Among the land-cover classes, the deciduous forest (6171 ha) and sclerophyllous vegetation (803.55 ha) areas covered most of the entire pilot area, whereas mixed woods (22.27 ha), coniferous woods (67.62 ha), and natural recolonization areas (11.03 ha) were less representative. As the number of trees increased, the 1-h fuel load increased as well. More than 85% of the pastures and meadows (with or without trees) and natural recolonization areas recorded 1-h fuel load values between 0.01 t/ha and 0.14 t/ha and rarely exceeded 0.18 t/ha. The 1-h fuel load values for deciduous forests were highly variable and approximately between 0.14 t/ha and 0.53 t/ha. However, the highest 1-h fuel load values (0.26-0.53 t/ha) were found almost exclusively in deciduous forests. The components of the Mediterranean scrub (brush, shrubs, and sclerophyllous area) produced a value between 0.01 t/ha and 0.18 t/ha compared with mixed and coniferous woods with values between 0.18 t/ha and 0.26 t/ha. The highest 1-h fuel load predicted (0.26-0.53 t/ha) was recorded in the south-west part of the pilot area where deciduous forests can be found. Likewise, the lowest estimated 1-h fuel load (0.01-0.14 t/ha) was found in the south-east portion of the pilot area, mostly covered by meadows and pastures. Table 5 represents the distribution of each land-cover category in terms of surface area (ha) within the 1-h fuel load classes as estimated by the RF model. Among the land-cover classes, the deciduous forest (6171 ha) and sclerophyllous vegetation (803.55 ha) areas covered most of the entire pilot area, whereas mixed woods (22.27 ha), coniferous woods (67.62 ha), and natural recolonization areas (11.03 ha) were less representative. As the number of trees increased, the 1-h fuel load increased as well. More than 85% of the pastures and meadows (with or without trees) and natural recolonization areas recorded 1-h fuel load values between 0.01 t/ha and 0.14 t/ha and rarely exceeded 0.18 t/ha. The 1-h fuel load values for deciduous forests were highly variable and approximately between 0.14 t/ha and 0.53 t/ha. However, the highest 1-h fuel load values (0.26-0.53 t/ha) were found almost exclusively in deciduous forests. The components of the Mediterranean scrub (brush, shrubs, and sclerophyllous area) produced a value between 0.01 t/ha and 0.18 t/ha compared with mixed and coniferous woods with values between 0.18 t/ha and 0.26 t/ha.

Discussion
Wildfires in Europe are becoming increasingly extreme and, as a result, fire management must dedicate more effort to improve the prevention phase [58]. An effective way to start managing wildfire risk is to characterize and estimate fuel loads by adopting multiple and novel approaches [40]. This study intended to corroborate the knowledge on fuel load estimations and to understand which remote sensing variables are the most important in predicting them. Although previous studies have attempted to describe and map the properties of fuel load types [59][60][61], our study is the first to integrate multiple remote sensing predictors and several machine learning techniques in a Mediterranean landscape of Southern Europe.
In this study, the 1-h fuel load estimation was conducted by comparing machine learning techniques (RF and SVM) with the MLR model. Based on performance metrics, we found that the RF model showed greater predictive power in estimating the response variable compared to the other models. The prediction error and correlation between the predicted and observed 1-h fuel load values (i.e., 0.09 and 0.71, respectively) suggest that the RF model has great potential in estimating fine dead fuel loads. These results are in line with those of Pierce et al. [25], who used the RF algorithm integrating Landsat 5 imagery, field data, and topographic factors for modeling and mapping forest canopy fuels in Lassen Volcanic National Park, California, USA. Their RF model showed that the remote sensing variables used to predict canopy fuel characteristics had pseudo-R 2 values ranging from 0.55 to 0.68. Conversely, our results are less consistent with the findings of Arellano-Perez et al. [62], who employed Sentinel-2 data and two machine learning approaches (RF and multivariate adaptive regression splines) to estimate the surface fuel load of pinewood in northwestern Spain. Their results showed a weak model performance with R 2 ranging between 0.02 and 0.12, probably due to the fact that the authors only used the products deriving from Sentinel-2A images as remote sensing variables instead of a multi-source approach like ours.
Compared to the RF adopted in our study, the MLR model demonstrated a weaker predictive power. The prediction error and correlation between the predicted and observed 1-h fuel load values were 0.11 and 0.63, respectively. Other authors have confirmed the high performances of RF compared to MLR in similar studies [28]. For example, López-Serrano et al. [61] compared three machine learning techniques with MLR for above-ground biomass estimation in the Sierra Madre Occidental, in Mexico. Their results proved that RF and SVM techniques yielded the best results compared to MLR. Moreover, Castillo et al. [63] found that the use of machine learning algorithms in estimating the above-ground biomass of mangrove forests gave better results compared to linear regression.
In our study, the remote sensing variables did not have the same weight in terms of importance in the RF model. Indeed, the LIDAR variables showed a higher predictive power than the NDVI and VH polarization. CHM followed by canopy and vegetation cover are the most important variables in estimating 1-h fuel load. This is most likely related to the fact that if the degree of coverage and the height of the canopy increase, then the biomass in terms of leaves, twigs, and small branches (e.g., fine and coarse dead fuel loads) increases as well [64][65][66]. These results are in line with those of Chen et al. [67], who applied multiple regression analysis using airborne and terrestrial LIDAR for surface fuel load estimations in the open eucalyptus forests of Australia. The developed models presented R 2 values up to 0.89. Moreover, Lopes Queiroz et al. [31] used multispectral LIDAR data to apply generalized additive models to coarse woody debris load estimations, yielding a high accuracy of R 2 = 0.72.
Contrary to LIDAR variables, the NDVI does not provide information on vegetation structure but qualitatively describes the photosynthetic and phytosanitary activity of plants [68,69]. Furthermore, NDVI is not able to detect the undergrowth, especially when canopy layers are dense and stratified. For these reasons, results show that NDVI and the land-cover variable have the same weight in terms of importance within the RF model. The results also suggest that VH polarization had less importance in fuel estimation. Based on the findings of similar studies, a more plausible explanation is that the Sentinel-1 satellite radar images have a short wavelength and consequently low canopy penetration [70]. For example, Kumar et al. [71] found an insignificant correlation between VH polarization acquired by Sentinel-1A and above-ground biomass estimated in the field. Conversely, a significant correlation between VH polarization acquired by ALOS PALSAR and biomass was found by the authors due to the fact that the L-Band, compared to the C-band, of Sentinel 1-A is more sensitive to backscatter values and has a greater penetration of the canopy.
The geomorphological variables (Slope, Elevation, and Easterness) showed lesser importance in estimating the 1-h fuel load, probably due to their relationship to the landscape topography rather than to vegetation. According to D'Este et al. [41], fire ignition probability is high at an altitude of about 500 m, which is the mean value of elevation in Apulia. This confirms the poor predictive power of these variables in discriminating the fire occurrence and the fine dead fuel production. Numerous studies have shown that geomorphological variables have a greater influence on the likelihood of fire occurrence, ignition and spread [72,73]. For example, Elia et al. [74] found that elevation and slope significantly influence the probability of wildfire occurrence in Italy. However, there is no evidence to support the fact that geomorphological variables influence the production of fuel load.
Our results may be applied regardless of the specific landscape context since our study has suggested a method to understand where the fine dead fuel is more likely to accumulate. According to the map in Figure 6, the areas associated with a high quantity of 1-h fuel load were found in forests, while this quantity radically decreased in meadow and pasture areas. This is most likely related to the fact that if the ecosystem complexity increases, in terms of biomass and vertical structure, the 1-h fuel load also increases and, hence, the likelihood of ignition. In this regard, quantitatively estimating the fuel load and knowing its spatial distribution could have great practical significance, thus allowing the prioritization of silvicultural interventions within fire management and mitigation plans [75,76]. In a context where resources are often limited and local and national authorities focus their efforts solely on fire suppression [77], our approach could create the basis for planning new preventive policies and budgeting financial resources. Although the RF model has shown a reliable performance, our study presents some limitations. For example, LIDAR is a valuable tool for estimating 1-h fuel load, but not all areas are covered by this data source, as it is extremely expensive to run. Another limitation of this study was the use of the C-band images of Sentinel-1, which are less sensitive to fine dead fuel load compared to the L-band images of the ALOS PALSAR satellite. However, the L-band images of the ALOS PALSAR satellite were only available until 2018. Future studies are needed to improve the knowledge on which predictors determine and influence the distribution of fine dead fuel. For example, in fuel load estimation models, future research should include new variables such as those related to climate or landscape heterogeneity [78]. According to Pickering et al. [79], if the moisture content of the fine dead fuel is high (>16%), fires are unlikely to ignite and burn, while if this content is less than 5-7%, the fire could ignite rapidly. Humidity, temperature, and wind speed play a fundamental role in influencing the ignition of fuel loads and, consequently, fire ignition. Therefore, the introduction of new variables could significantly contribute to building fire prediction models.

Conclusions
Fine dead fuel load (i.e., 1-h fuel load) estimation was conducted using multi-source remote sensing data and machine learning techniques in a Mediterranean landscape of Southern Europe. The RF, SVM, and MLR models were employed to predict 1-h fuel load and to understand which remote sensing variables are most important to determine it. Based on performance metrics, the RF model was the most accurate in estimating 1-h fuel load compared to the traditional MLR model that showed a weaker predictive power. The SVM model showed similar performances to the MLR model. The results showed that the variables providing information on vegetation structure (i.e., Canopy Height Model, Canopy cover) are the most important in fuel estimation, thus underscoring the possibility of using these remote sensing data as a proxy of fine dead fuel load. Therefore, this methodology could be very useful for managers and decision makers, since it would allow the regular estimation of fuel loads, especially the fine dead fuel load. Future research should investigate the importance of climatic variables in fine dead fuel estimation modeling and understand how these affect fire ignition.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/rs13091658/s1, Figure S1: Five sub-regions of the Apulia region used as masks to extract radar images., Figure S2