Open Data and Machine Learning to Model the Occurrence of Fire in the Ecoregion of “Llanos Colombo–Venezolanos”
Remote Sens. 2020, 12, 3921

A fire probability map is an important tool for landscape management, providing better identification of areas prone to fires and helping optimize the allocation of limited resources for fire prevention, control, and management. In this study, the random forest machine learning algorithm was applied to model the probability of fire occurrence in the Colombian-Venezuelan plains (llanos) ecoregion in South America. Information on burned areas was collected using Moderate Resolution Imaging Spectroradiometer (MODIS) Product MCD64A1 for the period 2015–2019. We also used spatial information on related factors, grouped into four types: topography, human presence, vegetation, and climate-related variables. The model achieved an area under the ROC curve (AUC) of 0.94, which indicates excellent performance. The cartography generated from the model can be used as base information for fire management in the region, to identify areas where efforts and attention should be prioritized. The zoning of the probability of occurrence indicates that the very low category covers the largest area (28.2%), followed by low (23.2%), very high (17.6%), moderate (17.2%), and high (13.8%).


Introduction
Fire is an integral part of the natural history of ecosystems and is considered a natural force that has influenced the evolution and development of species, ecosystems, and landscapes at a global level [1]. However, uncontrollable fires (wildfires) often represent a major threat to public safety, infrastructure, biodiversity, and forest resources [2]. Each year billions of dollars are spent on fire control, which ultimately aims to mitigate or prevent the negative effects of wildfires [3]. It is estimated that 420 Mha of land are burned each year globally [4], mainly in savannahs and grasslands [3]. Climate change, reduced rainfall, increased temperature, prolonged dry seasons, and the impact of human activity have increased the potential for forest fires in many regions of the world [5,6] and there is evidence of an increase in the frequency, size, and severity of forest fires, in addition to a consequent increase in the costs associated with controlling them [7].
Mapping the probability of fire occurrence allows identification of the areas in which fires are more likely to occur, regardless of when they occur; in the context of risk management this is referred to as danger [8]. Danger is ultimately an indicator (quantitative or qualitative) of the probability that an area will burn [9]. To assess this probability spatially, two aspects need to be considered: first, the locations where fires have occurred in the past (burned area) and, second, the factors that facilitate the presence and spread of these fires [10].

Fire Database
The first stage of the modeling process involved compiling an inventory of fires in the region, because the prediction of future occurrences is based on the assumption that future fires in a given location can be predicted by analyzing data from past occurrences [24]. For historical information on fires in the study area (Figure 2), data were collected on burned areas during the dry season (which occurs between the months of December and March [23]) over a period of five years [25], from 2015 to 2019. Data for 2020 were withheld from model building and used instead to evaluate the zoning that resulted from the model. The product MCD64A1 (version 6) provides monthly global information on burned areas with a spatial resolution of 500 m [26]. The algorithm uses reflectance information from Moderate Resolution Imaging Spectroradiometer (MODIS) images (500 m) in conjunction with data from MODIS active fires (1 km) to generate the burned areas by month.

Factors Relating to the Occurrence of Fires
The quantitative evaluation of the probability of the occurrence of fires took into account (in addition to the location of past fires) geo-environmental and anthropogenic predisposing factors [10]. In this work, 14 variables were preliminarily identified in relation to four types of factors [17]: topography, presence of human activities, vegetation, and climate. All were selected based on the availability of information sources. The correct (or incorrect) selection of factors is reflected in the predictive performance of the resulting models [27].
In relation to topographic variables, the 30 m Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) was used [28], from which information was extracted for the elevation variable and its derivatives (slope and aspect or direction of the slope). To establish the anthropic presence in the ecoregion, the layers of two entities that allow access and mobility were used: land routes (Global Roads Inventory [29]) and river network (HydroSHEDS: Hydrological Data and Maps Based on Shuttle Elevation Derivatives at Multiple Scales [30]). In addition, we used the CSP gHM Global Human Modification data set that provides a measure of the intensity of human modification on the landscape based on five types of stressors: human settlements, agriculture, transportation, mining/electricity production, and electrical infrastructure [31].
Vegetation indices are used in the field of remote sensing to provide a quantitative and qualitative approximation of the vegetation cover using spectral measurements [32]. These indices provide an indication of the state of the vegetation or its moisture content. The information corresponding to the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) was derived from the MOD13A1.006 data set, which provides the value of the vegetation indices at the pixel level (500 m), calculated from surface reflectance after masking water, clouds, cloud shadows, and heavy aerosols [33]. The Normalized Difference Water Index (NDWI) and Visual Atmospheric Resistance Index (VARI) were obtained from Landsat 8 image processing. The corresponding equations of the four above-mentioned indices are included in Table 1. For the VARI [38]:

\[ \mathrm{VARI} = \frac{\mathrm{GREEN} - \mathrm{RED}}{\mathrm{GREEN} + \mathrm{RED} - \mathrm{BLUE}} \]

For information on climate variables, we used the WorldClim dataset (version 2.0), which contains monthly climate data for the 1970–2000 period, with high spatial resolution (approximately 1 km²) [39]. The information used from this data set was average temperature (°C), precipitation (mm), solar radiation (kJ m⁻² day⁻¹), and wind speed (m s⁻¹) for the months comprising the dry season (December, January, February, and March).

Preprocessing
Google Earth Engine (GEE) is a cloud-based platform that provides access to high-performance computing resources for processing large volumes of geospatial information, without being limited by the capacity of a local machine [40]. GEE processing services were accessed from the RStudio environment using the R package rgee [41] to collect and pre-process information for the factors relating to the probability of occurrence. Some base information layers (weather and roads) were downloaded from external sources and then uploaded to the GEE platform so that all information layers were available in one place.
For the specific calculation of the slope and aspect variables, the elevation layer and the respective GEE methods were used as base information. In the case of land access roads and rivers, distance rasters were constructed.
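The idea of a distance raster can be illustrated with a minimal sketch. This is not the GEE workflow used in the study: it is a brute-force calculation on a toy grid, and the function name `distance_raster`, the 3 x 3 mask, and the 500 m cell size are assumptions made only for the example.

```python
import math


def distance_raster(feature_mask, cell_size_m):
    """Return, for every cell, the Euclidean distance (in metres)
    to the nearest cell flagged as a feature (e.g., a road or river)."""
    feature_cells = [
        (r, c)
        for r, row in enumerate(feature_mask)
        for c, is_feature in enumerate(row)
        if is_feature
    ]
    if not feature_cells:
        raise ValueError("feature_mask contains no feature cells")
    return [
        [
            cell_size_m * min(math.hypot(r - fr, c - fc) for fr, fc in feature_cells)
            for c in range(len(row))
        ]
        for r, row in enumerate(feature_mask)
    ]


# Toy 3 x 3 grid with a single road cell in the top-left corner (hypothetical data).
roads = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
dist = distance_raster(roads, cell_size_m=500.0)
```

The brute-force search compares every cell with every feature cell, which is only practical for small grids; dedicated raster tools use far more efficient distance transforms.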
For the NDVI and EVI, the average value of these indexes was calculated for the months of December, January, February, and March of the years 2015 to 2019, from the product MOD13A1.006. For the calculation of the NDWI and VARI indices, compositions of Landsat 8 images were constructed (product LANDSAT/LC08/C01/T1_SR) using a cloud masking algorithm for all available images for the study area (1048 images) and averaging the values to derive a single image that represented the average reflectance values of the dry season (December to March) for the years 2015 to 2019. Then, the corresponding equations were applied to calculate the indices, selecting the appropriate bands according to the Landsat 8 satellite.
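As a rough illustration of the last two steps (averaging the cloud-masked images and then applying an index equation), the sketch below computes a per-pixel mean composite and the VARI from it. The helper names, the flat-list image representation, and the reflectance values are hypothetical; the actual processing was carried out in GEE.

```python
def mean_composite(images):
    """Average a list of equally sized single-band images, pixel by pixel.
    Each image is a flat list of reflectance values."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def vari(green, red, blue):
    """VARI = (GREEN - RED) / (GREEN + RED - BLUE).
    Returns None where the denominator is zero (index undefined)."""
    denominator = green + red - blue
    if denominator == 0:
        return None
    return (green - red) / denominator


# Two hypothetical dry-season images with two pixels each, per band.
green = mean_composite([[0.5, 0.5], [0.75, 0.5]])      # -> [0.625, 0.5]
red = mean_composite([[0.25, 0.25], [0.25, 0.25]])     # -> [0.25, 0.25]
blue = mean_composite([[0.125, 0.75], [0.125, 0.75]])  # -> [0.125, 0.75]

vari_image = [vari(g, r, b) for g, r, b in zip(green, red, blue)]
```

Returning `None` for a zero denominator is one possible convention; a raster workflow would more likely write a no-data value at those pixels.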
For the climatic variables (four images for each variable, corresponding to the months of December, January, February, and March) a single image was constructed for each variable, which represented the average value for the dry season.
All information layers were resampled to 500 m in the projected World Mercator coordinate system (EPSG:3395) and clipped to the boundary of the study area. Additionally, two masks (water surfaces and urban centers) were applied to the information layers so that the pixels corresponding to these two categories were set to no data. The water surface information was obtained from the Joint Research Centre (JRC) Yearly Water Classification History data set, version 1, which provides information on the location and distribution of water bodies globally [42]. In the case of urban centers, the 2018 land cover layer of data set MCD12Q1.006, which results from the supervised classification of MODIS data, was reclassified [43]. The initial list of variables is presented in Table 2. At this point, all geographic information layers, which are GEE-type objects, were converted to R-type objects for further modeling using the R language (R version 3.5.3).

Variable Selection
When applying machine learning algorithms, variables must be selected carefully: a good selection improves the predictions of classification models, reduces the computational cost, and makes the input data easier to interpret [44].
Multicollinearity occurs when two or more predictor variables are highly correlated. This can result in a less accurate estimate of the effect of an independent variable on the dependent variable, compared to when the independent variables are uncorrelated [45]. To estimate multicollinearity among fire explanatory variables, the variance inflation factor (VIF) was used, which evaluates how much the variance of an estimated regression coefficient increases when predictors are correlated [46]. Specifically, within the environment of R, this analysis regresses each predictor variable on all the others; if a variable is strongly correlated with at least one of the other variables, the coefficient of determination (R²) of that regression will be close to one and the VIF value for this variable, 1/(1 − R²), will be large [47]. In general, a VIF value greater than 10 is considered to represent a significant correlation between the variables [48]. Therefore, the variables with a VIF greater than 10 were excluded from the subsequent modeling process, that is, the wind variable and the NDVI and EVI indices (Table 3).
In addition to the VIF, Pearson's correlation analysis was applied to identify linear correlations between pairs of variables (in the new filtered set of variables). Pearson's correlation coefficient measures the association between two variables. When a correlation is present, a change in the magnitude of one variable is associated with a change in the magnitude of another, either in the same direction (positive coefficient) or in the opposite direction (negative coefficient). This coefficient takes values between −1 and 1, where 0 corresponds to no linear correlation [49]. A correlation coefficient greater than or equal to 0.7 is considered an indicator of correlation that can distort the modeling process and affect future predictions [45]. The results suggested a strong positive correlation between NDWI and VARI (Figure 3); thus, it was decided to use only the first of these because it is an index widely used in the monitoring of fires and has been proven to have a strong correlation with their occurrence [50]. Figure 4 shows the spatial distribution of the final set of variables used in the spatial modeling of the probability of the occurrence of fires in the study area.
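The two screening steps described above (VIF greater than 10, then pairwise Pearson correlation of at least 0.7) can be sketched as follows. This is a minimal standard-library illustration, not the R code used in the study. It relies on the identity that the VIF of each variable equals the corresponding diagonal element of the inverse of the predictors' correlation matrix. The variable names `x1`, `x2`, `x3` and the synthetic data are assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per variable: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def high_corr_pairs(columns, threshold=0.7):
    """Pairs of variables whose absolute Pearson correlation reaches the threshold."""
    names = list(columns)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(pearson(columns[a], columns[b])) >= threshold]


# Synthetic data: x2 is almost a copy of x1, x3 is independent of both.
rng = random.Random(42)
x1 = [rng.gauss(0, 1) for _ in range(500)]
x2 = [a + rng.gauss(0, 0.1) for a in x1]
x3 = [rng.gauss(0, 1) for _ in range(500)]
data = {"x1": x1, "x2": x2, "x3": x3}

vifs = vif(data)                 # x1 and x2 far above 10, x3 close to 1
pairs = high_corr_pairs(data)    # only the (x1, x2) pair is flagged
```

With perfectly collinear predictors the correlation matrix is singular and the inversion fails with a division by zero, which is the limiting case of an infinite VIF.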

Random Forest Algorithm
The random forest is an algorithm that averages the predictions of a large number of individual classification or regression trees, each built from a portion (usually 2/3) of the data used for training the model; the remaining sample is used for estimating how well the model performs [51]. From the information layer of burned areas for 2015–2019, a stratified sampling was conducted in which the same number of points was taken at random for each class (burned area/non-burned area), for a total of 500,000 points. Of these points, 70% (350,000) were used for model training and for tuning the main model parameters (ntree: number of trees, and mtry: number of variables randomly sampled as candidates for each split). The main idea of the algorithm is to combine many decision trees, each grown from a bootstrap sample of the data, with explanatory variables (Table 3, Figure 5) chosen at each node of the tree [52]. In the context of machine learning, mapping the probability of occurrence of a fire can be interpreted as a binary classification problem in which each pixel is assigned to one of two classes: fire or no fire [53]. The fire class corresponds to the value of 1, while the non-fire class is 0. The number of predictions for each class (one prediction per decision tree in the random forest), normalized by the total number of predictions, yields a probabilistic result [10]. In the probability of fire occurrence map, the value of each pixel therefore represents the probability that a fire will occur in the future.
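The sampling scheme and the vote-based probability described above can be sketched as follows. The sample sizes are scaled down and all names (`balanced_sample`, `train_test_split`, `fire_probability`) and values are hypothetical; the sketch does not train a random forest and, as an assumption, the 70/30 split is a plain random split of the balanced sample.

```python
import random


def balanced_sample(labels, n_per_class, rng):
    """Draw the same number of indices at random from each class (0 and 1)."""
    chosen = []
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        chosen.extend(rng.sample(members, n_per_class))
    rng.shuffle(chosen)
    return chosen


def train_test_split(indices, train_fraction):
    """Split an already shuffled list of indices into training and validation parts."""
    cut = int(round(len(indices) * train_fraction))
    return indices[:cut], indices[cut:]


def fire_probability(tree_votes):
    """Fraction of trees that voted for the fire class (1)."""
    return sum(tree_votes) / len(tree_votes)


rng = random.Random(0)
labels = [1] * 300 + [0] * 700            # toy pixels: 1 = burned, 0 = not burned
sample = balanced_sample(labels, n_per_class=100, rng=rng)
train_idx, test_idx = train_test_split(sample, train_fraction=0.7)

votes = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]    # hypothetical votes of 10 trees for one pixel
p_fire = fire_probability(votes)          # 7 of 10 trees voted "fire"
```

In the study the same logic applies at a much larger scale: 500,000 sampled points, 350,000 of them for training, and one vote per tree in the forest for every pixel.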

Tuning Model
The most important parameters to be specified in R for the random forest algorithm are the number of decision trees (ntree) and the number of variables randomly sampled as candidates for each split (mtry) [10]. The cross-validation (CV) method consists of subdividing the data set (in this case, the training set) into 10 samples, of which 9 are used to train the model and 1 to validate it; this process is repeated 10 times so that each sample is used once for validation. The CV method was applied to different combinations of the mtry and ntree parameters using the R caret package, to select the combination with the best performance as measured by accuracy, that is, the percentage of data correctly classified [54]. The combinations tested comprised four levels of ntree (50, 100, 500, and 1000) and ten levels of mtry (from 1 to 10). The maximum value of ntree tested was 1000, following the recommendation of previous research to maintain stable results [51,55]. The optimal values of the parameters were mtry = 6 and ntree = 1000 (Figure 5).
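The structure of this search (40 parameter combinations, each scored by 10-fold cross-validation) can be sketched without any modeling library. The scoring function below is a placeholder, not a trained model: `toy_score` is an arbitrary formula constructed to have a single maximum at the combination reported in the text, purely so the example runs end to end. All function names are hypothetical and do not reproduce the caret API.

```python
import itertools
import random


def k_fold_indices(n_samples, k, rng):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


def grid_search(n_samples, grid, k, score_fn, rng):
    """Return the best (ntree, mtry) pair and the mean CV score of every pair."""
    splits = list(k_fold_indices(n_samples, k, rng))
    results = {}
    for ntree, mtry in grid:
        scores = [score_fn(ntree, mtry, tr, va) for tr, va in splits]
        results[(ntree, mtry)] = sum(scores) / len(scores)
    best = max(results, key=results.get)
    return best, results


def toy_score(ntree, mtry, train_idx, val_idx):
    """Placeholder for 'fit on train_idx, return accuracy on val_idx'.
    Arbitrary shape, NOT a result from the study."""
    return 0.9 - 0.01 * abs(mtry - 6) + 1e-5 * ntree


# Four levels of ntree and ten levels of mtry, as described in the text.
grid = list(itertools.product([50, 100, 500, 1000], range(1, 11)))
best, results = grid_search(n_samples=100, grid=grid, k=10,
                            score_fn=toy_score, rng=random.Random(1))
```

In a real run, `score_fn` would fit a random forest on the training indices and return its accuracy on the validation fold, which means 400 model fits for this grid.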

Performance Assessment
Validation is the most important component of the modeling process to ensure the results of the models have scientific relevance [56]. Of the original stratified sample for model construction, we used the remaining 30% (150,000 points) for model validation. To calculate the importance of the variables, we used the Mean Decrease in Accuracy (MDA) [51], which is one of the most widely used metrics in random forest models [57]. This metric quantifies the importance of a variable by measuring the change in prediction accuracy when the values of that variable are randomly permuted with respect to their original observations. The larger the resulting change in accuracy, the more important the variable [55]. This measure allows the variables to be ranked according to their importance within the model. The MDA metric was calculated using the importance function available in the randomForest package in R [58].
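The permutation idea behind the MDA can be illustrated with a short sketch. Two simplifications are assumed here: the accuracy drop is measured on a single held-out data set, whereas random forest implementations typically compute it on the out-of-bag samples of each tree, and the "model" is a hypothetical threshold rule (`toy_model`) rather than a trained forest.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted class equals the observed class."""
    hits = sum(predict(row) == y for row, y in zip(rows, labels))
    return hits / len(labels)


def permutation_importance(predict, rows, labels, column, rng):
    """Decrease in accuracy after randomly permuting one predictor column."""
    baseline = accuracy(predict, rows, labels)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted = [row[:column] + [value] + row[column + 1:]
                for row, value in zip(rows, shuffled)]
    return baseline - accuracy(predict, permuted, labels)


def toy_model(row):
    """Hypothetical classifier that only looks at the first predictor."""
    return int(row[0] > 0.5)


rng = random.Random(7)
rows = [[rng.random(), rng.random()] for _ in range(1000)]
labels = [int(row[0] > 0.5) for row in rows]

drop_0 = permutation_importance(toy_model, rows, labels, column=0, rng=rng)
drop_1 = permutation_importance(toy_model, rows, labels, column=1, rng=rng)
```

Permuting the first predictor destroys its link with the labels and reduces accuracy to roughly chance level, while permuting the second predictor, which the toy model ignores, leaves accuracy unchanged.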
The results of the fire occurrence probability model based on the random forest were verified using the area under the Receiver Operating Characteristic (ROC) curve (AUC) [59]. A ROC curve plots the true positive rate against the false positive rate; the best possible prediction corresponds to an AUC of 1, which represents 100% sensitivity (no false negatives) and 100% specificity (no false positives) [56]. Table 4 shows the interpretation of the AUC values in relation to model performance. Sensitivity refers to the proportion of pixels that represent burned areas and are correctly classified as burned areas; conversely, specificity is the proportion of pixels that do not correspond to a burned area and are correctly classified as such [53]. Additionally, to test the reliability of the fire occurrence probability map produced by the random forest algorithm, the burned area ratio was derived from the classified map and new burned area data (2020). For this purpose, the burned area information layer for the 2020 dry season was overlaid with the result of the occurrence probability zoning to calculate the burned area in each of the probability zones [62].
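The metrics defined above can be made concrete with a small sketch. The AUC is computed here through its pairwise interpretation (the share of burned/unburned pixel pairs in which the burned pixel receives the higher score, with ties counted as half), which is equivalent to the area under the ROC curve. The labels, scores, and the 0.5 threshold are toy values chosen for the example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


labels = [0, 0, 1, 1]              # 1 = burned, 0 = not burned (toy data)
scores = [0.1, 0.4, 0.35, 0.8]     # hypothetical predicted probabilities

auc = roc_auc(labels, scores)                                   # 3 of 4 pairs ranked correctly
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
```

The pairwise loop is quadratic in the number of points, so for a validation set of 150,000 points a rank-based formulation would be preferable.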

Probability of Fire Occurrence
Finally, a prediction was made on the set of raster layers (the predictor variables associated with the occurrence of fire, Figure 4) using the trained random forest model. The new data were evaluated against all decision trees built in the random forest model; each tree assigned a label (fire or no fire), and the label with the most votes was finally selected [63]. The result (in terms of probability) was an index with continuous values between 0 and 1 that represents the probability of fire occurrence. There are different categorization schemes (e.g., quantile, natural breaks) and each gives rise to different results [64]. The natural breaks method is frequently used to classify probability indices [25]; it establishes class limits by searching for groupings that are inherent in the data [65]. The probability of occurrence was reclassified into five classes using the natural breaks method.
A general workflow of the applied methodology is summarized in Figure 6.

Predictive Performance and Variable Importance
According to the results presented in Figure 7, the most important variable within the model is the NDWI (permuting this variable produces the largest change in the accuracy of the model), followed by temperature and, in third place, anthropogenic modification. Next comes a group of variables with very similar importance values: precipitation, distance to roads, solar radiation, slope, and elevation. Finally, the least important variables in the model are aspect and distance to rivers. The accuracy of the fire occurrence model was evaluated using a ROC curve, a common method for evaluating the quality of probabilistic prediction models [25]. The result (Figure 8) shows that the AUC of the ROC curve is 0.943.

Probability of Fire Occurrence Map
The result of applying the random forest model to predict the occurrence of fires is a value that expresses the probability of each pixel being burned in the future, given a set of predisposing variables [10]. This result was reclassified using the natural breaks method, resulting in the following classes of probability of occurrence of fires in the study area: very low probability (0–0.16), low probability (0.16–0.36), moderate probability (0.36–0.57), high probability (0.57–0.79), and very high probability (0.79–1), as shown in Figure 9. Table 5 shows the area corresponding to each zoning level. The very low category covers the largest area (28.2%, 103,982.0 km²), followed by low (23.2%, 85,254.3 km²), very high (17.6%, 64,846.0 km²), moderate (17.2%, 63,267.0 km²), and high (13.8%, 50,879.5 km²). By overlaying the map of occurrence probability with the burned area of the 2020 dry season, we found that approximately 73% of the burned area falls within the high and very high occurrence probability categories, which supports the reliability of the model (Figure 10).

Table 5. Area corresponding to each zoning level.

Probability (%)   Category    Area (km²)   Area (%)
0–16              Very low    103,982.0    28.2
16–36             Low          85,254.3    23.2
36–57             Moderate     63,267.0    17.2
57–79             High         50,879.5    13.8
79–100            Very high    64,846.0    17.6
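The reclassification and the overlay with the 2020 burned area can be sketched as follows, using the class limits reported above. Two assumptions are made: a probability that falls exactly on a limit is assigned to the upper class, and the pixel values and burned flags are toy data, not results from the study.

```python
from bisect import bisect_right
from collections import Counter

# Class limits reported in the text (natural breaks).
BREAKS = [0.16, 0.36, 0.57, 0.79]
CLASSES = ["very low", "low", "moderate", "high", "very high"]


def classify(probability):
    """Map a probability in [0, 1] to one of the five zoning classes.
    Assumption: a value equal to a limit goes to the upper class."""
    return CLASSES[bisect_right(BREAKS, probability)]


def burned_share_by_class(probabilities, burned_flags):
    """Share of burned pixels that falls within each zoning class."""
    counts = Counter(classify(p)
                     for p, burned in zip(probabilities, burned_flags) if burned)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no burned pixels to evaluate")
    return {name: counts[name] / total for name in CLASSES}


# Hypothetical pixels: predicted probability and whether they burned afterwards.
probabilities = [0.05, 0.20, 0.45, 0.60, 0.85, 0.90, 0.30, 0.70]
burned_flags = [0, 0, 1, 1, 1, 1, 0, 0]

share = burned_share_by_class(probabilities, burned_flags)
high_or_very_high = share["high"] + share["very high"]
```

Counting pixels is equivalent to summing areas only when all pixels have the same size, as is the case for layers resampled to a common 500 m grid.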

Discussion
A recent review [3] of the applications of machine learning in forest fire science and management identified 298 publications between 1996 and 2019, with a marked increase during the past 5 years. In 71 of these, machine learning algorithms were implemented to identify areas susceptible to the occurrence of fire events. The review highlights that this type of algorithm has been highly successful due to its ability to learn from data and model hidden relationships, which often yields better results than classical statistical approaches [10]. This research presents the first model of probability of fire occurrence specific to the ecoregion of the Colombian-Venezuelan plains. According to the accuracy assessment based on the AUC of the ROC curve (value of 0.94), the random forest model shows excellent performance [60,61]. Some works have demonstrated the high predictive capacity of random forests to model the occurrence of fires, with higher performance than other types of machine learning algorithms, such as boosted regression trees (BRT), support vector machines (SVM) [66], and maximum entropy (ME) [52].
It is important to take into consideration the seasonality of the fire regime. In this particular case, we focused on one of the two "macro-seasons" of the study area [10], specifically the dry season, which occurs between the months of December and March [23].
The result of the model validation allows us to infer that the choice of variables and their combination worked well to predict the occurrence of fires in the study area during the dry season. The correct choice of variables is the first (and fundamental) step for a successful modeling process [67]. Furthermore, the quality of the model was confirmed by validating the result of the zoning of probability of occurrence with burned area data for 2020, for which around 73% of the new burned area was shown in areas whose category of probability of occurrence is high and very high, indicating a high reliability of the model to predict new occurrences.
Regarding the factors inside the model that influence the occurrence of fire, the most important variable was the NDWI. This index is used to estimate the vegetation cover [37]. It is sensitive to changes in moisture content [68]. Different indices work best according to the type of vegetation, but the moisture content of the fuel is best represented by this particular index [69]. Previous research has shown that NDWI provides better results for estimating the variation in moisture content of living fuel (type herbaceous) and predicting the risk of fire and behavior assessment in the case of savannah ecosystems [70].
The second most important variable was mean temperature. This might be the result of the direct relationship between fuel moisture and temperature [71]. Increased temperature can lead to increased evaporation from plant cover, which in turn decreases moisture content and increases the likelihood of fires by facilitating ignition [72]. Other research has shown the importance of the average temperature, particularly during the dry season (when more fires usually occur), and its greater influence on fire occurrence than other climatic variables, such as rainfall or humidity [71]. Indeed, in the dry season the potential for the occurrence of fires can be explained to a great extent by high temperatures and low humidity [73].
The third variable in order of importance was human modification. Although fires may occur due to natural causes, these alone do not explain the scale or the alarming increase in fires affecting vegetation cover. The historical origin of such fires can be defined as the product of a "social aggression that has been taking place towards the forest" [74]. When modeling the probability of fire occurrence, it is therefore of great importance to consider sources of ignition, which are mainly of anthropogenic origin.

Conclusions
The spatial prediction of fire occurrence can be used as a benchmark to better allocate resources in the context of fire management and prevention. This research can support the selection of critical monitoring areas (to better organize resources spatially); the organization of management practices, such as slash-and-burn (among others), to promote the care and maintenance of the components of the ecosystem; and the prevention and minimization of the negative effects of fire. We believe that this type of input can be used in decision making, particularly for questions of where to take action, that is, which locations need special focus [75]. In addition, given the intrinsic relationship between fuel moisture and high temperatures, we recommend the management of vegetation fuel, within an integrated approach to fire management, as a mechanism to control the occurrence of fires, especially in zones of higher temperature. To a significant extent, these two factors determine the dynamics of the occurrence of fires in the ecoregion of the Colombian-Venezuelan plains. Regional scale modeling allows the identification of sites where it would be worthwhile to increase research detail and to direct efforts toward building specific models at local scales involving spatial information that is not available at smaller scales. For example, weather conditions can be an input variable that allows the construction of models at short temporal scales to predict the imminent presence of fires, which would be beneficial in monitoring and control tasks.
Our research confirms that the combination of remote sensing data and machine learning algorithms (specifically the random forest) is a good tool for modeling the probability of fire occurrence. On this basis, our research identifies the areas that are more susceptible to these events. The design of the fire probability model presented here can serve as a guide for other researchers to build models based on basic cartographic data and information from free access global databases, focused on regional analysis and using free tools that allow replicability. Specifically, the processing capacity of Google Earth Engine was used to collect and organize information from different sources. These data were then accessed from the R environment; this approach provides compatibility between information systems and programming languages, and optimizes the use of information from open access sources.