A Comparison of Two Machine Learning Classification Methods for Remote Sensing Predictive Modeling of the Forest Fire in the North-Eastern Siberia

Abstract: The problem of forest fires in Yakutia is not as well studied as in other countries. Two machine learning classification methods were implemented to determine the risk of fire: MaxENT and random forest. The input materials used to define fire risk factors were satellite images and their products at various spatial and spectral resolutions (Landsat TM, MODIS Terra, GMTED2010, VIIRS), vector data (OSM), and bioclimatic variables (WorldClim). The results showed a strong human influence on fire risk in this region, despite the low population density. Anthropogenic factors showed a higher correlation with the occurrence of wildfires than climatic or topographical factors. Different factors affect the risk of fires at the macroscale and at the microscale, which should be considered when modeling. The random forest method showed better results at the macroscale, whereas the maximum entropy model was better at the microscale. Excluding variables that do not show a high correlation does not always improve the modeling results. The random forest presence prediction model is the more accurate method and significantly reduces the area classified as at risk. The maximum entropy method, by contrast, is less accurate and classifies very large areas as endangered. Further study of this topic requires a clearer and conceptually developed approach to the application of remote sensing data. This work is therefore intended to lay the foundations for a fully automated fire risk assessment application in the Republic of Sakha. The results can be used in fire prophylaxis and in planning fire prevention. In the future, to determine the risk well, it will be necessary to combine the obtained maps with the seasonal risk determined using indices (for example, the Nesterov index, 1949) and the periodic dynamics of forest fires, which Isaev and Utkin studied in 1963.
Such actions can help to build an application with which it will be possible to determine the risk of wildfire and the spread of fire during extreme events and to support fire-fighting measures in the territory of the republic.


Introduction
The disturbance regime in the boreal forest is extremely variable. Every year in Siberia, millions of hectares of forest burn. Forest fires are one of the main factors that not only cause long-term, harmful changes in plant ecosystems but also contribute to the deterioration of living conditions in society, especially in the event of wildfires. At the same time, taiga fire is a natural phenomenon. Fires determine the normal ecological functioning of the forest in this region and are an inseparable part of the natural cycle. After a fire, conditions are favorable for the young generation of trees. Wildfires are also important for the indigenous peoples of Siberia. The territory after the wildfire turns into

Yakutia
To understand fire regimes in Yakutia correctly, it is important to know the features of their origin. The geographical, geological, climatological, and ecological setting of the landscape helps to explain the complexity of the fire phenomenon. Yakutia is a very specific region, with extremely high and extremely low temperatures, a thick layer of permafrost, specific geological structures, and light taiga forests that depend on fire regimes. Because of the high complexity of the territory, wildfires were studied at two scales to examine the dependencies between fires and their factors. At the macroscale, the study area was the whole Sakha Republic; at the microscale, one district of the republic, the Nyurbinsky district, was chosen (Figure 1) because of its high number of fire incidents in recent years.

The Republic of Sakha (Yakutia) is in northeastern Siberia and occupies one fifth of the territory of the Russian Federation. The republic is bounded by mountain ranges to the east and south; in the north, it has access to the Arctic Ocean. The relief and geological structure are complex and diverse. The orography of the territory determines the characteristics of the sharply continental climate, permafrost, soil, vegetation, wildlife, grasslands, and grazing land, and influences the nature of human economic activity [14]. A characteristic feature of the climate of Yakutia is its sharp continentality, which is manifested in large annual fluctuations in temperature and a relatively small amount of precipitation. The main cause of this is the Siberian anticyclone. In the study area, the average air temperature during the winter months is −30 °C, and the average temperature of the coldest month (January) is −35 °C. The average annual precipitation is 200-400 mm [15].
The permafrost thickness in the territory is sometimes more than 100 m. Seasonal thawing varies between 0.8 and 3.3 m, depending on the landscape and the type of soil. Permafrost in the conditions of Yakutia has an impact on all soil processes [16].
Sakha can be divided into three great vegetation belts. About 40% of Yakutia is located above the Arctic Circle, and all of this area is covered with permafrost, which greatly affects the region's ecology and limits the forests to the southern region. The arctic and subarctic tundra define a middle region where lichens and mosses grow like big green carpets and are favorite pastures for deer. In the southern part of the tundra belt, scattered dwarf Siberian pines and larches grow along the rivers. Below the tundra is a vast region of the taiga zone [2].
According to the state report "On the state and environmental protection of the Republic of Sakha (Yakutia) in 2018", forest land occupies 82% of the territory of Yakutia, whereas forest itself covers 54% of the territory. The area of the forest zone is 252,820.0 thousand hectares. Forest cover varies widely, from 11.5% in the Verkhoyansk district to 91.7% in South Yakutia. In the Nyurbinsky district, the forest area is 4416.3 thousand hectares, which is more than 84% of the total area. The timber reserve is about 9.2 billion m³, and 96% of it is coniferous. The average timber reserve per hectare is 58 m³. The largest average timber reserves are found in Pinus sibirica stands (188 m³/ha) and Picea spp. stands (130 m³/ha). The average timber reserve is 104 m³/ha for Pinus sylvestris, 62 m³/ha for Larix spp., and 41 m³/ha for Betula pendula.
A total of 95.6% of the forest in Yakutia is coniferous. The dominant genus is Larix, with four main taxa (Larix cajanderi, L. gmelinii, L. sibirica, and L. czekanowskii (L. sibirica x L. gmelinii)), which together represent 77.5% of the total forest resources. The next most common species are Pinus sylvestris (6.5% of the total area) and Pinus obovata (0.24% of the total area). In southwest Yakutia, Pinus sibirica and Abies sibirica occur. Picea ajanensis is characteristic of the southern mountainous regions. Other mountainous regions are occupied by Pinus pumila (4.6% of the total forest area). The main deciduous tree species are Betula pendula, B. pubescens, B. ermanii, Populus suaveolens, P. tremula, Chosenia arbutifolia, and Salix spp. Deciduous trees occupy only two million hectares, which is 1.24% of the total forest area.
Tree stands are well adapted to growing under the extremely hard conditions of a dry climate and permafrost. Forests are also very well adapted to recovery after fires. The recovery process is determined by several factors, the most important of which are high seed production, good seed germination, and high adaptive potential. Larix cajanderi is highly adapted to frequent fires. This species sheds its seeds in late summer of the year in which they ripen. The seeds are then covered by needle litter and snow, which creates the right conditions for sprouting and allows them to use the spring moisture. The best conditions for forest growth occur during the first years after a fire: the litter is destroyed, and the soil is enriched with ash elements. The moisture of the upper soil horizon increases due to inflow from lower horizons [10].
Nyurbinsky ulus (region) lies on the middle course of the Vilyuy River and occupies the territory adjacent to both the Vilyuy and its main tributary, the Markha. The region has an area of approximately 52.4 thousand km². Nyurbinsky region is part of the West Yakutia natural zone. It is characterized by plains and plateaus, and relief has a strong influence on its key elements, soil and vegetation. According to Kurzhuev, the region is located on the Vilyuy Plateau, which is part of the Central Yakut Plateau. Cryolithozone relief forms such as alases, yuryakhs, and bulgunyakhs are characteristic. Snow cover lasts 210-225 days a year. The average annual temperature in the region is negative; in Markha it is −11.1 °C. The coldest month is January, with an absolute minimum of −60 °C, and the warmest is July, with an absolute maximum of +37 °C. The average annual precipitation is 260-280 mm [14], with a maximum in July. In the geobotanical regionalization of Yakutia, the Nyurbinsky region is situated in the boreal region, in the taiga zone, in the subzone of middle taiga forests, within the Central Yakutian middle taiga subprovince. The region is dominated by Larix forests and Pinus sylvestris forests. The river valleys are characterized by rich meadows as well as common steppe and forest-steppe landscapes. Dry belts of alas vegetation are represented by Carex duriuscula steppes [14].

Fires Data
In the Republic of Sakha (Yakutia), open data on forest fires are not available. The Ministry of Environmental Protection (https://minpriroda.sakha.gov.ru) maintains a database of forest fires, but these data are tabular and not georeferenced, so they cannot be used to create the GIS database necessary for this type of research. These data are also understated, because only fires on which extinguishing action was taken are added to the database. Fires that are far from human activity are often not extinguished and therefore are not recorded in the database. The most popular global source of fire data is the Fire Information for Resource Management System (FIRMS). Data available in the service are collected from the VIIRS 375 m, VIIRS 750 m, and MODIS Collection 6 Active Fire Products. The data are collected from 2002 until now. For this study, we chose data from the MODIS Collection 6 product because of their longer availability and sufficient spatial resolution. We used data between 2001 and 2018 from the FIRMS fire archive. The shapefiles were in the geographic WGS84 coordinate system. The confidence values ranged from 0% to 100%, and each detection was assigned to one of three fire classes (low-confidence fire, nominal-confidence fire, or high-confidence fire) [17].
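The handling of the fire points described above can be illustrated with a minimal sketch. It assumes fire records reduced to a year and a confidence value; the field names, the sample records, and the class thresholds (`LOW_MAX`, `HIGH_MIN`) are illustrative assumptions and are not taken from the study.

```python
from collections import Counter

# Assumed class limits (not stated in the text): below 30% is "low",
# 30% up to 80% is "nominal", 80% and above is "high".
LOW_MAX = 30
HIGH_MIN = 80


def confidence_class(confidence):
    """Map a 0-100 confidence value to one of three fire classes."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence < LOW_MAX:
        return "low"
    if confidence < HIGH_MIN:
        return "nominal"
    return "high"


def select_fire_points(points, first_year, last_year):
    """Keep the points detected between first_year and last_year (inclusive)."""
    return [p for p in points if first_year <= p["year"] <= last_year]


# Made-up records standing in for rows of a fire-point attribute table.
points = [
    {"year": 2003, "confidence": 12},
    {"year": 2014, "confidence": 55},
    {"year": 2017, "confidence": 93},
]

# Points from 2001-2015, the period used later for model training.
training = select_fire_points(points, 2001, 2015)
class_counts = Counter(confidence_class(p["confidence"]) for p in training)
```

Keeping the thresholds as named constants makes it easy to substitute the limits documented for the product actually used.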

Factors Data
Forest fire regimes are extremely diverse and vary due to the spread of fires and climate changes [8].
In forest areas that are difficult to manage, as in Yakutia, it is necessary to properly characterize the factors that cause forest fires. Researchers have used several different variables to assess wildfire risk [18][19][20][21]. Due to the lack of data, it was decided to investigate only constant, long-term fire factors such as slope, aspect, fuel, climate, NDVI, etc. Constant factors are those that do not change rapidly but gradually, over the long term. They can be calculated over medium- or long-term periods before the fire season [3]. In Yakutia, there is a severe data shortage, and data that have already been collected are very difficult to access. For these reasons, mostly global datasets were used.
Variables were divided into four groups: (1) meteorological (precipitation, temperature, maximum summer temperature, radiation); (2) NDVI; (3) landform (elevation, slope, slope direction); and (4) human activity (distance from roads, distance from settlements, distance from rivers). The data types came in different formats and had different spatial resolutions (Table 1). To collect bioclimatic variables, the WorldClim dataset of global climate layers was used. The WorldClim dataset has a spatial resolution of about 1 km² [22]. WorldClim is a set of global climate layers (gridded climate data) specifically developed for ecological modeling in GIS. Currently, WorldClim provides several datasets for different temporal scenarios (past, current, and future conditions). In this work, data for the current condition scenario were used. WorldClim bioclimatic variables are analysis-ready data, so preprocessing was not necessary.
NDVI is widely used to evaluate the phenology and productivity of vegetation. Onigemo et al. claim that NDVI values obtained from images at the peak of drought were related to moisture content and fresh phytomass, showing its potential to estimate fire risk [23]. Illaera et al. found a good correlation between NDVI values and the location of wildfires [24]. To assess NDVI at the macroscale and the microscale, it was decided to use NDVI from two sources: MODIS and Landsat images. The Landsat data were used to determine NDVI in the Nyurbinsky region. The following paths and rows were defined as the region of interest: 129-15; 129-16; 130-14; 130-15; 130-16; 131-15. Available images were selected from the year with the least cloud cover. Images without cloud cover at the peak of the growing season (July) were available only for Landsat 5 in 2009. Before creating the mosaic, atmospheric correction was necessary, which raised the images from level 1 to level 2. For the NDVI calculations, bands 3 and 4 were used with the formula described by J.W. Rouse in 1973 [25]. To determine the average NDVI in the vegetation season for Yakutia between 2001 and 2018 with MODIS, the MOD13A2 Version 6 dataset was used. MOD13A2 provides NDVI values with a resolution of 1 km. The product is derived with a monthly interval. Images at the peak of the growing season (July) were selected from each year. The following tiles cover the territory of Yakutia and were defined as regions of interest: h21v01, h22v01, h23v01, h21v02, h22v02, h23v02, h24v02, h23v03, h24v03, h25v03. After the yearly selection of the data, an NDVI mosaic was created and the mean NDVI between 2001 and 2015 was calculated.
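The NDVI calculation and the multi-year averaging described above can be sketched as follows. NDVI is the normalized difference of near-infrared and red reflectance, NDVI = (NIR − Red) / (NIR + Red); for Landsat 5 TM, band 3 is red and band 4 is near infrared. The small nested lists stand in for raster bands, and the function names and sample values are illustrative assumptions rather than the authors' processing chain.

```python
def ndvi(red, nir):
    """NDVI of one pixel from red and near-infrared reflectance."""
    total = nir + red
    if total == 0:
        return None  # NDVI is undefined when both bands are zero
    return (nir - red) / total


def ndvi_grid(red_band, nir_band):
    """Apply ndvi() cell by cell to two equally sized grids."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


def mean_grid(grids):
    """Per-pixel mean over several yearly grids, skipping undefined cells."""
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            valid = [g[r][c] for g in grids if g[r][c] is not None]
            row.append(sum(valid) / len(valid) if valid else None)
        result.append(row)
    return result


# Made-up reflectance values for a 1 x 2 pixel scene.
red = [[0.1, 0.0]]
nir = [[0.5, 0.0]]
july_ndvi = ndvi_grid(red, nir)
```

Calling `mean_grid` on a list of July grids, one per year, corresponds to the mean NDVI mosaic mentioned in the text.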
To model the landform factors, it was decided to use one of the digital elevation models most commonly used in this type of work: the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010). It incorporates the current best available global elevation data. GMTED2010 is commonly used for radiometric and geometric correction, cover mapping, and extraction of drainage features for hydrologic modeling [26]. Mosaicking was necessary. The following GMTED entities were used: GMTED2010N50E120, GMTED2010N50E150, GMTED2010N70E120, and GMTED2010N70E150. Elevation, slope, and slope direction were derived from the DEM.
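Deriving slope and slope direction from a DEM can be sketched with central differences on a small elevation grid. This is a generic illustration, not the GIS tool used in the study. It assumes that row 0 is the northern edge of the grid, that columns increase eastward, and that aspect is reported as the compass bearing of the steepest descent.

```python
import math


def slope_aspect(dem, row, col, cell_size):
    """Slope (degrees) and aspect (compass degrees) of an interior cell.

    Assumptions: row 0 is the northern edge, columns increase eastward,
    and aspect is the direction of steepest descent (0 = north, 90 = east).
    """
    # Elevation change per unit distance towards the east and the north.
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row - 1][col] - dem[row + 1][col]) / (2 * cell_size)
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None  # flat cell: aspect is undefined
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect


# Made-up 3 x 3 DEM rising by 1 m per 1 m cell towards the east.
dem_east = [
    [0.0, 1.0, 2.0],
    [0.0, 1.0, 2.0],
    [0.0, 1.0, 2.0],
]
slope_deg, aspect_deg = slope_aspect(dem_east, 1, 1, cell_size=1.0)
```

For this grid the terrain rises eastward at 45 degrees, so the cell faces west.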
To investigate the relationship between fire and human presence, it was necessary to obtain information on the distance of fires from roads, buildings, and rivers. The data were downloaded in shapefile format from OpenStreetMap (OSM). Following previous works on this subject [27,28], the Euclidean distance was used. The Euclidean distance is the straight-line distance between two points in Euclidean space. The locations were converted to raster. The raster resolution was defined as the shorter of the width and height of the extent of the input features, in the input spatial reference, divided by 250.
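The cell-size rule and the distance raster described above can be sketched as follows. This is a brute-force illustration that measures the distance from each cell centre to the nearest feature point; GIS software uses far more efficient algorithms and works with lines and polygons as well. The function names and the sample coordinates are assumptions.

```python
import math


def cell_size_from_extent(xmin, ymin, xmax, ymax, divisor=250):
    """Cell size as the shorter side of the extent divided by 250."""
    return min(xmax - xmin, ymax - ymin) / divisor


def distance_raster(features, xmin, ymax, n_rows, n_cols, cell_size):
    """Euclidean distance from every cell centre to the nearest feature.

    features is a list of (x, y) points; row 0 is the top (northern) row.
    """
    grid = []
    for r in range(n_rows):
        y = ymax - (r + 0.5) * cell_size
        row = []
        for c in range(n_cols):
            x = xmin + (c + 0.5) * cell_size
            row.append(min(math.hypot(x - fx, y - fy) for fx, fy in features))
        grid.append(row)
    return grid


# Made-up example: one feature point in the lower-left cell of a 2 x 2 grid.
distances = distance_raster([(0.5, 0.5)], xmin=0.0, ymax=2.0,
                            n_rows=2, n_cols=2, cell_size=1.0)
```

For an extent of 500 by 1000 map units, `cell_size_from_extent(0, 0, 500, 1000)` gives a cell size of 2 units.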
After the preprocessing, two regions of interest (ROIs) were defined for each raster obtained. The rasters were clipped using the forest cover of the Nyurbinsky region and the forest cover of the Republic of Sakha (Yakutia). The forest cover was made available by the Institute of Biological Problems of Cryolithozone in Yakutia. The same ROIs were applied to the fire points from 2001-2015 in the FIRMS dataset.

Fire Factors Analysis
The first step after the preprocessing (Figure 2) was the classification of the values in each raster, in order to order the data and to eliminate individual pixels that deviated strongly from the average, probably because of measurement error or instrument inaccuracy. Because of the large amount of data, the area to cover (3,084,000 km²), and the available computing capabilities, it was necessary to simplify the data to some extent while retaining their characteristics. Jenks's optimization method was used. The Jenks classification is designed to identify data clusters while maintaining a representation of all data in the set. The optimization continues until class limits are found that keep the observations within each class as similar as possible while keeping the classes as far apart as possible. The method thus aims to reduce the variance within classes and maximize the variance between classes. This is done by minimizing the average deviation of the values in each class from the class mean while maximizing the deviation of each class mean from the means of the other classes [29]. The new raster values were extracted to each fire point from the FIRMS dataset of over 17 years in the territory of Yakutia and the Nyurbinsky region.
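The idea behind the Jenks classification can be illustrated with a small exact implementation that minimizes the total within-class sum of squared deviations by dynamic programming. This is a sketch for short lists only, not the routine used in the study; the function name and the sample values are assumptions.

```python
def jenks_classes(values, n_classes):
    """Split values into n_classes groups with minimal within-class variance.

    Exact dynamic programming over the sorted values; suitable only for
    small inputs because the run time grows with n_classes * n^2.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us get the squared deviation of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its own mean."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk back through the stored split points to recover the classes.
    classes, j = [], n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        classes.append(data[i:j])
        j = i
    classes.reverse()
    return classes


# Made-up raster values with three obvious clusters.
groups = jenks_classes([50, 1, 10, 2, 11, 3, 12, 51], 3)
```

Each returned group is one class; its smallest and largest values give the class limits used to reclassify the raster.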
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 19
The next step was to calculate the area of each class for each factor. Then, all fires between 2001-2018 were summed separately for each category of each previously classified raster. With these data, it was possible to determine the regression and correlation between the dependent and explanatory variables. The same calculation scheme was repeated for each variable. In the last step, the Pearson correlation and the coefficient of determination between the fire data and the factor data were calculated for each model.
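The correlation step can be sketched as follows. The example relates the class number of one factor to the number of fires per unit area in that class. Dividing the fire count by the class area is an assumption made here for illustration, and all numbers are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def fires_per_area(fire_counts, class_areas):
    """Fire count of each class divided by the area of that class."""
    return [fires / area for fires, area in zip(fire_counts, class_areas)]


# Made-up values for one factor classified into five classes.
class_ids = [1, 2, 3, 4, 5]
fire_counts = [400, 300, 180, 90, 30]
class_areas = [1000.0, 1000.0, 900.0, 900.0, 600.0]  # e.g. km2

density = fires_per_area(fire_counts, class_areas)
r = pearson_r(class_ids, density)
r_squared = r ** 2  # coefficient of determination of a simple linear fit
```

In this made-up case the fire density falls as the class number rises, so `r` is negative, while `r_squared` stays between 0 and 1.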

Long-Term Fire Risk Modeling
The next step was to build risk models using two machine learning methods (Figure 3). The training dataset was built from the FIRMS fire data for 2001-2015. All fire points registered in the Republic of Sakha (Yakutia) and in the Nyurbinsky region in this period were used. The variables were divided according to the previously obtained correlation coefficients. Two sets of predictors were examined: a dataset containing only the factors with a satisfactory or higher correlation coefficient (above 0.6) and a dataset with all 11 previously selected fire factors. All predictor rasters were resampled and converted to grid format. Training and predictor datasets of this kind were created separately for the territory of Yakutia and for the Nyurbinsky region.

In the simulation, two prediction methods were selected, following the works of Peters [11] and Parisien [30] (maximum entropy prediction model) and Oliveira [6] (random forest prediction model). In their studies, these authors showed a good correlation between forest fires and the risk models obtained with machine learning, and demonstrated the superiority of machine learning methods over traditional ones.
The maximum entropy prediction model is a widely used and accepted statistical method for producing predictive probability distributions. The model has been applied in diverse fields such as thermodynamics, economics, forensics, imaging technologies, and, recently, ecology. Maximum entropy can provide accurate predictions of patterns in macroecology and help identify the mechanisms that matter most [31]. The algorithm is widely used for mapping species distributions [32] and for conservation planning [31]. In recent years, the maximum entropy model has also begun to be used in fire ecology. Maximum entropy is a density estimation method based on a probability distribution. It is a presence-only machine learning algorithm that iteratively contrasts environmental predictor values at occurrence locations with those of a large background sample taken across the study area [33]. Maximum entropy has proved to be an enormously powerful tool for reconstructing images from many types of data [34].
The biggest advantage of a random forest is that it is a very flexible method that can be applied to different types of problems. The random forest algorithm is a fully nonparametric machine learning method for data analysis. Random forest classification and regression is competitive with the best available methods and superior to most methods in common use [35]. The application of random forest can be an effective methodology for predicting fire occurrence at different sites. The random forest algorithm was developed by Breiman (2001) [35]. It combines a large set of decision trees, which is its biggest advantage over a single decision tree. Each tree is trained on a set of variables that are randomly selected from the training dataset.
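The principle of combining many trees, each trained on a random sample and a random variable, can be sketched in a deliberately simplified form. The example below uses one-level trees (decision stumps), bootstrap sampling, one randomly chosen variable per tree, and majority voting. It is not the ViGrA implementation used in SAGA, and the data, function names, and parameters are illustrative assumptions.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Best single-threshold rule on one feature (a one-level tree)."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above in (0, 1):  # label predicted for values above the threshold
            errors = sum(
                (above if row[feature] > threshold else 1 - above) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, above)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, above = stump
    return above if row[feature] > threshold else 1 - above


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # with replacement
        feature = rng.randrange(len(rows[0]))  # random variable for this tree
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote of all trees (1 = presence, 0 = absence)."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Made-up training data: two predictors, label 1 marks fire presence.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [10, 12], [11, 10], [12, 13], [13, 11]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
```

A full random forest grows deeper trees and draws a fresh random subset of variables at every split; the stump version keeps only the bagging and voting idea.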
All classifications were carried out using SAGA-6.4.0 and its modules "Maximum Entropy Presence Prediction" and "Random Forest Presence Prediction (ViGrA) Classification". After processing, two types of models were obtained: presence prediction maps and presence probability maps. Presence prediction maps only indicate whether a fire is possible. All raster pixels are classified into two classes:
• Absence: there is no possibility of a forest fire.
• Presence: there is a possibility of a forest fire.
Presence probability maps indicate how high the possibility of a fire is. All raster pixels are classified into six classes:
• Very low: very low possibility of fire.
To obtain only six possibility classes, it was necessary to reclassify the models. For reclassification, the natural breaks method was used.
For validation of the results, raw fire point data from 2015 to 2018 in the Republic of Sakha (Yakutia) and the Nyurbinsky region were used. For the statistical analysis, the pixels in each probability class of the presence probability maps and in each class of the presence prediction maps were counted. The fire points in each class were summed and divided by the pixel count of that class, so that the area of each class was taken into account when verifying the models. Next, the percentage share of fires in each class was calculated. The last step was regression and correlation analysis.
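The validation arithmetic described above can be sketched as follows: the fire points in each class are divided by the pixel count of that class, and the percentage share of all fires falling in each class is computed. The class names and the numbers are made up for illustration.

```python
def validate_classes(fire_counts, pixel_counts):
    """Fires per pixel and percentage share of fires for every map class."""
    total_fires = sum(fire_counts.values())
    summary = {}
    for name, pixels in pixel_counts.items():
        fires = fire_counts.get(name, 0)
        summary[name] = {
            # Normalising by the pixel count accounts for the class area.
            "fires_per_pixel": fires / pixels if pixels else 0.0,
            "share_percent": 100.0 * fires / total_fires if total_fires else 0.0,
        }
    return summary


# Made-up validation counts for a two-class presence prediction map.
fire_counts = {"absence": 10, "presence": 90}
pixel_counts = {"absence": 1000, "presence": 500}
summary = validate_classes(fire_counts, pixel_counts)
```

The same function can be applied to the six classes of a presence probability map by passing six entries in each dictionary.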

Fire Factors
In the first part of the study, an attempt was made to identify the main factors affecting the possibility of a wildfire. Not all of the 11 preselected factors showed a good correlation with fire points (Table 2). In Yakutia (macroscale), a correlation above 70% was shown by solar radiation, maximum summer temperature, NDVI, elevation, slope, distance from roads, distance from settlements, and distance from rivers.
The problem was also studied at the regional scale, using the Nyurbinsky region as an example. Here, only five of the 11 factors showed a good correlation. A very strong relationship between fires and precipitation was demonstrated: as the amount of precipitation falls, the number of ignition points increases, which was not observed at the macroscale. The distribution of NDVI at the microscale is almost the same as for Yakutia as a whole. As at the macroscale, fewer fires are observed as the slope increases.

Fire Risk
The next step was to create fire hazard models using the two modeling methods. Fire risk models for the territory of Yakutia are shown in Figures 4 and 5. The results of the modeling differed, but both methods gave satisfactory results. For the territory of the Republic of Sakha (Yakutia), high correlation coefficients can be observed for each prediction method (Tables 3 and 4). The coefficients were slightly higher for the random forest prediction method. With this method, the model with 11 variables did not give considerably better results than the model with nine variables.
With the maximum entropy prediction model, the dataset with nine variables gave better results. In both models, the highest number of fires between 2015-2018 fell in the high, very high, and extreme probability classes. The very low-risk class contained less than 1% of the fire points. The presence prediction models showed the superiority of the random forest method over the maximum entropy method. With the random forest method, almost 100% of the fire points were in the presence class. The maximum entropy method gave worse results: more than 10% of the fire points from the validation dataset fell in the absence class.
Modeling carried out for the territory of the Nyurbinsky region showed slightly different results than in Yakutia (Figures 6 and 7). Correlation coefficients were not as high and did not exceed 0.9 (Table 5). According to the coefficient of correlation, the smallest error was obtained with the maximum entropy model using six predictor variables. The coefficient of correlation for 11 predictors was only slightly lower and was equal to 0.89.
Figure 6. Results of the long-term wildfire presence probability modeling in Nyurbinsky (microscale).
No fire points were observed in the low and very low fire risk classes. The random forest prediction method showed significantly worse results. In both cases of the random forest method (six and 11 predictors), the coefficient of correlation did not exceed 0.82, and over 20% of all fire points were in the low-risk classes. Similar results were observed when analyzing the presence prediction maps (Table 6). In the random forest prediction method, more than 30% of fire points were in the absence class. The results were distributed in a similar way in the maximum entropy prediction model with 11 predictors. The best results were observed in the maximum entropy classification using six predictor variables. In this model, more than 90% of the points were in the presence class.
It was observed that in both regions (Yakutia and Nyurbinsky), the random forest presence prediction method gave more accurate results. Even though the random forest model for the Nyurbinsky region incorrectly classified a larger number of fire points, it cannot be unambiguously stated that it performed worse.
This is because, as Figures 5 and 7 show, the maximum entropy presence prediction method classified almost all of the territory of Yakutia and Nyurbinsky as territory with the presence of fires. The random forest presence prediction method delineated a much narrower area of possible fire occurrence and yet classified more points correctly in the territory of Yakutia.
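The argument above, that a presence map flagging almost the whole territory is less informative even when it captures many fire points, can be made concrete by comparing the share of fire points captured with the share of the area flagged. The sketch below is an illustration under stated assumptions, not a metric used in the study: the function `presence_summary`, the ratio it returns, and all numbers are hypothetical.

```python
def presence_summary(fire_point_flags, grid_flags):
    """Summarize a binary presence/absence map.

    fire_point_flags: booleans, True where a validation fire point
        lies inside the predicted presence class.
    grid_flags: booleans, True for every map cell classified as presence.
    Returns (share of fire points captured, share of area flagged, ratio).
    """
    if not fire_point_flags or not grid_flags:
        raise ValueError("both inputs must be non-empty")
    captured = sum(fire_point_flags) / len(fire_point_flags)
    area = sum(grid_flags) / len(grid_flags)
    # Hypothetical indicator: a ratio above 1 means the flagged area holds
    # more fire points than its share of the territory alone would suggest.
    ratio = captured / area if area > 0 else float("nan")
    return captured, area, ratio


# Invented example: a broad map flags 95% of cells, a narrow map 40%.
broad = presence_summary([True] * 90 + [False] * 10, [True] * 95 + [False] * 5)
narrow = presence_summary([True] * 70 + [False] * 30, [True] * 40 + [False] * 60)
print("broad map :", broad)
print("narrow map:", narrow)
```

In this invented case the broad map captures more fire points, but only because it flags nearly everything; the narrow map concentrates the fire points in a much smaller area, which is the property the text attributes to the random forest presence maps.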

Discussion
In fire studies, fire risk is one of the major topics, and there are many different approaches to this subject. Blanchi et al., in their work on a methodological approach to fire risk studies, collected more than 50 works connected with fire probability cartography. Risk mapping is less than twenty years old; previously, descriptive approaches were preferred. Since the 1990s, there has been an increasing interest in this field [36].
The study presented the possibilities of using different types of GIS and remote sensing data in modeling wildfire risk. The results allow us to reflect on various aspects of fire studies. The difficulty in fire risk assessment was to identify and clarify the possible ways of defining wildfire risk. There are many different approaches to hazard mapping based on different datasets, scales, and algorithms. The multi-approach is relevant in fire management studies [37,38]. Gai et al. [39] developed a spatially weighted index model for fire risk assessment. You et al. [40] integrated a Forest Resource Inventory Database based on four groups of factors: topography, human activity, climate, and forest characteristics. Goldarag et al. [41] used neural networks and logistic regression for fire risk assessment.
The biggest challenge in the study was collecting the necessary data, because the study area is poorly explored. GIS technologies in Yakutia are still under development [42]. For this reason, we chose a semi-probabilistic mode of modeling, which made it possible to combine historical fire data and physical fire mechanisms [43].
The results of the analysis showed that fire risk assessment in the Republic of Sakha (Yakutia) is not a problem that can be easily solved. The specificity of the studied territory is significantly different from other parts of the world that are facing the problem of wildfires. The boreal forests of Yakutia are well adapted to extremely severe climatic conditions. Fires have been affecting the natural functioning of the forests of the region for centuries. Fires affect not only the climatic conditions, but also the formation of the terrain [36].
The results of the correlation analysis at the macroscale showed that the risk of fires increases with increasing radiation and with increasing maximum summer temperature. Lim et al., in their study, highlighted that fire data showed a high correlation with climate factors [44]. NDVI was highly correlated with fires (R = 0.91): with increasing NDVI, the number of ignition points increased. This was associated with the accumulation of combustible materials along with an increase in forest biomass. Elevation and slope were inversely correlated with fire occurrence. According to Pourghasemi et al., slope and aspect are some of the most important factors controlling forest fire occurrence [45]. Kurbatsky et al. wrote that prolonged dry conditions in the Yakutian valleys caused intense droughts and an accumulation of fire fuels, which led to strong fires [46]. As the slope increased, we observed a decreasing number of wildfires. This may be because steep slopes act as a barrier to the spread of fire. Additionally, large amounts of combustible material cannot accumulate on steep slopes. Ghorbanzadeh et al. claimed that environmental variables are not the only reason for fire susceptibility and risk [47]. A very strong connection between the number of fire points and human factors was observed. The number of fires increases in proximity to transport routes, and the same holds for settlements. This indicates a strong anthropogenic impact on the risk of fire. Fewer fires occur near watercourses, which may be due to the presence of moist areas close to the rivers.
The largest correlation at the microscale was observed for rainfall: as the rainfall increased, the number of fires increased. The other climatic factors did not show a strong correlation. NDVI, as at the macroscale, strongly influenced the number of fires. The properties of combustible materials relate to type, phytomass, condition, and moisture, among which the moisture content is the most important for fire protection [48]. Anthropogenic factors were distributed completely differently on the microscale than on the macroscale. The Nyurbinsky region has a relatively high population density compared to the entire territory of Yakutia, which may explain the differences between the results. Most fires still occur along roads, but they become less frequent closer to settlements. This may be because fires that approach villages pose a threat to people, so fire-fighting starts quickly and the fires do not reach large sizes, while fires far from populated areas are left without any action because they do not pose a great threat. Along roads that are far from villages, fires can reach serious sizes. Additionally, villages in this region are usually surrounded not by forests, but by meadows called "alas", which are less susceptible to ignition.
Human activity has a greater influence on the fire regimes, and its effect differs between the macro- and microscale. The results showed that distance from settlements (R = −0.97 and R = 0.86 at the macro- and microscale, respectively), distance from rivers (R = 0.91 and R = −0.54), and distance from roads (R = −0.80 and R = −0.82) were of great importance in fire risk assessment. Ajin et al. [49] showed a high correlation between fire occurrence and distance from roads and settlements. The study by Werf showed that more human activity is witnessed in residential areas and near roads, and that human activity is the most significant factor in fire outbreaks [50]. Studies of the territory of Yakutia likewise show that human activity, which is concentrated in residential areas and near roads, is the most significant factor in fire outbreaks.
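The R values quoted above are correlation coefficients between a predictor, such as distance from roads, and fire occurrence. As a minimal sketch of how such a coefficient can be obtained, the example below computes Pearson's R between distance classes and fire counts. The exact procedure of the study is not described in this section, so the pairing of distance classes with counts, the function name `pearson_r`, and the numbers are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented data: distance classes from roads (km) and fire points per class.
distance_km = [1, 2, 3, 4, 5, 6]
fire_counts = [120, 95, 70, 52, 30, 21]

r = pearson_r(distance_km, fire_counts)
print(f"R = {r:.2f}")  # strongly negative: fewer fires farther from roads
```

A negative R, as for distance from roads in both scales, means that fire points become less frequent as the distance grows; a positive R, as for distance from settlements at the microscale, means the opposite.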
Very large differences were observed between the results of the research at the macro- and microscale. Often the results were quite the opposite. This is probably related to the specificity of the Yakutia region. The Nyurbinsky region is quite highly urbanized compared to other regions of Yakutia, which may affect the results. Additionally, its regional climatic conditions differ from the average for the entire region. The Nyurbinsky region is characterized by flat terrain, whereas most of Yakutia is mountainous, which can also affect the conditions for fires. To analyze the macroscale, it is necessary to take into account a wide variety of conditions: climatic, geographical, and human influence. On a microscale, however, these conditions are more homogeneous.
The analysis of the different methods of wildfire risk assessment allows for the identification of the high danger areas in Nyurbinsky and the entire Yakutia. The maximum accuracy was demonstrated by the random forest method with 11 predictors (R = 0.98) for the entire territory of Yakutia. The analysis showed that the random forest method gave more accurate results and a much narrower area of possible fire occurrence than the MaxENT method, which allows us to propose that this model is preferable. A model created using the maximum entropy method shows little differentiation into zones, which makes it uninformative and unsuitable for practical use. Parisien used the MaxENT and random forest models to predict fire ignition across the USA [30] with satisfactory results. The MaxENT model was used to assess fire risk in India's Ghats Mountains [34]. In their studies, the authors showed a good correlation between the risk models obtained using machine learning and forest fires. Due to their power, versatility, and ease of use, random forests are quickly becoming one of the most popular machine learning methods [48]. The study confirmed that GIS and remote sensing technologies can be used to assess the long-term risk and probability of fires in Yakutia. The results showed that reducing the number of factors during modeling did not have a significant impact on the results in either method.
In the last few years, an increase in the number and severity of fires has been observed, as have more years with extreme fire seasons [9]. This situation may also be affected by climate change. On the other hand, a larger number of fires may have an impact on the climate, permafrost ecosystems, and the functioning of indigenous people. In connection with this situation, it is necessary to work on new fire risk assessment methods in the Arctic and subarctic zone of Yakutia. Methods should use global databases of fire, climate, and human activities.
For building new models that include all these factors, it is necessary to use different types of datasets, like OSM data, climatic datasets, and remote sensing data from different sensors (MODIS, Landsat, Sentinel). According to Gigović et al., the random forest model could be used at the regional level for forest fire susceptibility mapping [51]. Banerjee et al. claim that the MaxENT method can be used as a decision support tool for stakeholders of forest resources [52]. Pham et al. claim that models that consider climate variables, vegetation, and human influences can explain fire risk better than those that only account for some of these factors [53].
The objective of the work, modeling long-term wildfire risk in the Nyurbinsky region of Yakutia, was accomplished. The accuracy of the results depends on the data used and available. The results are valid within the limits imposed by the data used, the approach chosen, and the research objective. Despite these limitations, it was possible to obtain interesting results that enabled us to answer the main questions posed by the problem.

Conclusions
The problem of forest fires in Yakutia is not as well studied as in other countries. The results of the research have shown a strong human influence on the risk in this region despite the low population density. Anthropogenic factors showed a higher correlation with the occurrence of wildfires than climatic or topographical factors. Different factors affect the risk of fires at the macroscale and at the microscale, which should be considered when modeling.
The random forest method showed better results at the macroscale; however, the maximum entropy model was better at the microscale. The exclusion of variables that do not show high correlation does not always improve the modeling results. The random forest presence prediction model is a more accurate method and significantly reduces the risk territory. The maximum entropy method is the reverse: it is not as accurate and classifies very large areas as endangered.
Further study of this topic requires a clearer and conceptually developed approach to the application of remote sensing data. This work therefore lays the foundations for a future, completely automated fire risk assessment application in the Republic of Sakha. The results can be used for fire prophylaxis and for planning fire prevention. In the future, to determine the risk reliably, it is necessary to combine the obtained maps with the seasonal risk determined using indices (for example, the Nesterov index 1949) and the periodic dynamics of forest fires, which Isaev and Utkin studied in 1963. Such actions can help build an application with which it will be possible to determine the risk of wildfire and the spreading of fire during extreme events.
Author Contributions: S.G. and P.J. participated in the conceptualization of the manuscript. S.G. and P.J. participated in the methodology development and software development. P.J. conceived and designed the experiments. S.G. and P.J. wrote the manuscript. S.G. provided supervision on the design and implementation of the research. All authors contributed to the review and improvement of the manuscript. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.