Spatio-Temporal Knowledge Graph-Based Forest Fire Prediction with Multi-Source Heterogeneous Data

Forest fires occur frequently and cause great harm to people's lives. Many researchers use machine learning techniques to predict forest fires by considering spatio-temporal data features. However, it is difficult to obtain these features efficiently from large-scale, multi-source, heterogeneous data, and there is a lack of methods that can effectively extract the features required by machine learning-based forest fire prediction from multi-source spatio-temporal data. This paper proposes a forest fire prediction method that integrates spatio-temporal knowledge graphs and machine learning models. The method fuses multi-source heterogeneous spatio-temporal forest fire data by constructing a forest fire semantic ontology and a knowledge graph-based spatio-temporal framework. This paper defines the domain expertise of forest fire analysis as semantic rules of the knowledge graph and proposes a rule-based reasoning method that obtains the data required by specific machine learning-based forest fire prediction methods, targeting real-time prediction scenarios. Forest fire prediction experiments were performed with real-world data from the experimental areas of Xichang and Yanyuan in Sichuan Province. The results show that the proposed method facilitates the fusion of multi-source spatio-temporal data and substantially improves prediction performance in real forest fire prediction scenarios.


Introduction
Forest fires are one of the major disturbance factors in forest ecosystems, affecting biodiversity, species composition, and ecosystem structure [1]. They threaten forest resources as well as the safety of lives and property [2]. The prediction of forest fires therefore plays an important role in disaster risk analysis. This paper proposes a spatio-temporal knowledge graph framework that improves machine learning-based forest fire prediction methods. The framework can extract and analyze features from large amounts of historical spatio-temporal data and is particularly suited to rapidly changing fire information and incomplete fire data.
Commonly used approaches to forest fire prediction include fire indices and mechanism models that consider the factors closely related to forest fire occurrence. Salavati et al. [3] studied the city of Sanandaj in western Iran, assessing fire risk potential with Weights of Evidence (WoE) and Statistical Index (SI) models. Chen et al. [4] considered precipitation an important factor affecting the probability of forest fire occurrence and presented a method that better represents the effect of precipitation on forest fire prediction. Ge et al. [5] proposed a comprehensive index of forest fire drivers for forest fire prediction, using a hierarchical analysis of topographic, vegetation, meteorological, and human activity factors.
Recently, many studies have shown that machine learning-based methods can discover more complex forest fire patterns than traditional mechanism- or statistics-based models. Jaafari et al. [6] employed WoE Bayesian modeling to investigate the spatial relations of historical fire events in Chaharmahal-Bakhtiari Province, Iran. Sakr et al. [7] presented a forest fire risk prediction algorithm based on Support Vector Machines (SVM). Pham et al. [8] compared the ability of Bayes Network (BN), Naive Bayes (NB), Decision Tree (DT), and Multivariate Logistic Regression (MLR) models to map fire susceptibility in Phu Mat National Park, Ninh An Province, Vietnam. Ma et al. [9] built a forest fire probability model based on the logistic model and the Random Forest (RF) model, using forest thermal anomaly data monitored by satellites from 2010 to 2017; the model analyzes the driving factors of forest fires in Shanxi Province, China, from four aspects: meteorology, topography, vegetation, and human activities. Singh et al. [10] used RF to assess the impacts of climatic and anthropogenic factors on fire occurrence probability and to map the spatial distribution of fire risk. Prapas et al. [11] framed daily fire danger prediction as a machine learning task, using historical Earth observation data from the last decade to predict the next day's fire danger; their deep learning-based method provides nationwide daily fire danger maps with a much higher spatial resolution than existing operational solutions. These studies provide a good foundation for forest fire prediction with machine learning methods.
Remote sensing technology can capture large amounts of dynamic information over wide observation ranges, and forest fire prediction studies make extensive use of it. The MODIS (Moderate-resolution Imaging Spectroradiometer) Fire and Thermal Anomalies data and the VIIRS (Visible Infrared Imaging Radiometer Suite) Fire data provide active fire detections and thermal anomalies [12]. Cui et al. [13] used the Terrestrial Water Storage Change (TWSC) derived from Gravity Recovery and Climate Experiment (GRACE) data to analyze the influence of climate change on forest fires in their study region between 2003 and 2016. Piralilou et al. [14] evaluated the effects of coarse (Landsat 8 and SRTM data) and medium (Sentinel-2 and ALOS data) resolution spatial data on wildfire susceptibility prediction using models based on RF and SVM.
There are many data sources for forest fire prediction, and these data differ in time, space, type, resolution, and coordinate system. A critical and difficult problem is therefore how to fuse large-scale, multi-source, heterogeneous forest fire data in order to discover deeper data patterns. In particular, the spatio-temporal features and attribute features extracted from the data need to be semantically defined, because the semantics of the fused data benefit spatio-temporal queries, as shown in Table 1. In addition, forest fire prediction requires expertise, including determining forest fire occurrence factors, choosing optimization strategies for prediction models, and evaluating prediction model accuracy. Existing forest fire prediction methods rarely describe this expertise semantically, so the model parameters or hyper-parameters best suited to real-world forest fire prediction scenarios cannot be matched directly. To address these issues, we focus on modeling forest fire expertise and on rule-based semantic reasoning for forest fire prediction.
In this paper, we propose a forest fire prediction method that considers the meteorological, topographic, vegetation, and human activity factors closely related to forest fire occurrence. The method builds a knowledge graph for Forest Fire Prediction (KGFFP) to fuse heterogeneous forest fire data from multiple sources. Unlike forest fire prediction methods based solely on machine learning, it formalizes forest fire domain expertise as semantic rules and uses rule-based reasoning to retrieve prediction data from KGFFP. These data are fed to the specific machine learning models that form the core of the prediction. The proposed method thus builds on traditional machine learning-based forest fire prediction: it fuses multi-source heterogeneous spatio-temporal data, formalizes domain expertise, and uses semantic inference rules to improve the efficiency of predictive analysis. Figure 1 compares the proposed method with forest fire prediction methods based solely on machine learning.

Table 1. Comparison between the proposed method and the commonly used data fusion methods for forest fire prediction.

Commonly Used Methods | The Method Proposed in This Paper
Through data processing, the time, space, type, resolution, and coordinate systems of the data are unified. | From a semantic point of view, the temporal, spatial, and attribute features of the data are fused.
The spatio-temporal data are stored in a relational database and are added, deleted, queried, and modified through the relational database. | The spatio-temporal data are stored in a graph database and are added, deleted, queried, and modified through the graph database.
Multi-table joins are required for in-depth searches in relational databases, resulting in low query efficiency. | Based on the graph database, in-depth queries can be made within a limited space-time region, so geographic entities and the relationships between them can be queried efficiently in forest fire prediction scenarios.

This paper makes the following contributions: (1) we build a knowledge graph-based method for fusing multi-source heterogeneous data; (2) we define ontologies for modeling the domain expertise of forest fire risk analysis as semantic rules; (3) we propose a rule-based reasoning method that retrieves from the knowledge graph the prediction data required by machine learning-based forest fire prediction methods in specific situations; (4) we conduct experiments that demonstrate the benefits of the proposed method for multi-source heterogeneous spatio-temporal data fusion and machine learning-based forest fire prediction.

Knowledge Graph-Based Forest Fire Prediction System (KGFFP)
A knowledge graph is a semantic modeling method that represents data by defining entities, concepts, and the semantic relations among them. Rule-based reasoning over a knowledge graph analyzes its entities and relations according to specified semantic rules.
The architecture of the proposed forest fire prediction method consists of three basic parts: KGFFP, the machine learning-based forest fire prediction algorithms, and a controller that manages rule inference and program flow. Sections 2.1 and 2.2 introduce the construction of KGFFP. The machine learning-based forest fire prediction algorithms are taken directly from frameworks proposed in existing studies. The controller is responsible for rule-based reasoning over the semantic rules. Section 2.3 presents the implementation of the proposed method.

Conceptual Model for Forest Fire Prediction
The occurrence of forest fires is closely related to meteorological, topographical, vegetation, and human activity factors [15,16]. In this paper, we build a conceptual model for forest fire prediction that considers these factors. The model provides mechanistic information for forest fire prediction, including the data used in KGFFP, as shown in Figure 2. The data types may vary with the research objects and the actual situation.

Mapping between Forest Fire Prediction and KGFFP Architecture
We propose the KGFFP architecture, as shown in Figure 3, to support the fusion of multi-source heterogeneous spatio-temporal data related to forest fires. The architecture can match parameters and data for a specific machine learning-based forest fire prediction method.
KGFFP fuses forest fire spatio-temporal data by representing their temporal, spatial, and attribute features. Therefore, a time ontology and a space ontology need to be built for the conceptual layer of the knowledge graph. The attribute features of spatio-temporal data depend strongly on the data type (e.g., aspect is one of the important features of terrain data), and data types have specific hierarchical relations for forest fire prediction (e.g., topographic data include slope, aspect, and elevation). The conceptual model for forest fire prediction is defined as part of the conceptual layer of the knowledge graph. Unstructured, structured, and semi-structured spatio-temporal data for forest fire prediction are modeled as part of the instance layer, and their temporal, spatial, and attribute features are represented based on the conceptual layer. The concepts and actions used by machine learning-based forest fire prediction are defined as an ontology, and, according to the concept definitions in this ontology, domain knowledge is formalized into the semantic reasoning rules of the knowledge graph.
KGFFP consists of a rule set (TBox) and a fact set (ABox). The TBox consists of the conceptual layer and the inference rules. The conceptual layer is the semantic basis for representing the hierarchical relationships among the various factors in forest fire prediction, and the inference rules are the logical basis for semantic reasoning over multi-source spatio-temporal data. The ABox constitutes the instance layer of the knowledge graph and contains instances of the concepts in the conceptual layer. Together, the TBox and the ABox constitute the reasoning mechanism of KGFFP.
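To make the TBox/ABox split concrete, here is a minimal sketch that assumes plain in-memory Python triples rather than OWL and GraphDB. The class and instance names (fp:TopographicData, fp:slope_xichang, and so on) are hypothetical, and the two inference rules are the standard RDFS subclass rules, not the paper's own rule set. The point is that the ABox only asserts that a dataset is a slope, and the TBox hierarchy lets the reasoner conclude that it is also topographic data and a forest fire factor.

```python
# Minimal sketch: a KGFFP-style TBox/ABox as in-memory triples (assumption; the
# paper uses OWL and GraphDB). All fp: class and instance names are hypothetical.
Triple = tuple[str, str, str]

# TBox (conceptual layer): a small hierarchy of forest fire factors.
TBOX: set[Triple] = {
    ("fp:TopographicData", "rdfs:subClassOf", "fp:ForestFireFactor"),
    ("fp:MeteorologicalData", "rdfs:subClassOf", "fp:ForestFireFactor"),
    ("fp:Slope", "rdfs:subClassOf", "fp:TopographicData"),
    ("fp:Aspect", "rdfs:subClassOf", "fp:TopographicData"),
    ("fp:Elevation", "rdfs:subClassOf", "fp:TopographicData"),
    ("fp:Temperature", "rdfs:subClassOf", "fp:MeteorologicalData"),
}

# ABox (instance layer): assertions about concrete data items.
ABOX: set[Triple] = {
    ("fp:slope_xichang", "rdf:type", "fp:Slope"),
    ("fp:temp_xichang_2020033009", "rdf:type", "fp:Temperature"),
}


def infer(tbox: set[Triple], abox: set[Triple]) -> set[Triple]:
    """Forward-chain two RDFS-style rules to a fixpoint:
    (1) rdfs:subClassOf is transitive;
    (2) an rdf:type assertion propagates to every superclass."""
    facts = tbox | abox
    while True:
        sub = [(s, o) for s, p, o in facts if p == "rdfs:subClassOf"]
        new: set[Triple] = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))
        for s, p, o in facts:
            if p == "rdf:type":
                new.update((s, "rdf:type", sup) for cls, sup in sub if cls == o)
        if new <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts = facts | new


facts = infer(TBOX, ABOX)
print(sorted(o for s, p, o in facts if s == "fp:slope_xichang" and p == "rdf:type"))
# prints ['fp:ForestFireFactor', 'fp:Slope', 'fp:TopographicData']
```

A production system would delegate this to an OWL/RDFS reasoner in the triple store; the naive fixpoint loop is only meant to show how facts and rules combine.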

Construction of Forest Fire Prediction Knowledge Graph
This section introduces the design of the conceptual layer, instance layer, and inference rules of KGFFP. We use the Web Ontology Language (OWL) as the knowledge representation language because it is a widely used, expressive language for knowledge graphs [17].

Design of Conceptual Layer
The conceptual layer of KGFFP defines the logical semantic relationships among multi-source spatio-temporal data for forest fire prediction. It contains semantic concepts and their interrelationships and ensures the consistency of the semantic concepts inherent in multi-source spatio-temporal data related to forest fire prediction. The conceptual layer of KGFFP includes a time ontology, a space ontology, an ontology for forest fire prediction, and a machine learning model concept ontology.

• Time ontology and space ontology. We used the time ontology of the Semantic Web Rule Language (SWRL) [18] to represent the common time concepts of KGFFP. SWRL provides a uniform specification for the semantic representation of time, which ensures that the temporal information of entities is comparable and computable [5]. We built a space ontology based on extensions of the Geographic Query Language (GeoSPARQL) [19].

• Ontology for forest fire prediction. To define the conceptual model of forest fire prediction as a forest fire prediction ontology, we propose a tree-like taxonomy for the conceptual partitioning of multi-source spatio-temporal data [5]. We constructed the forest fire prediction ontology according to the KGFFP architecture, as shown in Figure 4.

• Machine learning model concept ontology. To support machine learning-based forest fire prediction, we construct a concept ontology of the machine learning model. It provides a conceptual framework for the optimization and prediction of the machine learning model, as shown in Figure 5. The framework is applicable to different supervised machine learning models, including but not limited to RF and Deep Forest (DF).

In the model, fp is short for "fire predict" and serves as the prefix for named entities in KGFFP. fp:ModelOptimization is a class that represents the optimization stage of the machine learning model. Both fp:Precision and fp:Recall denote evaluation metrics, which are objects associated with fp:ModelOptimization. fp:ModelPredict is a class that represents the prediction stage of the machine learning model. fp:Model is the object of fp:ModelOptimization and fp:ModelPredict and represents a machine learning model. Its objects include the model name, prediction target, training area, and testing area. Both the training and testing areas are associated with their temporal phase, spatial geometry, regions, and the properties of the analytical data to which the model applies. The controller instantiates these concepts when performing training or prediction.
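The following sketch shows one way a controller might instantiate these concepts as triples. Only the class names fp:Model, fp:ModelOptimization, fp:ModelPredict, fp:Precision, and fp:Recall come from the text; the property names (fp:hasModel, fp:trainingArea, fp:hasMetric, fp:value, and so on), the Model container, and the choice of Xichang for training and Yanyuan for testing are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import Optional

FP = "fp:"  # "fire predict" prefix for named entities in KGFFP

Triple = tuple[str, str, object]


@dataclass
class Model:
    """Hypothetical container for the fp:Model attributes named in the text."""
    name: str
    target: str
    training_area: str
    testing_area: str


def instantiate_stage(stage: str, run_id: str, model: Model,
                      metrics: Optional[dict[str, float]] = None) -> list[Triple]:
    """Emit triples for one training (fp:ModelOptimization) or prediction
    (fp:ModelPredict) run. Property names such as fp:hasModel are assumptions."""
    if stage not in ("ModelOptimization", "ModelPredict"):
        raise ValueError(f"unknown stage: {stage}")
    run, m = FP + run_id, FP + model.name
    triples: list[Triple] = [
        (run, "rdf:type", FP + stage),
        (m, "rdf:type", FP + "Model"),
        (run, FP + "hasModel", m),
        (m, FP + "modelName", model.name),
        (m, FP + "predictionTarget", model.target),
        (m, FP + "trainingArea", FP + model.training_area),
        (m, FP + "testingArea", FP + model.testing_area),
    ]
    # Evaluation metrics (fp:Precision, fp:Recall) belong to the optimization stage only.
    if stage == "ModelOptimization":
        for metric, value in (metrics or {}).items():
            node = f"{run}_{metric.lower()}"
            triples += [(run, FP + "hasMetric", node),
                        (node, "rdf:type", FP + metric),
                        (node, FP + "value", value)]
    return triples


# Hypothetical usage: a prediction run for an RF model.
rf = Model("RF", "FireOccurrence", "Xichang", "Yanyuan")
for t in instantiate_stage("ModelPredict", "predict_run1", rf):
    print(t)
```

Keeping metrics as separate typed nodes, rather than plain literals on the run, mirrors the text's description of fp:Precision and fp:Recall as objects associated with fp:ModelOptimization.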

Design of Instance Layer
• Instantiation of multi-source spatio-temporal data. Multi-source heterogeneous data in the field of forest fire prevention often have different temporal characteristics. In this paper, multi-source heterogeneous forest fire data are classified into dynamic data and static data according to their update frequency and accessibility, as shown in Figure 2. Dynamic data are indexed according to the hierarchy of year, month, day, hour, minute, and second, whereas static data usually have only one phase. Data with the same properties can be converted between dynamic and static according to changes in update frequency and accessibility.
This classification facilitates the modeling of data with different and changing temporal resolutions, so multi-source spatio-temporal data can be queried with their temporal resolution taken into account. Figure 6 shows a multi-source spatio-temporal data query task whose objective is to query the temperature, vegetation coverage, and slope in Xichang City at 9:00 on 30 March 2020. First, the task determines whether each of the three types of data is dynamic or static: temperature and vegetation coverage are treated as dynamic data, and slope is classified as static data. Then, it queries for a slope record with a unique phase; if one exists, the slope is acquired successfully, otherwise the acquisition fails. Finally, it queries the temporal resolutions of temperature and vegetation coverage. Because the temporal resolution of temperature is one hour, the temperature at 9:00 on 30 March 2020 is obtained; because the temporal resolution of vegetation coverage is one month, the vegetation coverage for March 2020 is obtained.
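The query logic of Figure 6 can be sketched as follows, assuming that each data source records whether it is dynamic, its temporal resolution, and a mapping from phase keys to resource identifiers. The DataSource type, the strftime-based phase keys, and all fp: identifiers are hypothetical; the identifiers stand in for the actual observations stored in the knowledge graph.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# strftime patterns used as phase keys for each temporal resolution (assumption).
PHASE_FORMATS = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d", "hour": "%Y-%m-%dT%H"}


@dataclass
class DataSource:
    name: str
    dynamic: bool
    resolution: Optional[str] = None  # required for dynamic data, unused for static data
    phases: dict[str, str] = field(default_factory=dict)  # phase key -> resource id


def query(src: DataSource, when: datetime) -> Optional[str]:
    """Return the resource for `when`, or None if acquisition fails."""
    if not src.dynamic:
        # Static data: succeed only if there is a unique phase.
        return next(iter(src.phases.values())) if len(src.phases) == 1 else None
    # Dynamic data: truncate the query time to the source's temporal resolution.
    return src.phases.get(when.strftime(PHASE_FORMATS[src.resolution]))


sources = [
    DataSource("temperature", True, "hour", {"2020-03-30T09": "fp:temp_xichang_2020-03-30T09"}),
    DataSource("vegetation_coverage", True, "month", {"2020-03": "fp:fvc_xichang_2020-03"}),
    DataSource("slope", False, phases={"static": "fp:slope_xichang"}),
]
t = datetime(2020, 3, 30, 9, 0)
for src in sources:
    print(src.name, query(src, t))
```

Truncating the query time to each source's own resolution is what lets one timestamp retrieve hourly temperature and monthly vegetation coverage in a single pass.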

Design of Inference Rules of Forest Fire Predictions
This section introduces methods for representing forest fire prediction expertise as semantic inference rules in the knowledge graph.
Forest fire prediction draws on a large body of expertise. This domain knowledge covers the data types related to forest fire prediction, data preprocessing and fusion methods, and the prediction strategies of machine learning models. Usually, the system must determine whether the current situation satisfies predetermined triggering conditions; if it does, the system takes the corresponding response actions, which may in turn generate new trigger events. For example, when the controller detects a historical fire point, it requires the land cover type of the spatio-temporal range in which the fire point is located. If the land cover type is a built-up area, the fire point is unlikely to be a forest fire. We therefore formalize domain expertise as a set of rules, each described in the form of a RuleObject. Each RuleObject defines its triggering events and the corresponding actions. Classes are named after the event names and action names and are related to each other through defined data properties or object properties. Figure 8 shows the syntax of RuleObject.
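A minimal event-condition-action sketch of RuleObject that encodes the land cover example might look like the following. The field names, event and action names, and the breadth-first controller loop are assumptions rather than the syntax of Figure 8. In particular, the sketch discards a fire point in a built-up area outright, whereas the text only says that such a point is unlikely to be a forest fire.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RuleObject:
    trigger: str                        # event that activates the rule
    condition: Callable[[dict], bool]   # checked against the current context
    action: str                         # action taken when the condition holds
    emits: Optional[str] = None         # new event generated by the action


# Hypothetical rules encoding the land cover example from the text.
RULES = [
    RuleObject("FirePointDetected", lambda ctx: True, "QueryLandCover", "LandCoverKnown"),
    RuleObject("LandCoverKnown", lambda ctx: ctx["land_cover"] == "built-up", "DiscardFirePoint"),
    RuleObject("LandCoverKnown", lambda ctx: ctx["land_cover"] != "built-up", "KeepAsForestFireSample"),
]


def run_controller(event: str, ctx: dict, rules: list[RuleObject]) -> list[str]:
    """Process events breadth-first; each fired action may emit a new event."""
    fired: list[str] = []
    queue = deque([event])
    while queue:
        current = queue.popleft()
        for rule in rules:
            if rule.trigger == current and rule.condition(ctx):
                fired.append(rule.action)
                if rule.emits is not None:
                    queue.append(rule.emits)
    return fired


# The land cover lookup itself is not implemented; ctx is assumed to hold its result.
print(run_controller("FirePointDetected", {"land_cover": "built-up"}, RULES))
# prints ['QueryLandCover', 'DiscardFirePoint']
```

Because an action may emit an event that re-triggers earlier rules, a real rule set would need a guard against cycles; this sketch assumes the rules are acyclic.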
In the model prediction stage, first, it selects a machine learning model, instantiates : , and instantiates : ; next, it instantiates : and : ; finally, it determines : for model predictions.

We extracted actions, the sequences between actions, and the trigger relations between actions from the expertise, and the controller defines the trigger events and actions of each RuleObject based on this information. Rule definition follows two principles: (1) expressiveness, i.e., representing the associations between different actions and different rule objects; and (2) reusability, i.e., reusing already defined data properties and object properties as much as possible. Figure 9 shows part of the rule set for extracting data for machine learning models from multi-source spatio-temporal data.
Remote Sens. 2022, 14, x FOR PEER REVIEW 10 of 23
Figure 9. The rule set for extracting data for machine learning models from multi-source spatio-temporal data (partial).
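Data extraction rules of the kind shown in Figure 9 are commonly written as Horn rules (as in SWRL), where a conjunction of body atoms implies a head atom. The sketch below evaluates one such rule by naive pattern matching over in-memory triples. The rule itself and the predicates fp:requiresFactor, fp:ofFactor, fp:covers, and fp:inputData are hypothetical; the rule states that a model's input data are the datasets of each required factor that cover its training area.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Match one triple pattern (variables start with '?') against a fact."""
    result = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None
    return result


def solve(body: list[Triple], facts: set[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every binding that satisfies all body atoms (a conjunctive query)."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def apply_rule(body: list[Triple], head: Triple, facts: set[Triple]) -> set[Triple]:
    """Instantiate the rule head once for each solution of the body."""
    return {tuple(b.get(t, t) for t in head) for b in solve(body, facts, {})}


# Hypothetical rule: Model(?m) ^ requiresFactor(?m, ?f) ^ trainingArea(?m, ?a)
#   ^ ofFactor(?d, ?f) ^ covers(?d, ?a) -> inputData(?m, ?d)
BODY = [
    ("?m", "rdf:type", "fp:Model"),
    ("?m", "fp:requiresFactor", "?f"),
    ("?m", "fp:trainingArea", "?a"),
    ("?d", "fp:ofFactor", "?f"),
    ("?d", "fp:covers", "?a"),
]
HEAD = ("?m", "fp:inputData", "?d")

FACTS = {
    ("fp:rf1", "rdf:type", "fp:Model"),
    ("fp:rf1", "fp:requiresFactor", "fp:Slope"),
    ("fp:rf1", "fp:requiresFactor", "fp:Temperature"),
    ("fp:rf1", "fp:trainingArea", "fp:Xichang"),
    ("fp:slope_xichang", "fp:ofFactor", "fp:Slope"),
    ("fp:slope_xichang", "fp:covers", "fp:Xichang"),
    ("fp:temp_xichang", "fp:ofFactor", "fp:Temperature"),
    ("fp:temp_xichang", "fp:covers", "fp:Xichang"),
    ("fp:temp_yanyuan", "fp:ofFactor", "fp:Temperature"),
    ("fp:temp_yanyuan", "fp:covers", "fp:Yanyuan"),
}

print(sorted(apply_rule(BODY, HEAD, FACTS)))
# prints [('fp:rf1', 'fp:inputData', 'fp:slope_xichang'), ('fp:rf1', 'fp:inputData', 'fp:temp_xichang')]
```

The Yanyuan temperature dataset is excluded because it does not cover the model's training area. A real deployment would run such rules in a rule engine or SPARQL endpoint instead of this naive nested-loop matcher.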

Implementation of Spatio-Temporal Knowledge Graph for Forest Fire Prediction
This section presents the implementation of the proposed forest fire prediction method, whose architecture is shown in Figure 10.
The architecture consists of four basic layers. In the data resource layer, large-scale heterogeneous spatio-temporal data are collected from different data sources. In the knowledge extraction layer, different automatic knowledge extraction methods are provided for data with different structures, and structured, semi-structured, and unstructured data are automatically converted into GeoJSON format. In the knowledge storage layer, the conceptual framework of the forest fire prediction ontology is constructed with the Protégé tool according to the relations among forest fire prediction data in the conceptual layer, and the ontology data are stored in GraphDB. Based on the forest fire prediction ontology, the forest fire data are converted into triples to build the instance layer, which is stored in key-value databases. In the analysis service layer, KGFFP provides forest fire prediction services through spatio-temporal semantic queries.

Data Resource Layer
This layer is composed of multi-source heterogeneous raw data in the field of forest fire prediction. Forest fire prediction requires multi-source heterogeneous spatio-temporal data with different structures. Specifically, meteorological data (temperature, humidity, wind direction, etc.) are structured; fire point distribution and land cover type data are semi-structured; and professional knowledge (text) is unstructured. Topographic data are static data with the lowest update frequency. Vegetation data and vegetation coverage data need to be updated according to seasonal changes. The update frequency of land cover data is low; the shorter the interval between the update time and the forecast time, the closer the land cover data are to the actual situation. Meteorological data are … It is necessary to collect data according to their characteristics and update frequency in order to provide accurate and stable data resources for KGFFP.

Knowledge Extraction Layer
We used Protégé to design the ontology of the conceptual layer, including the time ontology and the space ontology. Protégé allows new concepts to be created together with the class hierarchy, object properties, and data properties of each class. The constructed ontology is stored as an RDF file.
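The class hierarchy described above can be illustrated with a small sketch that writes an ontology fragment in Turtle, one of the RDF serializations Protégé can save. It uses only the Python standard library; the `ffp:` namespace and the class names (modeled on the factor categories listed in the case study) are hypothetical and do not reproduce the actual KGFFP ontology.

```python
# Minimal sketch: serialize a small class hierarchy as Turtle text.
# The ffp: namespace and all class names are hypothetical illustrations.

PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ffp": "http://example.org/ffp#",  # hypothetical namespace
}

# child class -> parent class (None marks a top-level class)
CLASS_HIERARCHY = {
    "ffp:ForestFireFactor": None,
    "ffp:MeteorologicalFactor": "ffp:ForestFireFactor",
    "ffp:TopographicFactor": "ffp:ForestFireFactor",
    "ffp:VegetationFactor": "ffp:ForestFireFactor",
    "ffp:HumanActivityFactor": "ffp:ForestFireFactor",
}


def to_turtle(hierarchy: dict) -> str:
    """Return a Turtle document declaring each class and its superclass."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    for cls, parent in hierarchy.items():
        if parent is None:
            lines.append(f"{cls} a owl:Class .")
        else:
            lines.append(f"{cls} a owl:Class ; rdfs:subClassOf {parent} .")
    return "\n".join(lines) + "\n"
```

Object and data properties would be declared the same way (as `owl:ObjectProperty` and `owl:DatatypeProperty` with `rdfs:domain` and `rdfs:range`); they are omitted here to keep the sketch short.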
We designed different triple transformation methods for the different types of multi-source heterogeneous spatio-temporal data used in forest fire prediction. All data must first be transformed into a unified coordinate system.
Commonly used vector data formats in geographic information systems can be converted into GeoJSON with the GDAL library. The Arcpy [20] or GDAL library prepares the time, space, and attribute information for conversion into triples: it converts raster gray values into attributes of vector data and converts vector data into GeoJSON format. Some original data are distributed as discrete points, e.g., weather station data, which makes them inconvenient to compare with the distribution patterns of other spatial phenomena. Therefore, an appropriate spatial interpolation model is used to generate raster interpolation results from the point data, and the results are then converted into GeoJSON format.
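The paper does not name the interpolation model it uses. As one common choice, inverse distance weighting (IDW) is sketched below, assuming station coordinates in a projected CRS (metres) and a regular output grid; the function names and the default power of 2 are illustrative assumptions.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighting estimate at (x, y).

    stations: list of (sx, sy, value) tuples in a projected CRS.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # query point coincides with a station
        w = 1.0 / d ** power  # closer stations get larger weights
        num += w * value
        den += w
    return num / den


def idw_grid(stations, xmin, ymin, cell, ncols, nrows, power=2.0):
    """Estimate values at cell centres; row 0 is the southernmost row."""
    grid = []
    for r in range(nrows):
        y = ymin + (r + 0.5) * cell
        grid.append([idw(stations, xmin + (c + 0.5) * cell, y, power)
                     for c in range(ncols)])
    return grid
```

The resulting grid can then be written as a raster and converted to GeoJSON as described above. Kriging or spline interpolation would be equally valid substitutes.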
We built a converter that maps the geometry in GeoJSON to a triple predicate and transforms the geometry values into objects that conform to the GeoSPARQL specification; the keys of the GeoJSON properties correspond to predicates, and their values correspond to the objects of those predicates.
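One possible mapping in the spirit of this converter is sketched below. It assumes the GeoSPARQL terms `geo:hasGeometry` and `geo:asWKT` with a `geo:wktLiteral` object, handles only Point and Polygon geometries, and uses a hypothetical `ffp:` namespace for property predicates. GeoJSON stores coordinates as longitude/latitude, which matches the default CRS84 axis order of GeoSPARQL WKT literals.

```python
GEO = "http://www.opengis.net/ont/geosparql#"
FFP = "http://example.org/ffp#"  # hypothetical namespace for property keys


def geometry_to_wkt(geom: dict) -> str:
    """Convert a GeoJSON Point or Polygon into WKT (other types omitted)."""
    kind = geom["type"]
    if kind == "Point":
        x, y = geom["coordinates"][:2]
        return f"POINT({x} {y})"
    if kind == "Polygon":
        rings = []
        for ring in geom["coordinates"]:
            # Keep x and y; any extra ordinate (e.g., elevation) is dropped.
            rings.append("(" + ", ".join(f"{x} {y}" for x, y, *_ in ring) + ")")
        return "POLYGON(" + ", ".join(rings) + ")"
    raise ValueError(f"unsupported geometry type: {kind}")


def feature_to_triples(feature: dict, subject: str) -> list:
    """Map one GeoJSON Feature to (subject, predicate, object) triples."""
    geom_node = subject + "_geom"  # hypothetical naming of the geometry node
    wkt = geometry_to_wkt(feature["geometry"])
    triples = [
        (subject, GEO + "hasGeometry", geom_node),
        (geom_node, GEO + "asWKT", f'"{wkt}"^^<{GEO}wktLiteral>'),
    ]
    # Each property key becomes a predicate; its value becomes the object.
    for key, value in feature.get("properties", {}).items():
        triples.append((subject, FFP + key, value))
    return triples
```

For example, a Point feature with a `temperature` property yields one geometry link, one WKT literal, and one `ffp:temperature` triple.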

Knowledge Storage Layer
We used GraphDB to store knowledge. GraphDB is an efficient, robust, and scalable RDF database that supports large-scale data loading, querying, and semantic reasoning in real time.
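GraphDB exposes the RDF4J REST protocol, in which a `POST` to `/repositories/<id>/statements` adds RDF data to a repository. The sketch below only builds such a request with the standard library; it assumes a local instance on GraphDB's default port 7200 and a hypothetical repository named `kgffp`.

```python
import urllib.request


def build_upload_request(turtle: str, repository: str,
                         base_url: str = "http://localhost:7200") -> urllib.request.Request:
    """Build (but do not send) a request that adds Turtle data to a repository."""
    url = f"{base_url}/repositories/{repository}/statements"
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )
```

Against a running server, `urllib.request.urlopen(build_upload_request(ttl, "kgffp"))` would submit the triples. Bulk loads of large files are usually better handled by GraphDB's own import tools.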

Analysis Service Layer
We built the spatio-temporal semantic reasoning rules (RuleObject) according to domain expertise and the prediction algorithms, and stored the rules as triples. We used SPARQL for spatio-temporal semantic queries. In this way, the logical reasoning in ActionObject is realized through quantitative calculations on the query results.
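A spatio-temporal semantic query of this kind might look like the following sketch, which builds a SPARQL string selecting fire points inside an area and a time window. It assumes the GeoSPARQL function `geof:sfWithin` is available (in GraphDB this requires enabling its GeoSPARQL support); `ffp:FirePoint` and `ffp:observedAt` are hypothetical vocabulary terms, not the actual KGFFP schema.

```python
def build_fire_query(start: str, end: str, wkt_area: str) -> str:
    """Return a SPARQL query for fire points inside an area and time window.

    start/end: xsd:dateTime strings; wkt_area: a WKT polygon in lon/lat.
    ffp:FirePoint and ffp:observedAt are hypothetical terms.
    """
    return f"""PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ffp: <http://example.org/ffp#>

SELECT ?point ?time ?wkt WHERE {{
  ?point a ffp:FirePoint ;
         ffp:observedAt ?time ;
         geo:hasGeometry/geo:asWKT ?wkt .
  FILTER (?time >= "{start}"^^xsd:dateTime && ?time < "{end}"^^xsd:dateTime)
  FILTER (geof:sfWithin(?wkt, "{wkt_area}"^^geo:wktLiteral))
}}"""
```

The query results would then feed the quantitative calculations performed by the action side of a rule.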

Forest Fire Prediction Experiment Using KGFFP
This section presents KGFFP-based forest fire prediction experiments through a case study. The investigations show that the proposed method is beneficial for multi-source spatio-temporal data fusion, expertise formalization, and forest fire prediction.

Area of Experiment
We selected Xichang City and Yanyuan County in Liangshan Yi Autonomous Prefecture of China's Sichuan Province for a dynamic prediction experiment of forest fire disasters.
Xichang City is the capital of Liangshan Yi Autonomous Prefecture. It is located between 101°46′ and 102°25′ E and between 27°32′ and 28°10′ N, with an area of 2882.9 square kilometers. Xichang City has high vegetation coverage and a dry climate, which make it prone to forest fires. Forest fires in Xichang have historically caused casualties among rescuers. Therefore, we used Xichang City as the research area. Yanyuan County is adjacent to Xichang City. It is located between 100°42′09″ and 102°03′44″ E and between 27°06′31″ and 28°16′31″ N, with a total area of 8398.6 square kilometers. The area has rich vegetation and many shrubs, and forest fires are easily triggered by factors such as low rainfall. Therefore, effective forest fire prediction for Yanyuan County can significantly reduce the damage caused by forest fires. Table 2 shows the sources and details of the experimental data.

Predictive Model
We applied RF in the experiment. RF builds each decision tree on a sample drawn with the bootstrap resampling method. According to the law of large numbers, RF is less prone to overfitting [26][27][28]. The dependent variable of the RF prediction indicates whether a forest fire occurs: 1 represents fire occurrence and 0 represents no occurrence. Therefore, forest fire prediction with RF can be treated as a binary classification task.
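The bootstrap-and-vote idea can be shown with a deliberately simplified random forest: each tree is a one-split decision stump trained on a bootstrap sample using a random subset of features, and the forest predicts by majority vote (1 = fire, 0 = no fire). This is an illustrative sketch, not the RF implementation used in the study; full RF grows deep CART trees and re-draws the feature subset at every split, which for a one-split stump reduces to one draw per tree.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold, class-above) with the fewest errors."""
    best = None
    for f in feature_ids:
        for thr in sorted({r[f] for r in rows}):
            for above in (0, 1):  # class predicted when value > threshold
                errors = sum((above if r[f] > thr else 1 - above) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, thr, above)
    _, f, thr, above = best
    return f, thr, above


def predict_stump(stump, row):
    f, thr, above = stump
    return above if row[f] > thr else 1 - above


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(d), n_features)    # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]  # majority vote
```

Averaging many trees trained on different bootstrap samples is what reduces variance; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would be used instead.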
We applied DF as one of the core machine learning models of the proposed forest fire prediction. In recent years, DF has been used in the field of computer vision: it first uses convolutional neural networks (CNNs) to extract deep features [29,30] and then uses RF as a classifier [31,32].

Construction of KGFFP
We built the concept layer of KGFFP using Protégé in the case study, including time ontology, space ontology, forest fire prediction hierarchical ontology, and machine learning model concept ontology. It provided a conceptual framework for modeling the time, space, and attribute characteristics of data.
In addition, we constructed the instance layer of the spatio-temporal knowledge graph. March and April are the periods most prone to forest fires, and Xichang City had large-scale forest fires in March of both 2019 and 2020. We selected the data from March and April of 2015–2020 as the research object; data for 2021 and 2022 were not accessible and were therefore not used. We collected multi-source heterogeneous data related to forest fire prediction, including meteorological, terrain, vegetation, and human activity data, for March and April of 2015–2020. For these data, we constructed a diversified knowledge extraction method that converts the data into triples according to the semantics of the concept layer. The method is driven by a controller, and its functions include coordinate system conversion, vector data clipping, raster data clipping, raster calculation, vector-to-raster conversion, raster-to-vector conversion, slope calculation, aspect calculation, and relative humidity calculation. Figure 11 shows a schematic diagram of the knowledge extraction process for aspect. First, we used the Arcpy library to convert the coordinate system of the elevation data from GCS_WGS_1984 to WGS_1984_UTM_Zone_48N and clipped the elevation data to the spatial extent of the study area. Next, we used the Arcpy library to calculate the aspect from the elevation and converted the coordinate system of the aspect data from WGS_1984_UTM_Zone_48N back to GCS_WGS_1984. The aspect is expressed in positive degrees from 0 to 360, measured clockwise from north. The aspect is then decomposed into east-west and north-south components. Next, we converted the data values from real numbers to integers using the Arcpy library.
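The aspect decomposition and integer conversion can be sketched per cell value as follows. The paper does not state the decomposition formula; this sketch assumes the common choice of sine for the east-west component and cosine for the north-south component, and it maps the value -1 (which ArcGIS assigns to flat cells) to (0, 0), also an assumption. The scale factor of 10,000 is the one used in this study to limit precision loss.

```python
import math

SCALE = 10_000  # scale factor applied before integer conversion in this study


def decompose_aspect(aspect_deg: float) -> tuple:
    """Split an aspect (degrees clockwise from north, 0-360) into
    east-west (sin) and north-south (cos) components in [-1, 1]."""
    if aspect_deg < 0:  # assumed convention: -1 marks flat cells
        return 0.0, 0.0
    rad = math.radians(aspect_deg)
    return math.sin(rad), math.cos(rad)


def to_scaled_int(value: float, scale: int = SCALE) -> int:
    """Store a real value as an integer, keeping four decimal places."""
    return round(value * scale)


def from_scaled_int(value: int, scale: int = SCALE) -> float:
    """Reverse calculation applied after the integer conversion."""
    return value / scale
```

Without scaling, converting a component such as 0.7071 directly to an integer would truncate it to 0; scaling first preserves it as 7071.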
To reduce the precision loss caused by this conversion, we multiplied the raster values by 10,000 before the conversion and applied the reverse calculation afterwards. Finally, we saved the result as a JSON file.
Figure 11. Knowledge extraction process using aspect as an example.
We transformed the model optimization process into spatio-temporal semantic reasoning rules in the case study with the following rule sets, as shown in Figure 12.

• Definition of spatio-temporal semantic rule set 1: Extracting spatio-temporal data from KGFFP to make labeled and unlabeled datasets.
• Definition of spatio-temporal semantic rule set 2: Inputting the dataset into the machine learning model for model training or prediction with the support of inference rules.
• Definition of spatio-temporal semantic rule set 3: Calculating the accuracy from the prediction results and real data, and evaluating the accuracy of the prediction model.
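The dataset-building step of rule set 1 can be illustrated by pivoting query results, given as (subject, predicate, object) triples, into feature rows. This is a hypothetical sketch: the predicate names, the choice to skip subjects with missing features, and the use of a label triple to separate labeled from unlabeled samples are assumptions, not the paper's rule definitions.

```python
from collections import defaultdict


def triples_to_datasets(triples, feature_predicates, label_predicate):
    """Group triples by subject and build labeled and unlabeled rows.

    Subjects with a label triple (1 = fire, 0 = non-fire sample) go to the
    labeled set; the others go to the unlabeled set used for prediction.
    Subjects missing any feature predicate are skipped.
    """
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o
    labeled, unlabeled = [], []
    for s in sorted(by_subject):
        props = by_subject[s]
        if not all(p in props for p in feature_predicates):
            continue
        row = [props[p] for p in feature_predicates]
        if label_predicate in props:
            labeled.append((s, row, props[label_predicate]))
        else:
            unlabeled.append((s, row))
    return labeled, unlabeled
```

The labeled rows would then go to training under rule set 2 and the unlabeled rows to prediction.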

Predicting Forest Fires
This section introduces the sensitivity analysis of the proposed method and the results of the experiments.

Sensitivity Analysis
To reduce probability calculation errors caused by unbalanced sample numbers, we needed to find the optimal ratio β of the number of non-fire points to the number of fire points. We performed a sensitivity analysis on the RF-based and DF-based forest fire models, varying the ratio from 1.0 to 2.0 with a step size of 0.1. Model performance was evaluated with the metrics Precision and Recall. The closer the difference between Precision and Recall is to 0, the smaller the precision deviation caused by the imbalance between fire and non-fire samples. Based on rule set 2, we trained and tested the RF-based and DF-based forest fire models for each ratio and obtained their predicted results.
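The sampling side of this sensitivity analysis can be sketched as follows: for a given ratio β, non-fire cells are drawn at random from the unfired area until their count is round(β × number of fire cells). The cell representation, the random seed, and the function name are illustrative assumptions.

```python
import random


def build_samples(fire_cells, candidate_cells, beta, seed=0):
    """Return (cell, label) pairs with round(beta * n_fire) non-fire samples.

    fire_cells: cells with a recorded fire (label 1).
    candidate_cells: cells from which non-fire samples (label 0) are drawn;
    any cell that is also a fire cell is excluded.
    """
    fire = set(fire_cells)
    pool = sorted(c for c in candidate_cells if c not in fire)
    n_neg = round(beta * len(fire))
    if n_neg > len(pool):
        raise ValueError("not enough non-fire cells for this ratio")
    rng = random.Random(seed)
    negatives = rng.sample(pool, n_neg)
    return [(c, 1) for c in sorted(fire)] + [(c, 0) for c in negatives]


# Ratios tested in the sensitivity analysis: 1.0, 1.1, ..., 2.0
BETAS = [round(1.0 + 0.1 * k, 1) for k in range(11)]
```

Each ratio in `BETAS` yields one training set; the models are trained and evaluated on each, and the ratio with the smallest gap between Precision and Recall is kept.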
Based on rule set 3, we compared the actual and predicted fire points of Xichang in March and April 2020 and evaluated the predicted results. Precision and Recall are defined in Formulas (1) and (2):

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

where TP (True Positive) means that the sample is positive and is predicted as positive; FP (False Positive) means that the sample is negative but is predicted as positive; and FN (False Negative) means that the sample is positive but is predicted as negative. For the RF model, the difference between Precision and Recall reaches its minimum when β = 1.3, and both Precision and Recall are relatively high, which is satisfactory. The metric F1 is defined in Formula (3):

$$F1 = \frac{2PR}{P + R} \tag{3}$$

where F1 is the harmonic mean of Precision and Recall, P denotes Precision, and R denotes Recall. The larger the F1, the better the overall performance of the model; a high F1 means that both Precision and Recall are good. The F1 of the RF-based forest fire model is 0.7839. For the DF-based forest fire model, the difference between Precision and Recall reaches its minimum when β = 1.5, where F1 = 0.7957. Figure 14 shows the experimental results. Table 3 shows the statistical information of the samples. Figure 15 shows the probability map of forest fire risk for March and April 2020 for Experiment 1. The prediction results of the RF and DF methods are not drastically different because both are decision tree-based machine learning models, whereas both differ significantly from the prediction results of SVM. In our experiment, we selected RF and DF for the forest fire models because they have relatively high prediction accuracies.
Figure 16 shows the metric F1 and the accuracy of the RF and DF models. Table 4 shows the statistical information of the samples. Figure 17 shows the probability map of forest fire risk for March and April 2020 based on the data of Xichang City from March and April of 2010 to 2019. Again, the RF and DF results differ little because both are decision tree-based models, and both differ significantly from the SVM results, which is why RF and DF were selected.
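Formulas (1) to (3) and the β selection criterion (smallest gap between Precision and Recall) can be computed as in the sketch below. The function names are illustrative, and the counts in the usage example are toy numbers, not results from this study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Formulas (1)-(3): Precision, Recall, and their harmonic mean F1."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def select_beta(counts_by_beta: dict) -> float:
    """Pick the ratio whose |Precision - Recall| is smallest.

    counts_by_beta maps beta -> (tp, fp, fn) measured on the test set.
    """
    def gap(beta):
        p, r, _ = precision_recall_f1(*counts_by_beta[beta])
        return abs(p - r)
    return min(sorted(counts_by_beta), key=gap)
```

With toy counts `{1.0: (9, 1, 5), 1.3: (8, 2, 2), 2.0: (6, 1, 6)}`, the gaps are about 0.26, 0, and 0.36, so `select_beta` returns 1.3. Ties resolve to the smallest β because the candidates are sorted first.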

Experiment 2
Since Xichang City and Yanyuan County are spatially adjacent, factors such as meteorology, topography, and vegetation are similar to a certain extent. Therefore, the data of Yanyuan County were used as part of the training samples for Xichang City to predict fire points in Xichang City. Figure 13c,d shows a fire map of Xichang and Yanyuan in March and April 2015–2019 and a fire map of Xichang in March and April 2020 [12].
Positive samples (150 in total) and negative samples constructed in the unfired area were used as the training set. For the test set, positive samples (45 in total) were constructed from the fire point data of Xichang City in March and April 2020, and negative samples were constructed in the unfired area. Figure 18 shows the metric F1 and the accuracy of the RF and DF models. Table 5 shows the statistical information of the samples. Figure 19 shows the probability map of forest fire risk for March–April 2020 for Experiment 2. The prediction results of the RF and DF methods are not drastically different because both are decision tree-based machine learning models, whereas both differ significantly from the prediction results of SVM. In our experiment, we selected RF and DF for the forest fire models because they have relatively high prediction accuracies.

Discussion
Experiment 1 used two datasets for Xichang City. The first contained the fire data from March and April of 2010–2014, and the second added the fire data of Xichang from 2015–2019. Experiment 2 was based on Experiment 1: the first dataset was used, the test set was unchanged, and the data of Yanyuan County from 2015–2019 were added to the training set of Xichang City from 2015–2019. Yanyuan County and Xichang City are geographically adjacent.
The experimental results show that in Experiment 1, the F1 obtained with the first dataset is lower than that obtained with the second dataset. We speculate that the first training dataset leads to insufficient training of the model. The results also show that the F1 of the RF-based forest fire model in Experiment 2 is higher than in Experiment 1 with the first dataset, while the F1 of the DF model decreases. Table 6 shows the results of these experiments. The method proposed in this paper offers some novelty in multi-source heterogeneous spatio-temporal data fusion and the formalization of domain expertise. Our method semantically transforms multi-source heterogeneous spatio-temporal data into spatio-temporal facts in KGFFP and fully preserves their temporal, spatial, and attribute characteristics. Compared with conventional machine learning-based forest fire prediction methods, our method is good at integrating multi-source data for a comprehensive analysis of historical fire points.
It also supports obtaining relatively high-quality analytical datasets when the information is noisy and insufficient. Another advantage over traditional machine learning-based forest fire prediction is that our method can formalize domain expertise as inference rules of KGFFP. For forest fire prediction, expertise-oriented data analysis can organize multi-source heterogeneous spatio-temporal data, streamline the prediction analysis process, and yield high-quality prediction results. Considering the above experimental results comprehensively, we conclude as follows: (1) With both the RF model and the DF model, adding data from different years to the training set improves the metric F1 when the test set remains unchanged.
(2) With the RF model, adding samples from adjacent areas to the training set (samples from Yanyuan County added to the Xichang training set) improves the metric F1 to a certain extent when the test set remains unchanged. Therefore, to improve the F1 of the RF model, one can add training samples from the same months of different years in the same area, or samples from the same months and years in an adjacent area. (3) With the DF model, adding samples from adjacent areas to the training set (samples from Yanyuan County added to the Xichang training set) does not improve the metric F1 when the test set remains unchanged. Therefore, to improve the F1 of the DF model, one should add training samples from the same months of different years in the same area.
Our method can be extended to meet future requirements. Continuously adding historical March and April samples to predict forest fires in March and April of 2020 will lead to a decrease in prediction accuracy. The main reason is that the characteristics of the older historical samples differ considerably from those of 2020, because meteorology, topography, vegetation, and human factors change greatly over time. In addition, the experimental results show that adding fire points from adjacent regions as training samples can improve the F1 of the RF-based forest fire model, but the similarity of meteorology, topography, vegetation, and human factors between regions still needs to be studied. Because the similarity of these factors reflects the similarity between the training and test samples, many more experiments are needed to confirm that this strategy reliably improves F1.

Conclusions
In this paper, we focus on the fusion of multi-source heterogeneous data and the modeling of forest fire expertise. To address these problems, we propose a novel forest fire prediction method that fuses multi-source heterogeneous data in the field of forest fires while preserving the semantics of spatio-temporal facts. We also propose a method that formalizes domain expertise as semantic rules of knowledge graphs. Through rule-based reasoning, it obtains scenario-related data with spatio-temporal features from KGFFP for machine learning-based forest fire prediction methods.
With suitable extensions, the main ideas of the proposed method can also be applied to other disaster predictions. To apply our method to landslide forecasting, spatio-temporal data closely related to landslide occurrence must be collected to expand the knowledge graph. Based on the machine learning model concept ontology, our method can support landslide prediction with machine learning models, provided that a model suitable for landslide prediction is chosen. Based on the characteristics of this model, the machine learning model concept ontology and the inference rules are enriched, and the controller triggers the inference rules to invoke the model for landslide prediction.
However, to describe the proposed method clearly, this paper takes forest fire as an example to introduce the new ideas, presenting the mechanism, model, process, and functions of a method that can effectively improve forest fire prediction. The contributions of this paper are as follows.
(1) KGFFP integrates multi-source heterogeneous data through semantic technology from the perspective of cross-domain data integration; (2) This paper proposes a method to model domain expertise that effectively represents multi-source expertise as triples, which facilitates the optimization and prediction of machine learning models in forest fire prediction scenarios; (3) With the proposed method, machine learning-based forest fire prediction methods can be optimized on historical data with satisfactory accuracy. Given future forest fire-related data, better forest fire prediction results are expected.
The occurrence of forest fires is strongly affected by temporal and spatial characteristics, and the distribution of geographical data differs greatly between regions. Therefore, forest fire prediction models are often limited to a specific spatial range, and the methods are difficult to transfer and reuse. The proposed method aims to integrate real-world multi-source heterogeneous spatio-temporal data and improve forest fire prediction. In the future, we will focus on applying domain adaptation techniques from computer vision to the forest fire prediction process. We will build a forest fire prediction model based on domain adaptation and a multilayer perceptron. Additionally, to verify the generalization of the method, we will conduct model transfer experiments.