Development of Knowledge Graph for Data Management Related to Flooding Disasters Using Open Data

Abstract: Despite the development of various technologies and systems that use artificial intelligence (AI) to solve disaster-related problems, difficult challenges remain. Data are the foundation for solving diverse disaster problems with AI, big data analysis, and similar methods, so these data deserve close attention. Disaster data are specific to the domain of each disaster type, are heterogeneous, and lack interoperability. Open data related to disasters raise particular issues: sources and formats differ because the data are collected by different organizations, and the vocabularies used in each domain are inconsistent. This study proposes a knowledge graph to resolve the heterogeneity among various disaster data and provide interoperability among domains. Among disaster domains, we describe the knowledge graph for flooding disasters using Korean open datasets and cross-domain knowledge graphs. Furthermore, the proposed knowledge graph can be used to assist in solving and managing disaster problems.


Introduction
Despite the development of various technologies and systems that use artificial intelligence (AI) to solve disaster-related problems, difficult challenges remain [1]. The quantity of disaster data is vast and changes constantly. A disaster seriously affects human lives and property and can cause critical damage to the country in which it occurs [2]. Various studies have sought to prevent damage or predict disasters. For example, future atmospheric conditions are predicted using modeling and supercomputers, and disaster occurrence may be predicted using AI technology trained on previous disaster datasets [3][4][5][6][7]. When a disaster occurs, learning-based methods that use various features attempt to predict how the situation will develop, for example by estimating damage to property and buildings as well as economic damage [8][9][10]. These research results can support decision making in disaster management organizations.
Data are fundamental to such studies. However, most disaster data are specific to the domain of each disaster type, are heterogeneous, and lack interoperability. In particular, disaster-related work typically requires many datasets of various types, such as geographical data, rainfall data, building data, social property data, and administrative district data. As a result, syntactic and semantic heterogeneity arises even though several organizations have tried to cooperate on consistent data sharing [11,12]. Likewise, open data related to disasters differ in source and format because they are collected by different organizations [13,14]. The vocabularies used in each domain are also inconsistent [15]. Therefore, it is difficult to access and use critical data or resources related to a disaster in a timely manner [16], and it is hard to find and use suitable data for solving disaster issues.
To solve these problems, previous studies have suggested many methods that address disaster data silos and improve interoperability through data management that defines and uses relationships between data [17][18][19][20]. Along with ontology, the knowledge graph is one of the representative methods for representing relationships between data. It is expressed as triples, each consisting of a subject, a predicate, and an object. The subject and the object are entities represented by nodes. The predicate is the relationship between the entities and is represented by an edge. The knowledge graph is well suited to expressing relationships between heterogeneous data in various domains. However, previously proposed flooding disaster knowledge graphs were constructed using different standards and terms based on case studies for specific areas; hence, they are difficult to interoperate and are limited in reusability and scalability.
In this study, we propose a knowledge graph for managing data related to flooding disasters using open data. Among several kinds of disaster, we focus on flood disaster data. As mentioned above, one of the main challenges is sharing and searching for data suitable for domain experts or researchers. Our proposed model improves interoperability, reusability, and scalability by using widely adopted concepts and schemas in the knowledge graph. This study also builds richer concepts and relations using actual open data in Korea.
The remainder of this paper is organized as follows: Section 2 presents the background of the study and the related works for flooding disaster knowledge graphs; Section 3 describes our proposed model and the knowledge graph for flooding disasters and presents the principal datasets selected from an open dataset related to flooding disasters; Section 4 discusses the performed experiment; and finally, Section 5 concludes this study.

Background and Related Works
This section introduces the background for the open knowledge graphs and summarizes previous studies on flooding disaster knowledge graphs.

Open Knowledge Graphs
The knowledge graph is well suited to expressing relationships between large-scale heterogeneous data in multiple domains [21][22][23]. It is expressed as triples, each comprising a subject, a predicate, and an object. The subject and the object are entities represented by nodes. The predicate is the relationship between the entities and is represented by an edge. The knowledge graph, first proposed by Google in 2012, was intended to improve search results by using semantic search information accumulated from various sources [21]. It is currently used in semantic search engines, recommendation systems, voice search, question-answering systems, and AI systems in various domains.
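The triple structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Triple` type, the helper functions, and the three example triples are hypothetical and are not taken from the graph built in this study.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in a knowledge graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example triples, used only to illustrate the structure.
triples = [
    Triple("Flooding", "has cause", "Heavy rainfall"),
    Triple("Flooding", "has effect", "Property damage"),
    Triple("Flooding", "subclass of", "Natural disaster"),
]


def nodes(ts):
    """Entities (nodes) are every subject and every object."""
    return {t.subject for t in ts} | {t.obj for t in ts}


def edge_labels(ts):
    """Relationships (edges) are labeled by the predicates."""
    return {t.predicate for t in ts}


if __name__ == "__main__":
    print(sorted(nodes(triples)))
    print(sorted(edge_labels(triples)))
```

Subjects and objects share one node set, which is why a single entity such as Flooding can participate in many statements at once.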
Among the various previously proposed open knowledge graphs, the representative cross-domain examples are Wikidata [24], DBpedia [25], and YAGO [26]. Freebase was excluded from this study because it was shut down in 2017 and its data have been transferred to Wikidata [27]. Wikidata is a collaborative, large-scale knowledge graph maintained by a global community [24] that aims to provide information commonly used in Wikimedia projects, such as Wikipedia. Each item has a unique ID prefixed with the letter Q (e.g., Q8068 for "Flood"). Because items are identified by ID rather than by name, Wikidata can provide information in any language without needing translation. Consequently, Wikidata supports reusability and extensibility. DBpedia is an open knowledge graph built by extracting information from various Wikimedia projects [25]. DBpedia is linked to other knowledge graphs, such as Wikidata, Freebase, and OpenCyc, and acts as a hub for linked data. DBpedia describes 4.58 million entities, including persons, places, creative works, organizations, species, and diseases. DBpedia has the advantage of being updated automatically as Wikipedia changes, and it supports multiple languages and various domains. Meanwhile, YAGO was built automatically from other sources, such as Wikipedia, GeoNames, and WordNet. YAGO3 provides more than 10 million entities and contains more than 120 million facts about these entities as of 2019. One of the advantages of YAGO is that it has a manually evaluated accuracy of 95% [26].

Flooding Disaster Knowledge Graphs
Several studies have already proposed flood disaster knowledge graphs that use ontology engineering to address the problem of heterogeneous data [17][18][19][20]. In [20], the authors proposed a data management and utilization framework for urban flooding. To manage heterogeneous data, the relationship between data was defined in terms of time (e.g., before, after, during, equal, meet, overlap, and disjoint), space, and semantics (e.g., is-a, part-of, equivalent-of, synonymy-of, and relevant-of). The influence index of the factors influencing the flood disaster was calculated using the proposed framework. In [15], the authors addressed data management and sharing problems in disaster response management by designing a flood disaster response domain ontology using interlocking institutional worlds. Finally, [19] developed an information-centric flood ontology for structured and easy access to critical disaster-related information. The developed ontology can be used in cyberinfrastructure systems for natural disaster preparedness, monitoring, response, and recovery. Furthermore, the proposed methodology can be easily integrated with the domain knowledge of an expert system and used in voice-enabled intelligent applications through a web-based information platform. As mentioned above, most previous studies on flooding disaster knowledge graphs constructed them from scenarios or case studies. Therefore, although they share the main purpose of resolving heterogeneous data, each knowledge graph is defined with different terms and relationship definitions, which makes reuse and integration difficult.
Fortunately, several recent studies have resulted from the beAWARE project within the European Horizon 2020 program, which aims to integrate and harmonize heterogeneous data from various sources, such as sensors, social media, general public data, and data from first responders, for real-time management of crises caused by climate-related natural disasters. Those results provide useful ontologies that can serve as references in the area of knowledge-based disaster management [27,28]. However, the integration of data from various sources and formats, as well as standardization and interoperability, remain open issues because organizations still use different terms and data fields for data that have the same meaning.

Proposed Model
This section presents our proposed model. First, we describe a data management model for flooding disasters using open datasets. We then define elements for building the knowledge graph in the model. We propose the knowledge graph for flooding disasters according to the model.

Data Management Model for Flooding Disasters
We propose a data management model for flooding disasters that builds a knowledge graph from open datasets. We designed the model with the reusability, scalability, and interoperability of the knowledge graph in other disaster domains in mind. To this end, open data first need to be identified, and their data fields, which each organization uses differently, need to be refined. We analyzed the data fields of Korean open data, extracted the concepts common to different datasets, and analyzed the relationships between them [29]. Standardization of data fields is based on the extracted common concepts. For extensibility, our model reuses the vocabulary of existing knowledge graphs and schemas through URIs [30]. Figure 1 summarizes the proposed model and the defined elements. On the basis of this model, we collected 24 datasets related to real flooding disasters in Korea from an open data provider [31]. Table 1 describes the principal datasets selected from the open dataset related to flooding disasters. As shown in Table 1, we extracted from each dataset the data columns required for constructing the knowledge graph, based on analysis by domain experts. We then integrated them with widely used concepts and schemas from existing knowledge graphs. Finally, the knowledge graph was built and stored in a database.
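The field standardization step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the column names, the common concepts, the `http://example.org/flood/` namespace, and the sample records are hypothetical and do not reproduce the actual fields of the Korean open datasets or the vocabulary used in this study.

```python
# Hypothetical mapping from organization-specific column names
# to the common concept each of them expresses.
FIELD_TO_CONCEPT = {
    "rainfall_mm": "precipitation",
    "precip": "precipitation",
    "obs_date": "observationDate",
    "date": "observationDate",
    "basin_name": "riverBasin",
    "river_basin": "riverBasin",
}

# Placeholder namespace; a real graph would reuse URIs of existing vocabularies.
BASE_URI = "http://example.org/flood/"


def standardize(record: dict) -> dict:
    """Rename the fields of one record to URIs of common concepts."""
    standardized = {}
    for field, value in record.items():
        concept = FIELD_TO_CONCEPT.get(field)
        if concept is None:
            continue  # unmapped fields are skipped and left for expert review
        standardized[BASE_URI + concept] = value
    return standardized


# Two hypothetical records describing the same observation with different fields.
record_a = {"rainfall_mm": 12.5, "obs_date": "2020-08-01", "basin_name": "Han River"}
record_b = {"precip": 12.5, "date": "2020-08-01", "river_basin": "Han River",
            "station_id": "X1"}

if __name__ == "__main__":
    print(standardize(record_a) == standardize(record_b))
```

Once both records are keyed by the same concept URIs, they can be compared or merged directly, which is the interoperability benefit the model aims for.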

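[Table 1. Principal datasets selected from the open dataset related to flooding disasters. Columns: Dataset Title; Data Columns-Descriptions. First listed dataset: Precipitation data for the river basin.]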

Knowledge Graph for Flooding Disaster
This section presents the proposed flooding disaster knowledge graph, which was constructed with reusability and scalability in mind. The flooding disaster knowledge graph defines a hierarchical skeleton structure between the Flooding concept and existing knowledge graphs and schemas (Figure 2). On the basis of the Flooding reference concept in KBpedia [32], we restricted the links to Wikidata, DBpedia, and schema.org among the various available knowledge graphs. The Flooding concept has a "same as" relationship with Flood in Wikidata, meaning that the two have the same meaning. In addition, Wikidata's Natural Disaster and High Tide are superclasses of Flooding. In other words, all concepts in the flooding disaster knowledge graph are subordinate to these classes. According to the Flooding reference concept of KBpedia, we can infer that Wikidata's Hazard, DBpedia's Activity and Event, and schema.org's Action classes are closely related to the Flooding concept. We therefore define a new relationship, isCloselyRelated, to connect them. By linking to and reusing the general schemas and terms of existing knowledge graphs, the flooding disaster knowledge graph improves interoperability with existing knowledge bases, search accessibility, and expandability.
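The skeleton described above can be written down as a small set of triples. The following sketch is an assumption-laden illustration: the prefixed labels (`wikidata:`, `dbpedia:`, `schema:`) are shorthand rather than real identifiers, the predicate spellings `sameAs` and `subClassOf` are chosen here for readability, and "River flooding" is a hypothetical subordinate concept added only to show how superclasses are inherited.

```python
SKELETON = [
    ("Flooding", "sameAs", "wikidata:Flood"),
    ("Flooding", "subClassOf", "wikidata:Natural Disaster"),
    ("Flooding", "subClassOf", "wikidata:High Tide"),
    ("Flooding", "isCloselyRelated", "wikidata:Hazard"),
    ("Flooding", "isCloselyRelated", "dbpedia:Activity"),
    ("Flooding", "isCloselyRelated", "dbpedia:Event"),
    ("Flooding", "isCloselyRelated", "schema:Action"),
    # Hypothetical subordinate concept, not taken from the paper.
    ("River flooding", "subClassOf", "Flooding"),
]


def superclasses(concept, triples):
    """Collect all direct and inherited superclasses of a concept."""
    found = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                stack.append(o)
    return found


def closely_related(concept, triples):
    """Return the external classes linked by isCloselyRelated."""
    return {o for s, p, o in triples if s == concept and p == "isCloselyRelated"}


if __name__ == "__main__":
    print(sorted(superclasses("River flooding", SKELETON)))
    print(sorted(closely_related("Flooding", SKELETON)))
```

Because the superclass lookup is transitive, any concept placed under Flooding is also found under the external superclasses, which is what makes the skeleton reusable from other graphs.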
We now explore the principal relationships with Flooding based on the collected open data (Figure 3). The three relationships applied first are "has cause", "has effect", and "subclass of". Using these relationships, we can expand the information from flooding to damage to human lives and property, as well as to other kinds of natural disasters, such as earthquakes and typhoons. Although the knowledge graphs developed in this study may appear similar to those released by other reference sites, we incorporated many details based on real open data provided by the Korean government. Moreover, using the same or similar relationships allows us to link our knowledge graphs easily with other references, which benefits compatibility. We first analyzed what causes flooding, considering the causes of both domestic and outside water immersion. We could then provide natural disaster information and the status of essential facilities near rivers and residential areas, in addition to the list of flooding cases and definitions. We used external data already built in a knowledge graph format as much as possible. The left side of Figure 3 shows how we linked the external knowledge graph released by Wikidata. We also linked information on possible damage due to flooding to the flooding data. For more detailed information, we included local and national open data; hence, our knowledge graphs are expected to be useful for residents and policy makers in the local region of interest. Accordingly, one can obtain richer flooding information from these linked data through the meaningful relationships within our knowledge graph. These links can be further expanded to various natural disasters based on geophysical background knowledge, such as rainfall, typhoons, summer monsoons, storms, waves, tides, tsunamis, and earthquakes. We expect that these knowledge graphs can be widely used for a range of relevant data and knowledge.
The relationships of "has cause" and "has effect" released by Wikidata were considered in our knowledge graph.
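To make the expansion along "has cause" and "has effect" concrete, the sketch below follows one relationship outward from Flooding with a breadth-first traversal. The triples are hypothetical examples chosen for illustration and are not the contents of the graph in Figure 3; the function name `follow` is likewise an assumption.

```python
from collections import deque

# Hypothetical cause and effect statements, for illustration only.
CAUSAL = [
    ("Flooding", "has cause", "Heavy rainfall"),
    ("Heavy rainfall", "has cause", "Typhoon"),
    ("Flooding", "has effect", "Inundation of residential area"),
    ("Inundation of residential area", "has effect", "Property damage"),
]


def follow(start, relation, triples):
    """Return every node reachable from start by repeatedly following relation."""
    reached = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == relation and o not in seen:
                seen.add(o)
                reached.append(o)
                queue.append(o)
    return reached


if __name__ == "__main__":
    print(follow("Flooding", "has cause", CAUSAL))
    print(follow("Flooding", "has effect", CAUSAL))
```

Following "has cause" walks back toward the geophysical drivers, while following "has effect" walks forward toward damage, so one starting concept yields both directions of the chain.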

Experimental Results and Discussion
We performed an experiment to evaluate the proposed flooding disaster knowledge graph. The experiment aimed to show the usability and interoperability of the search results. Accordingly, it compared the search results of open datasets (data.go.kr), Wikidata, and our knowledge graph by query type. We defined simple query types using the concepts and relations of the knowledge graph. Each query type consisted of a few questions, but these questions cannot be posed directly against the experimental data. Therefore, we translated each question into a suitable query language or retrieval method, such as keyword mapping or manual searching. Table 2 shows the three questions and their query results. For Q1, a question that looks up a concept, searching the open datasets for flooding disaster data returned 21 datasets, such as "Namdong-gu, Incheon_Status of flood risk areas" and "Korea National Territory Information Corporation Flooding trace information flooding level line (annual)". Wikidata returned 338 triples, whereas the flooding disaster knowledge graph returned 360 triples, including the Wikidata triples. Q2 and Q3 are questions about relationships between data. The open datasets provided no query results for Q2 and Q3 because no relations exist between the data. Wikidata returned five search results for Q2, whereas the flooding disaster knowledge graph returned more, including those of Wikidata. For Q3, our knowledge graph likewise returned more query results than Wikidata. In summary, because the proposed model reuses and extends the existing knowledge graph, it provided more information and improved interoperability in the flooding disaster domain.
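The kind of comparison reported above can be mimicked on toy data with a simple triple-pattern query. This sketch does not reproduce the experiment: the two small graphs, their contents, and the `match` helper are hypothetical, and the counts it yields are unrelated to the figures in Table 2. It only shows why a graph that extends an external graph returns a superset of that graph's results.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Hypothetical external graph (standing in for a general-purpose source).
external = {
    ("Flood", "has cause", "Rain"),
    ("Flood", "subclass of", "Natural disaster"),
}

# Hypothetical statements derived from local open data.
local = {
    ("Flood", "has cause", "River overflow"),
    ("Flood", "has effect", "Road closure"),
}

# The extended graph reuses the external triples and adds the local ones.
combined = external | local

if __name__ == "__main__":
    for name, graph in [("external", external), ("combined", combined)]:
        hits = match(graph, s="Flood", p="has cause")
        print(name, len(hits))
```

A keyword search over dataset titles has no predicate to match against, which is consistent with the observation that relation questions could not be answered from the open datasets alone.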
The knowledge graphs shown here were based on data written in ASCII format. However, much more data on natural disasters are written in binary format. Many operational weather and environment service centers around the world generate high-resolution global weather and environmental prediction data from their numerical forecasting models every day, and an increasing amount of data is obtained from remote sensing instruments. Numerical forecasts and satellite data from remote sensing instruments are useful, vast, and well suited to studies in the field of natural disasters. However, they are usually written in NetCDF and/or HDF5 formats, which are binary formats, and we did not collect any binary data for the knowledge graph development in this study. In future work, we will address how to use such valuable data in our knowledge graphs to give their users more benefits.

Conclusions and Future Works
This study proposed a data management model and a knowledge graph for flooding disasters using open datasets. Among several kinds of disasters, we focused on flood disaster data. Our main aim was to make suitable data easier to share and search for domain experts and researchers. To this end, our knowledge graph defined a hierarchical skeleton structure between the Flooding concept and existing knowledge graphs and schemas and expanded the concepts and relations using open data. We performed an experiment to evaluate the usability and interoperability of the query results. In summary, our proposed model improves interoperability, reusability, and scalability by using widely adopted concepts and schemas in the knowledge graph. Furthermore, we built richer concepts and relations using actual open data in Korea. The contribution of this study is the definition of a domain knowledge graph using the Korean open dataset. We defined and described the causes and effects of outside water and domestic immersion, taking into account not only natural disaster flood data but also the geographical characteristics of Korea. We expect that our efforts to integrate open data related to floods will help users solve and manage issues arising from flood-induced disasters. Furthermore, this study opens up the possibility of extension to other disaster domains associated with flooding.
In future research, data from various disaster domains could be investigated and integrated. In addition, as mentioned in the discussion, we should expand the flooding disaster knowledge graph to cover various data formats.