A Method for Constructing Geographical Knowledge Graph from Multisource Data

Abstract: Global problems all occur at a particular location on or near the Earth’s surface. Sitting at the junction of artificial intelligence (AI) and big data, knowledge graphs (KGs) organize, interlink, and create semantic knowledge, thus attracting much attention worldwide. Although the existing KGs are constructed from internet encyclopedias and contain abundant knowledge, they lack exact coordinates and geographical relationships. In light of this, a geographical knowledge graph (GeoKG) construction method based on multisource data is proposed, consisting of modeling a schema layer and filling a data layer. This method has two advantages: (1) the knowledge can be extracted from geographic datasets; (2) the knowledge in multisource data can be represented and integrated. Firstly, the schema layer is designed to represent geographical knowledge. Then, the methods of extraction and integration from multisource data are designed to fill the data layer, and a storage method is developed to associate semantics with geospatial knowledge. Finally, the GeoKG is verified through linkage rate, semantic relationship rate, and application cases. The experiments indicate that the method could automatically extract and integrate knowledge from multisource data. Additionally, our GeoKG has a higher success rate of linking web pages with geographic datasets, and the share of entities with exact coordinates has increased to 100%. This paper could bridge the gap between a Geographic Information System and a KG, thus facilitating more geospatial applications.


Introduction
In the 2020s, the world has been experiencing the most significant challenges regarding natural disasters and worldwide epidemics. It is clear that these global problems are geospatial: they all occur at a particular location on or near the Earth's surface [1]. At the junction of artificial intelligence (AI) and big data, geographical artificial intelligence has attracted much attention worldwide, and plays an essential role in science and technology [2,3]. As the backbone of AI, knowledge graphs (KGs) have shown their powerful capabilities in different kinds of intelligent applications, including data retrieval, integration, and analysis [4]. Geographical knowledge graphs (GeoKGs), a kind of domain KG, can organize, interlink, and infer geospatial knowledge; hence they offer excellent opportunities to solve many problems in real life [5]. Geographical knowledge is a higher level of geospatial information, represented by ontology and the semantic web [6]. For example, the knowledge "The Yellow River is the longest river in China" is described as <Yellow River, Longest River, China>, in which Yellow River and China are entities, and Longest River is a relationship. Although most of these techniques focus on representing geographical knowledge, none involve representing knowledge in multisource data. Moreover, the existing KGs are seldom constructed from geographic datasets, thus posing several challenges to the construction of complete geographical knowledge [7]. The first challenge is that most of the existing KGs lack geographical entities and precise coordinates. The second challenge is that it is hard to extract knowledge from multisource data because these entities are organized in different data sources, and their geometry types are complex [8]. The last challenge is the diversity of spatial relationships between entities, which general KGs do not consider [9,10].
Using a new GeoKG construction method based on geographic datasets, we expand the scope of usability of geographical data on the internet. Firstly, a system framework is presented to model domain knowledge in a schema layer and extract geographical knowledge in a data layer, thus guiding GeoKG construction. In the schema layer, concepts and their relationships are constructed to represent geographical knowledge. In the data layer, several methods are proposed to extract knowledge, including extracting geographical entities, transforming attributes into triples, extracting concepts and attributes from encyclopedias, and integrating knowledge from multisource data. Finally, the linkage rate and the semantic relationship rate are analyzed, and some application cases are exhibited to verify the ability to obtain related knowledge. The specific contributions are the following:
• A GeoKG construction framework is proposed, which extracts knowledge from geographic datasets, and completes these knowledge sets using internet encyclopedias. At the same time, the framework can become a reference for other KGs involved in the same space;
• A schema layer for our GeoKG is designed, by which geographic datasets can be formalized to represent geographical knowledge, thus restricting RDF triples in the data layer;
• A geographical knowledge extraction method is proposed, by which the entities and attributes belonging to multiple layers, features, and geometries are composed, thus constructing coordinates and spatial relationships for GeoKG.
The remainder of the paper is structured as follows. Section 2 reviews related work regarding KGs and GeoKGs. In Section 3, the system framework for constructing GeoKGs is proposed. Section 4 evaluates the linkage rate and semantic relationship rate, and then demonstrates application cases. Finally, the discussion, conclusions, and future work are presented in Sections 5 and 6.

Geographical Knowledge Graph-Related Literature
In recent years, researchers have paid much attention to GeoKG, the fundamental techniques of which are knowledge representation and KG construction. As such, several research areas and backgrounds related to GeoKG are reviewed.

Introduction to Knowledge Graph
The KG was proposed by the Google Knowledge Graph project to devise a more intelligent search engine. It consists of concepts, entities, literals, and relationships, and focuses on extracting and fusing knowledge from online encyclopedias. Furthermore, it enables semantic search to understand the query intention better, thus providing more concise results. Taking the query "the length of the Yellow River" as an example, Google can return a knowledge card, and provide an accurate answer of 5464 km based on KG. As shown in Figure 1, when we search for "Yellow River" in the Google search engine, the related web pages will be presented on the left side, and the attributes (such as length, area, headstream, picture of the river, etc.) will be shown on the right side. Nowadays, KGs have become prevalent, and there are some famous KGs. In CN-DBPedia [11], for example, three Chinese internet encyclopedias (i.e., Baidu Baike, Chinese Wikipedia, and Hudong Baike) are used to extract knowledge. Over the past 30 years, many researchers have carried out related work. Hook summarized six application aspects of KGs [12] and Zeyua introduced KGs to explore the scientific literature distribution [13].

Geographical Knowledge Representation
Geographical knowledge representation can be considered the core idea in GeoKGs. In terms of the representation model, Zheng et al. [14] proposed a model based on spatiotemporal processes, while Kacprzyk et al. [15] presented a method employing chains of contexts and patterns of appropriate user behavior in visual analysis. To better represent these fields of knowledge, Mehdi Mekni [16] proposed a virtual geographic environment using a topologic graph of geographic datasets. Similarly, Laurini [17] presented a conceptual framework to manage geographic entities, relationships, and rules. Unlike prior work, Jiang et al. [5] divided geographical knowledge into factual knowledge and process knowledge to describe external characteristics and spatial transformation.
In the last few decades, the use of the semantic web and ontology in knowledge representation has developed considerably [18]. This has fostered a promising way to connect spatial data with KG, thus augmenting the application of geographic datasets [19]. Hence, many geographic datasets have been published in the form of Linked Data, some of which play a prominent role in the Linked Open Data cloud (https://lod-cloud.net/, accessed on 21 September 2021). More governmental agencies and large-scale data infrastructures run Linked Data initiatives, such as e-Government and open data communities in Europe [20]. Furthermore, Varanka and Usery [21] proposed that the geographic data released in RDF can be treated as knowledge; hence, most of the existing techniques focus on ontologies and rules. For instance, Janowicz [22] modeled semantic knowledge in geographic datasets, and Hofer et al. [23] formalized geographic operators. Additionally, Gould and Mackaness [24] used ontologies to formalize generalized cartography knowledge to facilitate the sharing, expansion, and reuse of mapping knowledge.

Geographical Knowledge Graph Construction
As an expanded KG, the GeoKG is a structured semantic knowledge base, which represents rich geographical knowledge in triples [2]. It is regarded as a promising tool to deal with many technical geographic challenges, such as named entity recognition, toponym disambiguation, and spatial reasoning [5]. In GeoKG, RDF triples are used to describe knowledge, and their visualization relies on a "node-edge" graph (Figure 2). In detail, geographical concepts are represented by nodes, and edges demonstrate relationships in data properties (i.e., the relationship between entities and attributes) and object properties (i.e., the relationship between concepts and entities). As illustrated in Figure 2, the triples <Yellow River, is-a, River> and <Yellow River, inside, China> indicate object property relationships between concepts and entities, and the triple <Yellow River, length, "5464 km"> represents a data property. Therefore, GeoKG could link different datasets based on RDF triples, thus enriching geographical knowledge [25]. Some GeoKGs have been constructed from geographic datasets, such as OSMonto, OSM Semantic Network, and Yago2. OSMonto [26] is an ontology for Open Street Map (OSM) tags, and the OSM Semantic Network [27] contains RDF triples extracted from OSM tags on Wiki websites. Although OSMonto and OSM Semantic Network extract a large number of concepts, they do not contain geographical entities or employ common-sense knowledge. In addition to concepts, Yago2 [28] extracts entities from Wikipedia. However, it does not contain a lot of geographical entities or Chinese information, because Wikipedia contains only a few Chinese pages. Furthermore, Liu et al. [29] showed that Linked Data have made considerable progress in publishing, retrieving, and integrating data. Based on Linked Data, LinkedGeoData could map OSM into RDF triples to devise a geographic data browser [30]. In terms of GeoKG construction, Chen et al. [2] presented a crowdsourced geographic knowledge graph that extracted different kinds of entities from OSM and enriched them with human geographic knowledge from Wikidata. Under the "One Belt One Road" initiative, Wu et al. [4] introduced the techniques for constructing a Chinese knowledge graph, which have greatly promoted the development of AI.
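The node-edge view of the Figure 2 triples can be illustrated with a minimal Python sketch. The tuple layout, the `nodes` set, and the helper `split_properties` are illustrative assumptions rather than part of any cited system; the rule used here (an object that is itself a node gives an object property, otherwise a data property) is a simplification.

```python
# Triples from the Figure 2 example, stored as (subject, predicate, object).
triples = [
    ("Yellow River", "is-a", "River"),
    ("Yellow River", "inside", "China"),
    ("Yellow River", "length", "5464 km"),
]

# Assumed for illustration: the nodes that are concepts or entities.
nodes = {"Yellow River", "River", "China"}


def split_properties(triples, nodes):
    """Separate object properties (node to node) from data properties (node to literal)."""
    object_props = [t for t in triples if t[2] in nodes]
    data_props = [t for t in triples if t[2] not in nodes]
    return object_props, data_props


object_props, data_props = split_properties(triples, nodes)
```

In this sketch the two relationship triples end up in `object_props`, and the length triple ends up in `data_props`.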
In summary, these proposed methods provide abundant semantics. However, general KGs lack geographical knowledge. Moreover, most of them are only constructed from internet encyclopedias, and they cannot extract knowledge from geographic datasets, thus lacking precise coordinates and spatial relationships.

The System Framework of GeoKG Construction
The general KG is constructed via a "bottom-up" approach [4]. In contrast, GeoKG is constructed via a "top-down" method, consisting of two stages: designing the schema layer, and extracting geographical knowledge in the data layer (Figure 3). The schema layer is used to construct concepts and relationships. In the data layer, methods of extracting, integrating, and storing geographical knowledge are discussed sequentially. Then, concepts and relationships in the schema layer can be completed by generalizing knowledge in the data layer. The first stage of GeoKG construction is schema layer modeling. Hu et al. [31] proposed two design patterns to design the schema layer, including content and logical patterns. The content pattern is adopted to formalize relationships and the geographical concepts of entity, feature, geometry, coordinate, and reference system.
In the data layer, knowledge is automatically extracted from geographic datasets and Baidu Baike. Because it is extracted from multisource data, knowledge integration methods of linking and fusing are adopted to integrate equivalent entities and concepts. With current technology, a single database cannot directly store knowledge and geographic datasets. Neo4j is one of the best graph database management systems, and Spatialite is a database engine with a spatial plugin. Both are used to store extracted knowledge to meet application demands.

Available Data Analysis
The available data sources for constructing GeoKGs include geographic datasets and Baidu Baike.

Geographic Datasets
Geographic datasets are carriers of spatial information that meet the demands of production units and social masses. As a primary type of geographic data, vector data are hierarchical, block-divided, and feature-divided. They consist of two components: one managing spatial data (i.e., geometry) and the other managing thematic data. Vector data represent elements in the form of points, lines, and polygons based on mathematical projection, thus demonstrating locations explicitly. Furthermore, they can easily represent spatial distribution and topological structure because they are stored in a two-dimensional Cartesian coordinate system. Geographic datasets could also be stored in a spatial database in the form of several tables. In each table, rows represent features, columns display attribute values, and geometry columns express coordinates (Figure 4). Therefore, it is easy to operate the spatial database through Structured Query Language (SQL). However, it is inefficient to query data across different tables, and semantics in geographic datasets are weak. Hence, new ways of organizing geographic datasets are required, and internet encyclopedias should be introduced to complete semantics in geographic datasets.

Baidu Baike
Baidu Baike is the most popular internet encyclopedia in China. It has some advantages, such as covering a wide range of fields, allowing users to edit almost all accessible pages, and expressing entities in the form of web pages. On each web page, labels, images, and information boxes are used to describe entity characteristics.

Schema Layer Modeling
According to Section 3.1, the schema layer is conceptualized and implemented to integrate geographical knowledge.

Features and Geometries
Knowledge sharing and cyclic utilization are the primary functions of the modeling schema layer. GeoSPARQL (http://www.opengis.net/ont/geosparql, accessed on 21 September 2021) ontology is introduced to express geographical knowledge about features and geometries. In the following, the prefixes geo and sf are used to represent the namespaces of GeoSPARQL and simple feature geometries, respectively.
As shown in Figure 5, there are some existing concepts and relationships in GeoSPARQL. To represent geographical knowledge in vector data, the class SpatialObject is created as an extended concept, and all the other concepts are inherited from it directly or indirectly. The object property spatialRelation is used to connect SpatialObject. The concepts Feature and Geometry are constructed as subclasses of SpatialObject, and Feature is linked to one or more Geometry using the object property hasGeometry. Two literals are associated with the concept Geometry via the data properties asWKT and EPSG, which store coordinates in well-known text (WKT) and the spatial reference system of the European Petroleum Survey Group (EPSG). Moreover, the concepts point, curve, surface, and geometry collection are inherited from Geometry to represent geometries in geographic datasets.

Entities
As shown in Figure 5, a prefix gkg is used to limit the knowledge scope, such as the concept gkg:GeoEntity and object property gkg:spatialRelation. Moreover, two disjoint subclasses, named gkg:GeoBaikeEntity and gkg:GeoDatasetEntity, are created to represent the geographical entities extracted from Baidu Baike and vector data, respectively. The object property sameAs represents the linkage between instances of the two concepts, thus integrating geographical entities in multisource data. Furthermore, GeoDatasetEntity is connected to one or more Feature concepts by the object property hasFeature. Therefore, an entity can represent its geographical information and semantics simultaneously.
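As a rough illustration, the schema layer described in the two subsections above can be written down as a handful of triples. This is a sketch under assumptions: the prefixed class names follow the geo, sf, and gkg namespaces named in the text, while the `disjointWith` label and the helper `ancestors` are introduced here for illustration only.

```python
# Schema-layer triples as described in the text
# (geo = GeoSPARQL, sf = simple feature geometries, gkg = this GeoKG).
SCHEMA = [
    ("geo:Feature", "subClassOf", "geo:SpatialObject"),
    ("geo:Geometry", "subClassOf", "geo:SpatialObject"),
    ("sf:Point", "subClassOf", "geo:Geometry"),
    ("sf:Curve", "subClassOf", "geo:Geometry"),
    ("sf:Surface", "subClassOf", "geo:Geometry"),
    ("sf:GeometryCollection", "subClassOf", "geo:Geometry"),
    ("gkg:GeoBaikeEntity", "subClassOf", "gkg:GeoEntity"),
    ("gkg:GeoDatasetEntity", "subClassOf", "gkg:GeoEntity"),
    # "disjointWith" is an assumed label for the disjointness constraint.
    ("gkg:GeoBaikeEntity", "disjointWith", "gkg:GeoDatasetEntity"),
    ("gkg:GeoDatasetEntity", "hasFeature", "geo:Feature"),
    ("geo:Feature", "hasGeometry", "geo:Geometry"),
]


def ancestors(concept, schema=SCHEMA):
    """Return every superclass reachable through subClassOf edges."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in schema:
            if subject == current and predicate == "subClassOf" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found
```

For example, `ancestors("sf:Point")` walks up through `geo:Geometry` to `geo:SpatialObject`, mirroring the statement that geometry concepts inherit from SpatialObject indirectly.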

Relationships Design
In addition to concepts, relationships also play a significant role in formalizing the real world. As shown in Figure 6, GeoKG consists of two types of geographical relationships: spatial and semantic relationships. The semantic relationship is divided into data property and object property. The object property consists of subClassOf, equivalentClass, is-a, and sameAs. The relationships subClassOf and equivalentClass formalize parent-child relationships and equivalence between concepts, respectively. The relationship is-a associates concepts and instances, and sameAs identifies equivalent geographical entities. Moreover, data properties (e.g., name, width, length, EPSG, etc.) are used to describe the attributes of geographical entities. In addition to semantic relationships, topological, distance, and orientation relationships are crucial in GeoKG. In the following subsections, each of these spatial relationships will be described in detail.

Topological Relationship
The topological relationship is invariant under topological transformations, including rotation, scaling, and translation [9]. It is inherited from gkg:spatialRelation, thus representing the proximity between geographical entities. As shown in Figure 7, the topological relationships between entities include intersect, disjoint, contain, within, equal, overlap, touch, and cross. Among these relationships, disjoint, touch, intersect, and equal are symmetric, while equal, contain, and within are transitive. Contain and within are inverses of each other, and disjoint and intersect are complementary (two entities are disjoint exactly when they do not intersect). Taking "A contains B" as an example, B is entirely inside A, and neither the interior nor the boundary of B intersects the exterior of A.
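The algebraic properties of these relationships can be checked on a toy example. The sketch below is an assumption-laden simplification: entities are reduced to axis-aligned rectangles (the hypothetical `Box` class), only four of the eight relationships are implemented, and boundary contact is allowed in `contain`, which is looser than the definition given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for an entity geometry (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def disjoint(a: Box, b: Box) -> bool:
    # No shared point: one box lies strictly to one side of the other.
    return a.xmax < b.xmin or b.xmax < a.xmin or a.ymax < b.ymin or b.ymax < a.ymin


def intersect(a: Box, b: Box) -> bool:
    # Two boxes intersect exactly when they are not disjoint.
    return not disjoint(a, b)


def contain(a: Box, b: Box) -> bool:
    # B lies inside A (boundary contact allowed in this simplification).
    return a.xmin <= b.xmin and a.ymin <= b.ymin and b.xmax <= a.xmax and b.ymax <= a.ymax


def within(a: Box, b: Box) -> bool:
    # A is within B exactly when B contains A.
    return contain(b, a)


# Hypothetical nested regions and one far-away region.
province = Box(0, 0, 10, 10)
city = Box(2, 2, 4, 4)
county = Box(2.5, 2.5, 3, 3)
island = Box(20, 20, 21, 21)
```

With these boxes, `contain(province, city)` and `contain(city, county)` both hold, and so does `contain(province, county)`, which is the transitivity mentioned above; `disjoint(province, island)` gives the same result with its arguments swapped, which is the symmetry.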

Distance Relationship
The distance relationship is defined as the minimum distance between two entities, and it is also inherited from gkg:spatialRelation. Both qualitative and quantitative distances are used in GeoKG. The quantitative distance is expressed by a data property with a precise value. Additionally, qualitative distance is divided into inner-city and inter-city, and these can be converted through thresholds. At the inter-city scale, the minimum speed of a high-speed train (i.e., 250 km/h in China) is used to calculate thresholds. As shown in Figure 8, running times of 20 min (about 25 km), 1 h (about 250 km), 2 h (about 500 km), 5 h (about 1200 km), and more than 5 h are qualitatively described as very close, close, medium, far and very far, respectively. At the inner-city scale, the distances of 3 km, 8 km, 15 km, and over 15 km are qualitatively considered very close, close, medium, and far, respectively. For example, the distance between Zhengzhou Railway Station and Zhengzhou East Railway Station is 11 km, qualitatively described as medium. Zhengzhou is about 130 km away from Luoyang, and the distance relationship is expressed as close.
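The conversion from quantitative to qualitative distance can be sketched as a threshold lookup. The thresholds below are the kilometre values quoted in the text; treating each threshold as an inclusive upper bound, and the names `THRESHOLDS`, `BEYOND`, and `qualitative_distance`, are assumptions made for this illustration.

```python
# Upper bounds in km, taken from the distances quoted in the text.
# Treating each bound as inclusive is an assumption.
THRESHOLDS = {
    "inter-city": [(25, "very close"), (250, "close"), (500, "medium"), (1200, "far")],
    "inner-city": [(3, "very close"), (8, "close"), (15, "medium")],
}

# Label used beyond the last bound at each scale.
BEYOND = {"inter-city": "very far", "inner-city": "far"}


def qualitative_distance(distance_km: float, scale: str) -> str:
    """Map a distance in km to its qualitative label at the given scale."""
    for upper_bound, label in THRESHOLDS[scale]:
        if distance_km <= upper_bound:
            return label
    return BEYOND[scale]
```

Applied to the two examples in the text, `qualitative_distance(11, "inner-city")` gives "medium" and `qualitative_distance(130, "inter-city")` gives "close".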

Data Layer Construction
Geographical knowledge extraction and integration are used to construct a data layer to represent spatial location and morphological characteristics in GeoKG.

Concept Extraction from Geographic Dataset
Generally, concepts are mainly extracted from geographic datasets to complete the schema layer. Concepts of ground object categories (such as Expressway and TransportationWarehousing) are created and connected to the schema layer based on layers in the geographic datasets. Then, in each layer, the attribute fields of geographic datasets (such as "Kind") are used to extract the subclass concepts of ground objects. For example, as shown in Figure 9, the field Kind is used to create concepts RailwayStation and BusStation, which belong to the concept TransportationWarehousing. Then, these relationships can be represented in triples: <TransportationWarehousing, is-a, GeoDatasetEntity>, <BusStation, is-a, TransportationWarehousing>, and <RailwayStation, is-a, TransportationWarehousing>.
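A minimal sketch of this step, assuming each layer is available as a list of attribute dictionaries, is given below. The function name `extract_concepts`, the input layout, and the sample rows are hypothetical; only the layer name, the field Kind, and the resulting concepts come from the example above.

```python
def extract_concepts(layers, kind_field="Kind"):
    """Turn layer names and distinct Kind values into concept triples."""
    triples = []
    for layer_name, rows in layers.items():
        # Each layer becomes a ground-object concept under GeoDatasetEntity.
        triples.append((layer_name, "is-a", "GeoDatasetEntity"))
        seen = set()
        for row in rows:
            kind = row.get(kind_field)
            if kind and kind not in seen:
                seen.add(kind)
                # Each distinct Kind value becomes a subclass concept of the layer.
                triples.append((kind, "is-a", layer_name))
    return triples


# Hypothetical rows; real layers would be read from the spatial database.
layers = {
    "TransportationWarehousing": [
        {"Name": "Station A", "Kind": "RailwayStation"},
        {"Name": "Station B", "Kind": "RailwayStation"},
        {"Name": "Station C", "Kind": "BusStation"},
    ]
}
concept_triples = extract_concepts(layers)
```

Repeated Kind values produce a single concept, so the three rows yield one layer triple and two subclass triples.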

Entity Extraction from Geographic Dataset
Aiming at dividing geographical entities into multiple features and geometries, geographical entity extraction rules are designed based on layers and attribute fields. The spatial database includes a list of tables, each of which contains many rows (i.e., geographic features). These rows are composed of fields; property fields express attributes, and geometry fields represent spatial location.
The technical challenge of entity extraction lies in combining geometries. When an entity is composed of only one point, it can be represented in the WKT format POINT, whose basic unit is a longitude-latitude pair. If the entity consists of a list of points that forms a closed loop, its geometry format is POLYGON; otherwise, it is LINESTRING. Furthermore, MULTIPOINT, MULTILINESTRING, and MULTIPOLYGON are used to construct entities composed of several POINT, LINESTRING, and POLYGON units, respectively. When the entity is composed of multiple geometry types, its geometric format must be a collection of geometric types (i.e., GEOMETRYCOLLECTION).
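These combination rules can be sketched in plain Python. The function names and the input layout (one list of longitude-latitude pairs per part) are assumptions for illustration, polygons are limited to a single exterior ring without holes, and a closed loop is taken to mean that the first and last points coincide.

```python
def part_wkt_type(points):
    """Classify one coordinate list as POINT, LINESTRING, or POLYGON."""
    if len(points) == 1:
        return "POINT"
    # A closed loop (first point equals last, at least four points) is a polygon.
    if len(points) >= 4 and points[0] == points[-1]:
        return "POLYGON"
    return "LINESTRING"


def _coords(points):
    return ", ".join(f"{lon} {lat}" for lon, lat in points)


def _body(kind, points):
    """Coordinate text of one part, without the type keyword."""
    if kind == "POLYGON":
        return f"(({_coords(points)}))"  # single exterior ring, no holes
    return f"({_coords(points)})"


def entity_wkt(parts):
    """Combine the parts of one entity into a single WKT string."""
    kinds = [part_wkt_type(p) for p in parts]
    if len(parts) == 1:
        return f"{kinds[0]} {_body(kinds[0], parts[0])}"
    if len(set(kinds)) == 1:
        # Several parts of one type form the matching MULTI* geometry.
        bodies = ", ".join(_body(kinds[0], p) for p in parts)
        return f"MULTI{kinds[0]} ({bodies})"
    # Mixed types fall back to a geometry collection.
    members = ", ".join(f"{k} {_body(k, p)}" for k, p in zip(kinds, parts))
    return f"GEOMETRYCOLLECTION ({members})"
```

For instance, two open coordinate lists passed to `entity_wkt` are combined into one MULTILINESTRING, while a point together with a line becomes a GEOMETRYCOLLECTION.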
When combining geometries from layers, the correspondence between entity names and property fields is used to form pairs. As shown in Figure 10, the correspondence between the layer Expressway and its attribute field ID is used to build pairs, creating a triple <Name, has, Layer-Feature IDs>. Then, the fields Name, ID, and Geometry are connected to the concepts GeoDatasetEntity, Feature, and Geometry, respectively, while other attributes are mapped to data properties. Finally, coordinates and spatial reference systems are also transformed to WKT and EPSG code in the data layer. In addition, an entity name is combined with its administrative region to distinguish entities that share the same name but lie in different areas.
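The pairing of names with layer-feature IDs might look like the following sketch. The field name `Region`, the helper names, and the sample rows are hypothetical; the text only states that the name and the administrative region together identify an entity.

```python
from collections import defaultdict


def group_features(rows, layer):
    """Group rows of one layer into entities keyed by (name, region)."""
    entities = defaultdict(list)
    for row in rows:
        # Name plus administrative region separates same-name entities.
        key = (row["Name"], row["Region"])
        entities[key].append(f"{layer}-{row['ID']}")
    return dict(entities)


def entity_triples(entities):
    """Emit one <Name, has, Layer-Feature ID> triple per feature."""
    return [
        (name, "has", feature_id)
        for (name, _region), feature_ids in entities.items()
        for feature_id in feature_ids
    ]


# Hypothetical rows: the same road name occurs in two different regions.
rows = [
    {"ID": 1, "Name": "Road X", "Region": "Region A"},
    {"ID": 2, "Name": "Road X", "Region": "Region A"},
    {"ID": 3, "Name": "Road X", "Region": "Region B"},
]
entities = group_features(rows, "Road")
```

Here the three features are grouped into two entities, because the third row lies in a different region even though its name is identical.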

Knowledge Extraction from Encyclopedias
For geographical entities, spatial information and semantics are the main areas of concern. Knowledge in Baidu Baike is extracted by opening an encyclopedia entry based on an entity name and then locating elements using XPath. As shown in Table 1, the rules are designed for extracting titles, synonyms, information boxes, and overview pictures, thus completing entity semantics. The "attribute-value" pairs (including attribute name and value) in the information box are extracted. If an attribute value exists among the extracted entities, an object property is built from the attribute name. Otherwise, a data property is built to describe entity semantics. In addition to completing the knowledge in the data layer, the schema layer is also completed based on concepts and relationships extracted from Baidu Baike. As shown in Figure 11, Zhengzhou is extracted as an entity, the attribute value 7446 km² is represented as a literal, and the attribute name Area is built as a data property. Additionally, Zhengzhou East Railway Station is built as an entity.
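The rule for choosing between an object property and a data property can be sketched as follows. The function name, the dictionary layout of the information box, and the attribute name "Railway Station" are assumptions for illustration; the entity Zhengzhou, the attribute Area, and its value come from the Figure 11 example.

```python
def infobox_to_triples(entity, infobox, known_entities):
    """Map attribute-value pairs to object-property or data-property triples."""
    object_props, data_props = [], []
    for attribute, value in infobox.items():
        if value in known_entities:
            # The value is itself an extracted entity: build an object property.
            object_props.append((entity, attribute, value))
        else:
            # Otherwise the value is kept as a literal: build a data property.
            data_props.append((entity, attribute, value))
    return object_props, data_props


known_entities = {"Zhengzhou", "Zhengzhou East Railway Station"}
infobox = {
    "Area": "7446 km²",
    # Hypothetical attribute name, used only to show the object-property branch.
    "Railway Station": "Zhengzhou East Railway Station",
}
object_props, data_props = infobox_to_triples("Zhengzhou", infobox, known_entities)
```

In this sketch the Area pair becomes a data property with a literal value, while the pair whose value matches a known entity becomes an object property.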

Knowledge Integration
From the above steps, geographical knowledge is extracted from multisource data (i.e., Baidu Baike and geographic datasets). Thus, it is necessary to integrate this knowledge in two ways: knowledge linking and knowledge fusion.
Knowledge linking aims to discover equivalence relationships between entities. The Baidu Baike website address is built with the entity name, and a specified web page about the entity can thus be acquired. Then, the relationship sameAs is created to link these entities. As shown in Figure 12, the entities are extracted from geographic datasets and Baidu Baike, and they are deposited into the concepts gkg:GeoDatasetEntity and gkg:GeoBaikeEntity, respectively. Then, the relationship sameAs is built to represent the equivalent property.
Unlike knowledge linking, knowledge fusion operates on attribute fields and values. In terms of attribute fields, fields with the same meaning but different names are unified based on statistics. As shown in Figure 13, the attribute fields Line Length and Mileage are both used to describe the entity length for line geometry, and Line Length is used as the final relationship in the RDF triple (i.e., <Lianluo Highway, Line Length, "4395 km">). In terms of attribute values, the value extracted from geographic datasets replaces the value extracted from Baidu Baike, because of its accurate geographical coordinates. Although these knowledge fusion strategies are rather simple, they provide a way to acquire more accurate geographical knowledge.
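The linking and fusion strategies can be sketched in a few lines. This is an illustrative reading rather than the authors' implementation: exact name matching stands in for opening a Baidu Baike page by entity name, the frequency counts are hypothetical, and the `dataset:` and `baike:` prefixes are invented to tell the two sources apart.

```python
def link_entities(dataset_names, baike_titles):
    """Create sameAs triples for names found in both sources (exact match)."""
    return [
        (f"dataset:{name}", "sameAs", f"baike:{name}")
        for name in dataset_names
        if name in baike_titles
    ]


def unify_fields(synonym_group, field_counts):
    """Map every field in a group of equivalent fields to the most frequent one."""
    canonical = max(synonym_group, key=lambda field: field_counts[field])
    return {field: canonical for field in synonym_group}


def fuse(dataset_attrs, baike_attrs, field_map):
    """Merge attributes; dataset values override Baidu Baike values."""
    fused = {}
    for field, value in baike_attrs.items():
        fused[field_map.get(field, field)] = value
    for field, value in dataset_attrs.items():
        fused[field_map.get(field, field)] = value
    return fused


# Hypothetical frequencies of the two equivalent field names.
field_counts = {"Line Length": 120, "Mileage": 45}
field_map = unify_fields(["Line Length", "Mileage"], field_counts)
```

With this mapping, a Baidu Baike attribute named Mileage is stored under Line Length, and any value supplied by the geographic dataset for the same field takes precedence.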

Knowledge Storage
Knowledge storage involves saving the acquired knowledge. In a relational database, storing RDF triples is redundant, and the JOIN operations demand more time. Similarly, graph databases cannot support the spatial index and real-time extraction of spatial relationships. Therefore, a single database cannot meet the actual needs. Spatialite is a spatial database with many advantages, such as small size, fast storage, high retrieval speed, and low cost. It is used to store geographical data and some structured semantics. As shown in Table 2, there are four tables in our database. The table GeoEntit stores geographical entities, including generated ID, name, added time, the collection of feature IDs and its corresponding layer name, Baidu Baike information, and geometry WKT. The table GeoField_Baike stores statistics about attribute fields extracted from Baidu Baike, including ID, name, frequency, and geographical IDs. The table GeoRelation stores relationships, including relationship ID, name, and frequency. The table re_Geo_Geo stores the RDF triples extracted from geographic datasets, including the triple ID, geographic entity IDs, and relationship ID. To combine knowledge acquired from geographic datasets and Baidu Baike, a graph database (Neo4j) is used to store relationships. In detail, the nodes store concepts, entities, and attribute values, while edges represent relationships.
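The relational side of this storage design can be approximated with Python's built-in `sqlite3` module. The sketch below is a simplified stand-in: the lowercase table names and all column names are assumptions loosely modeled on the description of Table 2, geometry is kept as plain WKT text, and none of SpatiaLite's spatial functions or indexes are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE geo_entity (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        feature_ids TEXT,      -- collection of layer-feature IDs
        wkt TEXT,              -- geometry as well-known text
        epsg INTEGER           -- spatial reference system code
    );
    CREATE TABLE geo_relation (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        frequency INTEGER DEFAULT 0
    );
    CREATE TABLE re_geo_geo (
        id INTEGER PRIMARY KEY,
        subject_id INTEGER REFERENCES geo_entity(id),
        relation_id INTEGER REFERENCES geo_relation(id),
        object_id INTEGER REFERENCES geo_entity(id)
    );
    """
)

# Store the triple <Yellow River, inside, China> from the Figure 2 example.
conn.execute("INSERT INTO geo_entity (id, name) VALUES (1, 'Yellow River'), (2, 'China')")
conn.execute("INSERT INTO geo_relation (id, name, frequency) VALUES (1, 'inside', 1)")
conn.execute("INSERT INTO re_geo_geo (subject_id, relation_id, object_id) VALUES (1, 1, 2)")

# Reassembling a triple requires joining the entity table twice.
rows = conn.execute(
    """
    SELECT s.name, r.name, o.name
    FROM re_geo_geo t
    JOIN geo_entity s ON s.id = t.subject_id
    JOIN geo_relation r ON r.id = t.relation_id
    JOIN geo_entity o ON o.id = t.object_id
    """
).fetchall()
```

The three-way join needed to read back a single triple hints at why relationship traversal is delegated to a graph database, while the relational tables hold geometries and structured attributes.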

Experiments and Evaluation
Transportation, warehousing, roads and administration (http://navinfo.com/digitalmap, accessed on 21 September 2021) are used as experimental data sources. Linkage rate, semantic relationship rate, and application cases are demonstrated to exploit the constructed GeoKG.

Linkage Rate
As shown in Table 3, the constructed GeoKG contains over 126,000 entities. There are three entity types: point-type, line-type, and polygon-type entities. More than 15,000 geographical entities are linked to Baidu Baike pages, accounting for 12.17% (i.e., over 15,000 sameAs relationships are constructed in the data layer). Additionally, there is an interesting phenomenon whereby the linkage rate varies significantly between geometry types. Polygon-type entities have a high linkage rate of 100%, while the linkage rates of point-type and line-type entities are as low as 10.22% and 24.23%, respectively.

Semantic Relationship Rate
The geographical entities extracted from geographic datasets all carry exact coordinates, represented by EPSG and WKT. Moreover, the semantics are enriched because of their linkage with Baidu Baike. As shown in Table 4, the semantic relationship rate of the Chinese name is 100% because Baidu Baike pages are opened based on entity names. Although the geographical location rate reaches more than 88%, there are only 215 entities with precise geographic coordinates, accounting for 1.39%. The missing coordinates can be completed using the geographic datasets, thus increasing the rate to 100%.

Application Cases
Figure 14 demonstrates the retrieval process based on GeoKG. The first step is to click on the map, thus identifying the nearest entity on the map. Then, entities are retrieved via their semantic and spatial relationships in the databases. Finally, information about these entities can be shown on the map or in a graph. The application cases based on GeoKGs are as follows.

Processing One Layer
By processing the railway layer in geographic datasets, semantics and exact geographic coordinates can be obtained. Taking Longhai Railway as an example, the knowledge card will be shown on the right side, containing its overview, pictures and semantics (Figure 15). At the same time, the spatial information is displayed on the map.

Processing Multiple Layers
In addition to knowledge about the clicked-on entity, information about the administrative regions it passes through can be obtained after processing the polygonal province layer in geographic datasets. As shown in Figure 15, the entity Longhai Railway (i.e., the black parts) and the areas it passes through (i.e., the green parts) are represented on the map.
The relationship between a point and a line is hard to judge directly because of the positional deviation between them. Hence, the point-line relationship is acquired from the GeoKG. In Figure 16, the railway stations on Longhai Railway are shown on the left side, and the details of Zhengzhou are represented on the right side (including a detailed map and a graph).

Discussion
The advantages and limitations of GeoKG construction are the main focuses of this study. The GeoKG integrates semantic characteristics with spatial characteristics to understand the real world.
The GeoKG is compared with two other KGs: CrowdGeoKG [2] and CKG [4]. Against the One Belt One Road (OBOR) background, CKG focuses on extracting geographical entities from internet encyclopedias about the countries along OBOR. However, CKG does not consider geographic datasets, and lacks precise coordinates. Although CrowdGeoKG integrates knowledge from OSM and Wikidata, it lacks support for extracting entities from geographic datasets (such as shapefiles), and its linkage rate between OSM and Wikidata is only 6.62%. Compared to the above two methods, our GeoKG regards geographic datasets as the main data source and Baidu Baike as an auxiliary data source, and its linkage rate reaches 12.17%. It also offers two more advantages. Firstly, the map and KG are integrated to simplify the GIS interactions. Secondly, a spatial database and a graph database are used to better support multi-source heterogeneous data fusion.
There are some design trade-offs of GeoKG. Firstly, geospatial cognition has the characteristics of levels and regions [32]. However, most of the spatial relationships extracted from two-dimensional space are erroneous, because they are in different levels or regions. In light of this, spatial relationships are constructed in the schema layer, and then extracted in real-time. To compensate for extraction time, the spatial database is introduced to improve efficiency. Although our method can acquire abundant geographical knowledge, it cannot extract knowledge from raster and trajectory data. Aiming at completing the GeoKG with more data sources, deep learning and image processing technology will be introduced to extract knowledge from these data.

Conclusions
KGs have attracted a lot of attention worldwide, and play an essential role in AI. However, general KGs lack geographical knowledge. In this paper, both geographic datasets and Baidu Baike are taken as data sources to extract geographical knowledge and semantics. In the schema layer, concepts and relationships are modeled to represent geographical knowledge based on GeoSPARQL. In the data layer, geographical knowledge is extracted, interlinked, and transformed into RDF triples. Then, both graph and spatial databases are used to store geographical knowledge. Furthermore, the GeoKG is verified through the linkage rate, semantic relationship rate, and application cases. The results indicate that the method could automatically extract knowledge from multisource data and combine accurate spatial location with semantics. Additionally, our GeoKG has a higher success rate of linking web pages with geographic datasets, and the share of entities with exact coordinates has increased to 100%.
In short, GeoKGs have become a new research hotspot, which can integrate multisource geospatial data and help GIS combine accurate spatial location with semantics. Thus, they are of great significance to the extension of geographic data into knowledge.

Data Availability Statement:
The data of the current study are available from the corresponding author based on a reasonable request.