CIDOC CRM-Based Knowledge Graph Construction for Cultural Heritage Using Large Language Models

Wang, Yue; Zhang, Man

doi:10.3390/app152212063

Open Access Article


by Yue Wang and Man Zhang *

School of Architecture and Urban Planning, Beijing University of Civil Engineering and Architecture, Beijing 100044, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(22), 12063; https://doi.org/10.3390/app152212063

Submission received: 14 October 2025 / Revised: 4 November 2025 / Accepted: 12 November 2025 / Published: 13 November 2025


Abstract

The cultural heritage of the Liao dynasty in Chifeng encompasses significant historical and cultural information that requires systematic digital preservation and management. However, heterogeneous data sources across museums, archives, and research institutions lack semantic interoperability, creating barriers for cross-system integration and knowledge discovery. This study proposes a standardized knowledge graph construction method by integrating the CIDOC Conceptual Reference Model version 7.2 with large language models. A unified ontology framework enables semantic consistency across diverse heritage data, while Generative Pre-trained Transformer-based models automatically extract structured triples from unstructured texts through prompt engineering and entity disambiguation, with the resulting knowledge graph implemented in a Neo4j graph database. The constructed knowledge graph integrates 106 immovable cultural heritage records from Chifeng City with approximately 20 types of semantic relationships, forming a comprehensive semantic network covering people, places, events, time, and materials. K-means clustering reveals five cultural value themes, including “Nomadic Imperial Power System” and “Multi-Capital Governance Network”, while geospatial mapping identifies a “dual-core and ring-belt” distribution pattern for heritage protection zoning. This research demonstrates how international semantic standards can be integrated with artificial intelligence technologies to enable interoperable cultural heritage knowledge systems, providing practical implications for cross-institutional heritage management and archaeological survey planning.

Keywords:

CIDOC CRM; cultural heritage; geospatial analysis; knowledge graph; large language models; semantic interoperability

1. Introduction

With the development of digital technologies, historical and cultural heritage data has become increasingly diversified and heterogeneous. In traditional cultural resource management models, issues such as data silos in institutions like museums and archives have been pervasive, making it difficult to achieve information sharing and knowledge discovery. To address this challenge, both international and domestic policies have been proposed in recent years to promote the digital transformation of cultural heritage. The digitalization of cultural heritage information involves the construction of knowledge graphs, dynamic cataloging of artifact information, and data exchange.

UNESCO has established comprehensive guidelines for cultural heritage preservation and digital management, emphasizing global knowledge sharing and cross-cultural communication. Among these initiatives, CIDOC-CRM has emerged as the international standard for cultural heritage data interoperability, playing a pivotal role in advancing digital preservation and information exchange across institutions [1,2]. In parallel, China’s “14th Five-Year Plan for Cultural Heritage Protection and Technological Innovation” has accelerated national efforts toward heritage digitalization and informatization. The plan outlines the necessity of creating a national cultural heritage resource catalog, integrating cultural heritage resource information with national spatial planning, improving the national archaeological excavation management system, and establishing a standardized system for cultural heritage digitalization. These policies provide clear direction for the integration, preservation, and dissemination of cultural heritage information resources. Cultural heritage informatization faces challenges in exchanging information between heterogeneous sources. The mapping of metadata to standardized ontologies is critical, and ontologies play a mediating role in this process, enabling consensus on conceptual definitions and their semantic relationships [3]. CIDOC-CRM, as the core ontology framework for semantic interoperability in cultural heritage information, systematically constructs formal representations of both tangible cultural entities and intangible cultural phenomena and their interrelationships, thereby establishing standardized semantic interfaces for cross-system data exchange. 
This ontology model was officially adopted by the International Organization for Standardization (ISO) in 2006 as ISO 21127 [4] and was equivalently adopted in China in 2019 as GB/T 37965-2019 [5], “Information and Documentation—Reference Ontology for Cultural Heritage Information Exchange.” Knowledge graphs, as a semantic data fusion technology, utilize triples to represent factual relationships, offering new pathways for the integration and retrieval of cultural heritage information [6,7]. An artifact knowledge graph built on the CIDOC-CRM conceptual reference model can uniformly represent events, entities, and attributes in the cultural heritage domain, thereby enhancing interoperability between different data sources.

Three critical gaps limit current cultural heritage digitalization research. First, while CIDOC-CRM provides a robust ontological framework for semantic interoperability, its application to specific regional heritage resources, particularly those of culturally distinct dynasties such as the Liao in China, remains limited, with few studies demonstrating practical implementation and validation. Second, although large language models have shown promising capabilities in general-domain knowledge extraction, their integration with domain-specific ontology frameworks like CIDOC-CRM for automated cultural heritage knowledge graph construction has been insufficiently explored. Existing approaches often rely on manual annotation and expert curation, which are time-consuming and difficult to scale for heterogeneous multi-source heritage data. Third, most cultural heritage knowledge graphs focus primarily on descriptive metadata without systematically integrating spatial distribution analysis and cultural value theme extraction, limiting their utility for heritage protection planning and archaeological research. To address these gaps, this study proposes an integrated approach that combines CIDOC-CRM standardization with large language model-based automated extraction to construct a comprehensive knowledge graph for Liao dynasty cultural heritage in Chifeng.

This study focuses on the intelligent information management of Liao dynasty artifacts in Chifeng City. Using the CIDOC-CRM semantic model as the ontology, we construct a cultural heritage resource knowledge graph. By integrating multi-source information, such as literature, books, and census data, and using large language models for automated entity and relationship extraction, this study achieves in-depth integration and application of cultural resource information. Furthermore, through value theme induction and geospatial analysis, the study reveals the cultural logic and protection priorities of artifact distribution.

The main contributions of this study are as follows:

We propose a standardized knowledge graph construction method that integrates CIDOC Conceptual Reference Model version 7.2 with large language models, enabling automated extraction of structured knowledge from heterogeneous cultural heritage texts while maintaining semantic consistency across diverse data sources.
We design and implement a CIDOC-CRM-based ontology framework specifically tailored for Liao dynasty cultural heritage in Chifeng, mapping 106 immovable cultural heritage records with approximately 20 types of semantic relationships into a comprehensive knowledge network covering people, places, events, time, and materials.
We demonstrate the practical application of Generative Pre-trained Transformer-based models for domain-specific knowledge extraction in cultural heritage, addressing challenges such as entity nesting and the scarcity of annotated samples through prompt engineering and entity disambiguation techniques.
We reveal five distinct cultural value themes through K-means clustering analysis—“Nomadic Imperial Power System”, “Multi-Capital Governance Network”, “Ritual Order and Ancestor Worship”, “Buddhism Dissemination and Cultural Fusion”, and “Sacred Mountains and Frontier Landscape Order”—providing new insights into the cultural logic of Liao dynasty heritage.
We identify a “dual-core and ring-belt” spatial distribution pattern through geospatial mapping, offering evidence-based recommendations for heritage protection zoning, archaeological survey prioritization, and cultural tourism planning.

The remainder of this paper is organized as follows: Section 2 reviews related literature on cultural heritage knowledge graphs, CIDOC-CRM ontology applications, and large language model-based information extraction. Section 3 describes the research approach, including the research methodology, research object, and technical roadmap. Section 4 presents the detailed construction process of the knowledge graph based on CIDOC-CRM, covering ontology model construction, RDF triple extraction, and Neo4j implementation. Section 5 presents the results, including graph scale and coverage, visualization interface, value theme generation, and geospatial mapping analysis. Section 6 concludes the paper with a summary of findings, discussion of limitations, and directions for future research.

2. Literature Review

In recent years, substantial progress has been made worldwide in cultural-heritage digitization, particularly in applying the CIDOC-CRM ontology and integrating large language models (LLMs). The 2023 edition of ISO 21127 reinforces CIDOC-CRM’s role as the lingua franca for semantic interoperability across institutions [4,8]. In China, scholarship has increasingly centered on local resource characteristics to form a CIDOC-CRM-based ontology construction paradigm and to operationalize knowledge graphs across literature, museum holdings, and archaeological sources [9,10,11]. Representative domestic efforts include the large-scale ancient-text knowledge graph by Ouyang Jian et al., covering 650 k+ book titles, 220 k+ authors, ~1.5 M editions, and 13 k+ toponyms, which provides a comprehensive, interconnected description of bibliographic knowledge [12,13]. Zhuang Ying’s CRM-ACA leverages expert cataloging and semantic modeling of museum collections to build the Palace Museum’s heritage KG and to furnish an extensible intelligent framework for heritage information organization and use [14,15]. In parallel, deployment-oriented international profiles facilitate implementation: Linked Art engineers CRM-style event patterns in JSON-LD, while the Europeana Data Model (EDM) supplies aggregation-scale specifications and mapping guidelines; both emphasize compatibility with CRM [16,17].

Nested entity recognition and limited training data pose unique challenges for heritage text processing. Researchers have addressed these issues by combining domain NER with knowledge-enhanced strategies [18,19]. For artifact corpora, Wang et al. created the FewRlicsData dataset and proposed RelicsNER, which supports robust span detection and subsequent CRM property alignment [20]. On relation/triple extraction, domestic work has applied ChatGPT-4 to classical-Chinese corpora, comparing prompt templates and light-tuning strategies; reported F1 scores of 56.07% and 30.50% on two datasets highlight domain-adaptation potential under limited supervision [21].

Knowledge graph and deep learning integration has advanced intelligent heritage data management globally. Huang et al. show that cultural heritage is a leading application area for KGs and the semantic web; using Palace Museum ceramics as a case, they report improvements in visualization, interconnectivity, and retrieval [6,22,23,24,25]. On standard-based integration, Câmara et al. mapped EU “COURAGE” data to CIDOC-CRM and validated quality via SPARQL/SHACL, demonstrating practical unification and sharing across heterogeneous sources [26]. Project-level alignment with Linked Art and EDM continues to grow, positioning CIDOC-CRM as the “semantic glue” among diverse resources [16,17].

LLM-based extraction has evolved from general text to domain literature. Evidence suggests that, under few-shot prompting, GPT-3 can approach supervised models on relation extraction [27], while modest fine-tuning of GPT-3/Llama-2 enables joint NER-RE to capture complex scientific records [19]. Broader evaluations find that zero/few-shot LLMs are inconsistent for NER but can surpass baselines on relation extraction with small prompt/tuning budgets—underscoring the need to report precision/recall/F1 and experimental configurations for reproducibility [28]. Architecturally, a metamodel-based extensible NLU design frames prompts, extraction schemas and post-processing rules as first-class artifacts, enabling provider-agnostic interfaces and maintainable scaling—principles directly applicable to end-to-end LLM+CRM pipelines [29]. However, recent comprehensive reviews highlight persistent challenges in deploying LLMs for domain-specific tasks, including data bias, prompt engineering complexity, and the need for robust evaluation frameworks tailored to specialized domains such as cultural heritage [30].

Current research exhibits complementary regional strengths: Chinese scholars prioritize ontology construction for museum collections and classical texts using few-shot learning [31], whereas international efforts focus on standardization and on integrating knowledge graphs with deep learning models. The intersection of the two, building internationally compliant heritage KGs in China and using LLMs to enhance cultural-text processing, will further accelerate the digitization and informatization of heritage scholarship.

3. Research Approach

3.1. Research Methodology

This study adopts a combined approach of ontology modeling and automated information extraction. First, based on the CIDOC-CRM conceptual reference model, semantic definitions and structural modeling are applied to the cultural heritage objects and related entities from the Liao dynasty period in Chifeng. Elements such as people, organizations, events, places, and objects are mapped to the corresponding categories within the CIDOC-CRM model, providing a unified description of the various attributes and relationships of the cultural artifacts. Next, knowledge is extracted from multi-source heterogeneous texts, including literature, census tables, local chronicles, and related books. Large language models, such as GPT, are employed to automatically identify entities and relationships within the text and represent the facts in the form of subject-predicate-object triples. After normalization, the extracted triples are imported into a Neo4j graph database for storage and querying. The entire methodological process covers data preprocessing, conceptual modeling, automated extraction, and knowledge graph construction, ensuring the scientific rigor and practical applicability of the research results.

3.2. Research Object

Chifeng, located in the Inner Mongolia Autonomous Region, lies in the agricultural-pastoral zone and enjoys a favorable geographical position. Its nearly ten thousand years of cultural development have fostered a profound cultural heritage, with its cultural history traceable to the prehistoric Longshan culture. By the early 10th century, the Khitan Liao culture, which originated in this region, influenced the Central Plains. Through the construction of a cultural heritage resource model based on the CIDOC-CRM ontology, this study integrates entities related to Chifeng’s Liao dynasty period, including artifacts, excavation sites, historical figures, artifact attributes, and the discovery processes, into a unified framework [32]. This approach facilitates the structured representation of cultural heritage and its associated information. The data sources include various forms, such as archaeological census records, historical literature, local chronicles, and academic works. These data, originally dispersed in documents and tables, are extracted and integrated through this methodology, ultimately resulting in a knowledge graph that provides a comprehensive overview of the Liao dynasty cultural heritage in Chifeng, serving as a data foundation for historical research and heritage conservation.

3.3. Technical Approach

The research constructs a knowledge graph for Chifeng’s Liao dynasty cultural heritage resources based on the CIDOC-CRM reference model. The primary goal is to build a historical and cultural network centered around the heritage sites, enabling a holistic display of Liao dynasty artifacts and related historical and cultural information in Chifeng (Figure 1). The main data sources for cultural heritage resources include literature, archaeological census records, local chronicles, and books. After initial data collection, preprocessing is performed, including standardization, deduplication, and cleaning, to construct a corpus suitable for subsequent knowledge extraction. In the ontology modeling stage, domain-specific ontologies are designed based on CIDOC-CRM, mapping artifact types, places, people, events, and their attributes to the ontology, thus constructing a semantic framework for future extraction and integration. GPT and other large language models are then utilized for text analysis, enabling the automatic identification of entities and the extraction of subject-predicate-object triples [33]. Finally, the extracted triples are imported into the Neo4j graph database, where nodes and relationships are created, integrated, and queried to achieve a networked semantic representation of Chifeng’s Liao dynasty cultural heritage resources.

4. Construction of the Knowledge Graph Based on CIDOC CRM

4.1. Ontology Model Construction Approach and Methodology

The “Seven-Step Method” is an effective approach for constructing domain-specific ontologies. This study focuses on immovable cultural heritage from the Liao dynasty in Chifeng, integrating the CIDOC CRM ontology model to build a semantic knowledge graph for this specific domain.

The ontology construction process is divided into seven steps:

1. Defining the Domain and Objectives of Ontology Construction

The ontology is constructed for the domain of “Immovable Cultural Heritage of the Liao Dynasty in Chifeng”, focusing on material cultural resources such as historical sites, ancient buildings, inscriptions, and city ruins in the Chifeng region during the Liao period. The objective is to systematically integrate the spatiotemporal information, construction and restoration processes, cultural attributes, and current conditions of the related heritage resources through ontology modeling, forming a structured, computable, and visualizable knowledge system that supports the subsequent construction of the knowledge graph and digital management of cultural heritage.

2. Selecting and Expanding the Applicable Ontology Framework

In the field of cultural heritage semantic modeling, the internationally recognized CIDOC CRM (Conceptual Reference Model) is widely adopted. Based on the core architecture of the CIDOC CRM v7.2 model, and referring to its classes and properties system, the model is appropriately trimmed and extended to account for the specific characteristics of Chifeng’s Liao dynasty heritage resources. Local semantic subclasses, such as Liao dynasty place names, Liao dynasty architectural types, and inscription information, are added to ensure the completeness of the semantic expressions within the project’s domain.

3. Organizing the Terminology System of the Domain

Drawing on sources such as the Chifeng Cultural Heritage Records, “History of Liao”, local chronicles, and archaeological survey materials, a list of terms related to immovable Liao dynasty cultural heritage in Chifeng is compiled. These terms include, but are not limited to, Liao dynasty palace site ruins, Khitan religious buildings, stone inscriptions, moat site ruins, Liao dynasty city walls, and cliff inscriptions. These terms will serve as extensions or subclasses of the E55 Type class in the ontology.

4. Defining Core Concept Classes and Hierarchical Structure

The top-level concept in the ontology is “Immovable Cultural Heritage Resources” (a specialization of CIDOC’s E24 Physical Human-Made Thing), which includes subclasses such as historical sites, ancient buildings, city ruins, stone inscriptions, and monuments. Other core classes include E53 Place (geographical location), E52 Time-Span (time range), and E21 Person (builder). A reasonable hierarchy and semantic network are established by integrating the temporal, spatial, person, and event information related to Liao dynasty artifacts.

5. Defining Ontological Concept Properties

Based on the clarified class structure, corresponding properties are defined for each concept class, as exemplified in Table 1.

6. Setting Constraints on Properties

Strict domain and range definitions are set for the properties defined in the ontology. Such constraints ensure data consistency and semantic integrity. For example, P4 “has time-span” is applied to the E12 Production or E11 Modification events through which an E22 Human-Made Object is dated (its CIDOC CRM domain is E2 Temporal Entity), and its range is E52 Time-Span; the domain of P7 “took place at” is set to E5 Event, and the range is E53 Place; the domain of P108 “has produced” is E12 Production, and the range is E22 Human-Made Object.
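Domain and range constraints of this kind can be checked mechanically before triples are loaded. The following is a minimal sketch, not the authors' implementation: the property signatures follow the CIDOC CRM 7.2 declarations (P4: E2 → E52, P7: E4 → E53, P14: E7 → E39, P108: E12 → E24), while the class hierarchy is deliberately partial and the function names (`ancestors`, `conforms`) are hypothetical. Because the check walks superclasses, an E12 Production passes the P4 check, whereas an E22 object does not; under these signatures an object is dated through its production or modification event.

```python
# Partial CIDOC CRM 7.2 hierarchy (class -> direct superclasses), limited to the
# classes used in this example; a real validator would load the full RDFS file.
SUPERCLASSES = {
    "E22": ["E19", "E24"],  # Human-Made Object
    "E21": ["E20", "E39"],  # Person
    "E12": ["E11", "E63"],  # Production
    "E11": ["E7"],          # Modification
    "E7": ["E5"],           # Activity
    "E5": ["E4"],           # Event
    "E4": ["E2", "E92"],    # Period
}

# Property signatures (domain, range) as declared in CIDOC CRM 7.2.
SIGNATURES = {
    "P4": ("E2", "E52"),     # has time-span
    "P7": ("E4", "E53"),     # took place at
    "P14": ("E7", "E39"),    # carried out by
    "P108": ("E12", "E24"),  # has produced
}


def ancestors(cls):
    """Return the class together with all of its transitive superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, []))
    return seen


def conforms(subject_cls, prop, object_cls):
    """True if a (subject, property, object) triple respects domain and range."""
    domain, range_ = SIGNATURES[prop]
    return domain in ancestors(subject_cls) and range_ in ancestors(object_cls)


print(conforms("E12", "P108", "E22"))  # True: E22 is a subclass of E24
print(conforms("E22", "P4", "E52"))    # False: E22 is not an E2 Temporal Entity
```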

7. Ontology Instantiation, Data Storage, and Graph Visualization

After completing the ontology modeling, methods such as text mining, entity recognition, and rule-based extraction are used to extract triples conforming to the CIDOC structure from historical texts; these are then mapped to the ontology model for instantiation. Finally, the structured data, serialized as RDF (e.g., in Turtle/TTL syntax), is imported into the Neo4j graph database, creating the knowledge graph for immovable cultural heritage of the Liao dynasty in Chifeng. This graph enables the visualization of entities, relationships, and temporal and spatial information, along with semantic query support [28]. The construction of the Chifeng Liao dynasty cultural heritage ontology is shown in Figure 2.

4.2. RDF Triple Extraction

A critical step in constructing the knowledge graph is the automatic extraction of RDF triples (subject-predicate-object) from texts related to Liao dynasty cultural heritage. This study designs a method based on large language models and domain-specific named entity recognition (NER), transforming unstructured text into triples that conform to the CIDOC CRM ontology semantics.

First, we collected multi-source textual data on Liao dynasty cultural heritage, including encyclopedia entries, archaeological reports, and historical research articles, covering structured, semi-structured, and unstructured formats [8]. The raw corpus is cleaned, segmented into sentences, and standardized, removing irrelevant information and splitting it into manageable segments for processing. Given the complex terminology and nested entities of the cultural heritage domain, domain-specific named entity recognition (NER) models, such as the RelicsNER method [20], are often used. In this study, NER is combined with other NLP techniques, with OpenAI’s GPT-5 model used for entity recognition. For example, in the sentence “The Lingyan Temple Pagoda was built in the fifth year of Liao Tianqing, located in Balin Left Banner, Chifeng City”, the GPT model can identify “Lingyan Temple Pagoda” as the artifact entity, “fifth year of Liao Tianqing” as the time entity (which can be standardized to 1115 AD), and “Balin Left Banner, Chifeng City” as the location entity [16].
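The standardization of “fifth year of Liao Tianqing” to 1115 AD follows from the rule that year n of an era falls in (first year of the era) + n − 1. A minimal sketch of that step is shown below; the era table is a hypothetical excerpt whose start years should be checked against a reference chronology, the function name `normalize_liao_date` is illustrative, and the result is only an approximate Gregorian year because regnal years follow the lunisolar calendar.

```python
import re

# Hypothetical excerpt of a Liao era-name table: era -> first Gregorian year.
# Check values against a reference chronology before production use.
LIAO_ERA_START = {
    "Chongxi": 1032,
    "Qingning": 1055,
    "Qiantong": 1101,
    "Tianqing": 1111,
}

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
            "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10}

# Matches English renderings such as "fifth year of Liao Tianqing".
PATTERN = re.compile(r"(\w+) year of Liao (\w+)", re.IGNORECASE)


def normalize_liao_date(text):
    """Map 'Nth year of Liao <Era>' to an approximate AD year, or None."""
    match = PATTERN.search(text)
    if not match:
        return None
    ordinal, era = match.group(1).lower(), match.group(2)
    if ordinal not in ORDINALS or era not in LIAO_ERA_START:
        return None
    # Regnal years count from 1, so year n of an era is start + n - 1.
    return LIAO_ERA_START[era] + ORDINALS[ordinal] - 1


print(normalize_liao_date("The Lingyan Temple Pagoda was built in the fifth year of Liao Tianqing"))
```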

Based on the identified entities, the study uses a combination of rule-based methods and large language models to extract the semantic relationships between entities, forming triples. To fully leverage the semantic understanding capabilities of the large model, a prompt constrained by CIDOC-CRM definitions is designed: a predefined list of entity and relationship types is provided to GPT, instructing it to extract the corresponding subject-predicate-object triples from the text and output them in CSV format. Extraction patterns are constructed based on the CIDOC CRM ontology, defining entity types (such as “artifact”, “person”, “place”, and “time”) and relationship types (such as “creator”, “discovery location”, “period”, and “material”), along with the subject and object types connected by each relationship; these patterns act as constraints that guide GPT’s information extraction. For instance, for the example above, our prompt leads the large model to extract candidate triples such as (Lingyan Temple Pagoda, Creator, XX), (Lingyan Temple Pagoda, Construction Time, 1115 AD), and (Lingyan Temple Pagoda, Location, Balin Left Banner, Chifeng City). Prompt details are provided in Appendix A. After entity extraction and triple construction, the extracted entities and relationships undergo semantic standardization to align with the class and property systems of the CIDOC CRM model. This process includes entity disambiguation and category mapping: entities that refer to the same concept (e.g., “Liao dynasty” and “Khitan”) are merged and linked to the unique entity node in the knowledge base. Entities are then categorized according to the CIDOC CRM, for example by mapping person names to E21 “Person”, locations to E53 “Place”, time periods to E52 “Time-Span”, and events (such as construction or excavation) to E5 “Event” or E7 “Activity”.
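The schema-constrained prompt and its CSV output can be sketched as follows. This is an illustration under stated assumptions, not the prompt in Appendix A: the `SCHEMA` entries, the prompt wording, the CSV header, and the helper names `build_prompt` and `parse_triples` are all hypothetical. The key idea is that the same schema both instructs the model and filters its answer, so rows whose predicate or endpoint types fall outside the schema are discarded.

```python
import csv
import io

# Hypothetical extraction schema: relation -> (subject type, object type).
SCHEMA = {
    "Creator": ("artifact", "person"),
    "Construction Time": ("artifact", "time"),
    "Location": ("artifact", "place"),
    "Material": ("artifact", "material"),
}


def build_prompt(passage):
    """Assemble a schema-constrained extraction prompt (wording is illustrative)."""
    relations = "\n".join(f"- {rel}: {s} -> {o}" for rel, (s, o) in SCHEMA.items())
    return (
        "Extract triples from the text using ONLY these relations:\n"
        f"{relations}\n"
        "Return CSV with header: subject,subject_type,predicate,object,object_type\n"
        f"Text: {passage}"
    )


def parse_triples(csv_text):
    """Keep only CSV rows whose predicate and endpoint types match the schema."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text.strip())):
        expected = SCHEMA.get(row.get("predicate"))
        if expected == (row.get("subject_type"), row.get("object_type")):
            kept.append((row["subject"], row["predicate"], row["object"]))
    return kept
```

Values containing commas, such as “Balin Left Banner, Chifeng City”, must be quoted in the model’s CSV output; the standard `csv` reader then parses them as a single field.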
For relationship predicates, the extracted labels are mapped to properties defined by CIDOC CRM. For example, “creator” corresponds to an E12 Production event in CRM, which is linked to E21 “Person” through property P14 (“carried out by”), indicating that a person carried out the construction activity. “Discovery location” corresponds to an E7 Activity linked to E53 “Place” via property P7 (“took place at”), indicating that the discovery site was the location of an archaeological excavation activity. “Period” corresponds to the production or usage time of the artifact, connected to the time entity via property P4 (“has time-span”). Through this semantic alignment of entity types and relationships, the extracted knowledge integrates into the CIDOC CRM ontology structure, ensuring consistent representation across data sources. The extracted triples are stored in a unified <subject, predicate, object> format, with each subject and object tagged with a type label for later import into a graph database or conversion to RDF format [17].
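The mapping of shortcut predicates onto event-centric CRM paths can be sketched as a small rewriting step. The sketch below is hypothetical: identifiers are plain strings rather than minted IRIs, the `CRM_PATHS` table and `expand` function are illustrative, and linking the discovery activity to the artifact via P12 “occurred in the presence of” is an assumption, since the text names only P7 for that path. Keying the event node on (artifact, event class) means the “creator” and “period” facts of one artifact attach to the same production event, which simplifies multi-phase construction histories.

```python
# Hypothetical mapping: extracted predicate -> (event class, property linking the
# event to the artifact, property linking the event to the extracted object).
CRM_PATHS = {
    "creator": ("E12_Production", "P108_has_produced", "P14_carried_out_by"),
    "period": ("E12_Production", "P108_has_produced", "P4_has_time-span"),
    # P12 for the discovery activity is an assumption; the text only names P7.
    "discovery location": ("E7_Activity", "P12_occurred_in_the_presence_of",
                           "P7_took_place_at"),
}


def expand(artifact, predicate, obj):
    """Rewrite one shortcut triple as event-centric CRM triples (CURIE strings)."""
    event_cls, to_artifact, to_object = CRM_PATHS[predicate]
    # One event node per (artifact, event class): 'creator' and 'period'
    # facts of the same artifact share a single production event.
    event = f"{artifact}#{event_cls}"
    return [
        (event, "rdf:type", f"crm:{event_cls}"),
        (event, f"crm:{to_artifact}", artifact),
        (event, f"crm:{to_object}", obj),
    ]


triples = set(expand("Lingyan Temple Pagoda", "creator", "XX"))
triples |= set(expand("Lingyan Temple Pagoda", "period", "1115 AD"))
print(len(triples))  # 4: the rdf:type and P108 triples are shared
```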

Despite automation of extraction and transformation, certain ontology alignment and entity-merging operations remain semi-manual. For example, automatically extracted entity names often have multiple homonyms or ambiguous contexts; a single label like “Hu Guo Temple” may refer to distinct cultural objects or administrative units across texts. These ambiguous matches require human expertise to select the correct CIDOC-CRM class and unify duplicates. In addition, entity deduplication and linking across heterogeneous sources (e.g., merging “Upper Capital” from chronicles with “Shangjing City Site” from archaeological reports) are manually validated to preserve historical accuracy. These semi-manual processes currently limit scalability, as human experts must check ambiguous matches and adjust mapping rules. To enhance scalability, future work will incorporate ontology alignment tools (e.g., OpenRefine RDF Extension, Silk, or Alignment APIs) and active-learning workflows to progressively automate entity reconciliation while retaining expert oversight. Incorporating such semi-automated alignment can reduce manual burden, ensure semantic consistency, and enable knowledge-graph construction at larger scale.
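The semi-manual reconciliation step can be sketched as an expert-curated alias table combined with a review queue. This is a hypothetical illustration: the `ALIASES` and `AMBIGUOUS` tables, the canonical ID `place:shangjing`, and the `reconcile` function are not from the study. Labels with a vetted alias are merged automatically; ambiguous or unknown labels are routed to a human reviewer instead of being guessed.

```python
# Hypothetical expert-curated alias table: surface label -> canonical entity ID.
ALIASES = {
    "Upper Capital": "place:shangjing",
    "Shangjing City Site": "place:shangjing",
}

# Labels known to denote different things across sources; never auto-merged.
AMBIGUOUS = {"Hu Guo Temple"}


def reconcile(labels):
    """Split labels into safe merges (label -> canonical ID) and a review queue."""
    resolved, review = {}, []
    for label in labels:
        key = label.strip()
        if key in AMBIGUOUS or key not in ALIASES:
            review.append(key)          # needs an expert decision
        else:
            resolved[key] = ALIASES[key]  # vetted alias: merge automatically
    return resolved, review


resolved, review = reconcile(["Upper Capital", "Shangjing City Site", "Hu Guo Temple"])
print(resolved)
print(review)
```

Each expert decision on a queued label can be written back into the alias or ambiguity table, which is one simple way the manual burden shrinks over successive runs.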

4.3. Node and Relationship Design for Knowledge Graph Construction Using Neo4j

To instantiate the extracted RDF triples into a knowledge graph, this study uses the Neo4j graph database to store and manage the knowledge. As a native graph database, Neo4j supports a flexible node-relationship data model, making it highly suitable for constructing complex artifact knowledge graphs according to the CIDOC CRM ontology structure [34].

In Neo4j, unique indexes or constraints are created for the primary node types (such as artifacts, people, places, etc.) based on attributes like name or identifier, to avoid duplicate nodes and ensure the uniqueness and consistency of the knowledge. The edges in the knowledge graph are designed as directed relationships, with names corresponding to the semantic properties in CIDOC CRM. Each relationship type defines the node types for both the source and target entities.
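Uniqueness constraints of this kind can be generated per node label. The sketch below assumes Neo4j 5 syntax (`CREATE CONSTRAINT ... IF NOT EXISTS FOR (n:Label) REQUIRE n.prop IS UNIQUE`); Neo4j 4.x used an older `ON ... ASSERT` form. The labels, key properties, and constraint names in `NODE_KEYS` are hypothetical. Note that a constraint on `name` is only safe after entity disambiguation; otherwise a stable identifier property is a better key.

```python
# Hypothetical node labels and the property used as their unique key.
NODE_KEYS = {"Artifact": "name", "Person": "name", "Place": "name", "TimeSpan": "label"}


def constraint_statements(node_keys):
    """Generate one Neo4j 5 uniqueness-constraint statement per node label."""
    return [
        f"CREATE CONSTRAINT {label.lower()}_{key}_unique IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{key} IS UNIQUE"
        for label, key in node_keys.items()
    ]


for statement in constraint_statements(NODE_KEYS):
    print(statement)
```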

These relationships are implemented in Neo4j as relationship types, so that each edge’s semantics correspond directly to definitions in the CIDOC CRM property set. For example, the custom relationship type BUILT_BY (creator) is a shortcut for the CRM path in which an E12 Production has produced (P108) the E22 object and was carried out by (P14) an E21 Person; DISCOVERED_AT (discovery location) is a shortcut for an E7 Activity concerning the E22 object that took place at (P7) an E53 Place, and so on. This design exploits the graph database’s efficient querying of direct connections while maintaining semantic consistency with the underlying CRM model.

The extracted and standardized RDF triples are batch-imported into the Neo4j graph database [35]. In practice, this is done using Neo4j’s Cypher query language or batch import tools: first, node data is organized into CSV files (listing artifacts, people, places, etc., and their attributes), and relationship data is organized into tables of subject-object pairs and relationship types. The LOAD CSV clause in Cypher is then used to create the nodes and relationships row by row.
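The CSV-plus-LOAD CSV workflow can be sketched as follows. File names, labels, and the `name` key are hypothetical, and the files are assumed to sit in Neo4j’s import directory (hence the `file:///` URL). One statement is generated per relationship type because plain Cypher cannot take a relationship type as a parameter, and `MERGE` keeps re-runs from creating duplicate edges.

```python
import csv
import io


def rels_csv(pairs):
    """Serialize (subject, object) pairs of one relationship type as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["subject", "object"])
    writer.writerows(pairs)
    return buffer.getvalue()


def load_rel_statement(rel_type, src_label, dst_label, filename):
    """Cypher that MERGEs one relationship type from a CSV in the import directory."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row "
        f"MATCH (s:{src_label} {{name: row.subject}}) "
        f"MATCH (o:{dst_label} {{name: row.object}}) "
        f"MERGE (s)-[:{rel_type}]->(o)"
    )


print(rels_csv([("Lingyan Temple Pagoda", "XX")]))
print(load_rel_statement("BUILT_BY", "Artifact", "Person", "rels_built_by.csv"))
```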

5. Results Presentation

Through knowledge extraction and graph construction, this study collected and cleaned 106 records of immovable cultural heritage in Chifeng City. The completed “Chifeng Liao Dynasty Cultural Heritage Knowledge Graph” integrates heterogeneous information on Liao dynasty cultural heritage from multiple sources. This section presents an overview of the Chifeng Liao dynasty cultural heritage resource graph and its visual interface.

5.1. Graph Scale and Knowledge Coverage

The knowledge graph constructed in this study integrates Liao dynasty cultural heritage knowledge from dozens of literature sources, achieving broad coverage of major artifact entities and their associated relationships (Table 2). Approximately 20 relationship types are included, covering key connections between artifacts and entities such as people, places, and time, for example “creator”, “discovery location”, “period affiliation”, “material”, and “usage”. Through these semantic relationships, the graph links the basic information, historical background, and archaeological data of the artifacts into a semantic network.

The knowledge graph adopts the CIDOC CRM standard, so that each piece of knowledge has a clear semantic definition. This ontology alignment allows knowledge from different sources to be semantically integrated within the graph, for example, unifying information from research studies and records from archaeological reports under the same semantic framework. Overall, the knowledge graph covers a wide range of entities and knowledge points on Chifeng Liao dynasty cultural heritage and captures the multifaceted relationships of key artifacts (such as important tombs and temple buildings), offering rich knowledge support for subsequent applications.

5.2. Quantitative Evaluation and Accuracy Analysis

The current evaluation is based on three historical Liao-related texts: “The Study of Song-Liao Diplomatic Relations”, “Geographical Notes on the Liao History”, and “History of Khitan Art”. Approximately 100 passages were selected from each book, totaling 300 passages as experimental data.

The reliability of information retrieval and extraction techniques must be validated with quantitative metrics. The standard metrics are precision, recall, and F1 score. Precision is the proportion of the system’s extracted triples that are correct; recall is the proportion of gold-standard triples that the system correctly extracts; F1 is the harmonic mean of precision and recall. This evaluation framework is widely used in information extraction tasks.

In the 300 passages, 975 knowledge triples were manually annotated and 889 triples were automatically extracted. Comparing the automatic extractions with the manual annotations yields the following counts:

True Positives (TP) = 702: The number of triples correctly extracted by the model.

False Positives (FP) = 162: The number of triples extracted by the model but not found in the manual annotations.

False Negatives (FN) = 221: The number of triples present in the manual annotations but not extracted by the model.

Using these counts and the definitions above, the overall precision, recall, and F1 score are:

Precision = 702 ÷ (702 + 162) ≈ 0.812

Recall = 702 ÷ (702 + 221) ≈ 0.761

F1 score = 2 × 0.812 × 0.761 ÷ (0.812 + 0.761) ≈ 0.786
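The calculation above can be reproduced with a short Python function. Returning 0.0 when a denominator is zero is a convention chosen for this sketch; the counts are the ones reported in the text.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts (0.0 whenever a denominator would be zero)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = prf1(702, 162, 221)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Computing F1 from the unrounded precision and recall is equivalent to 2TP / (2TP + FP + FN) = 1404 / 1787, which also rounds to 0.786.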

Table 3 presents the relationship-wise evaluation results, showing performance variations across different CIDOC-CRM properties.

Overall, the automatic extraction performs best on time expressions and makes more errors when distinguishing artifact types and identifying builders. This suggests that the model handles explicit years reliably but tends to miss or mis-extract triples involving polysemous terms and complex names.

To facilitate understanding of the sources of errors in automatic extraction, we present 10 typical passages from each of the three historical texts (Table 4, Table 5 and Table 6), showing the differences between manual annotations and model extraction results. For simplicity, we only list core triples and omit irrelevant relationships.

Table 4 presents examples from “The Study of Song-Liao Diplomatic Relations”. This text largely provides an overview of historical and geographical literature, lacking specific artifact descriptions. The automatic extraction algorithm struggles to distinguish between the main text and book reviews, leading to a significant number of mis-extractions.


Table 5 presents examples from “Geographical Notes on the Liao History”, which contains detailed geographical descriptions and administrative evolution records. The extraction results show that the model performs better on place names but struggles with complex geographical relationships and historical administrative changes.


Table 6 demonstrates examples from “History of Khitan Art”. This book primarily discusses Khitan music, dance, rituals, and court systems, with little focus on specific architecture. However, the algorithm tends to extract artifacts based on keywords like “palace” or “hall”, resulting in false positives for artifact entities.

By comparing the automatic extraction and manual annotation of the three Liao history-related texts, we constructed a gold standard dataset of 975 triples and evaluated the performance of a GPT-based model in extracting knowledge about Chifeng Liao dynasty artifact resources. The model achieved an overall precision of approximately 0.81 and a recall of 0.76, indicating potential usability for structured extraction, although missed and incorrect extractions remain. Specifically, the errors fall into four categories:

Entity boundary errors: non-artifact phrases (such as “district” or “sixteen departments”) are incorrectly classified as artifacts;

Type confusion: suffixes such as “tomb” or “state” are classified inconsistently, and ancient tombs are often misclassified as ruins or architecture;

Time errors: numbers in time expressions are substituted or omitted;

Missed recognition: all triples in sentences containing short place names are frequently missed.

5.3. Performance of ChatGPT in Named Entity Recognition

To further evaluate the ability of large language models (LLMs) to extract structured knowledge from historical texts, ChatGPT was applied to Named Entity Recognition (NER) on the same corpus from The Geography Section of the Liao History (Liaoshi·Dilizhi). Thirteen passages were selected (Figure 3), and the model outputs were compared against the manually annotated gold standard using standard metrics: precision, recall, and F1-score.

As shown in Table 7, under strict matching ChatGPT achieved an F1-score of 0.41, outperforming spaCy (0.214) and CKIP-BERT (0.018). Under lenient matching (0.5 threshold), ChatGPT achieved an F1-score of 0.505, clearly ahead of spaCy (0.25) and CKIP-BERT (0.346). This indicates ChatGPT’s stronger contextual understanding and generalization in identifying complex entities. The model performed particularly well on temporal and geographical entities, correctly identifying examples such as “Shangjing Linhuang Prefecture”, “Song Jingde Fourth Year”, and “Tonghe Twenty-Fifth Year”.
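The text does not specify how the 0.5 lenient threshold is computed. One common reading, assumed in the sketch below, is that a predicted entity counts as correct if it has the same type as a not-yet-matched gold entity and their character spans overlap with an intersection-over-union of at least 0.5; strict matching requires identical boundaries and type. The span offsets and labels in the example are invented for illustration.

```python
from typing import NamedTuple, Optional

class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type, e.g. "LOC", "PER", "ORG", "TIME"

def overlap_ratio(a: Span, b: Span) -> float:
    """Intersection-over-union of two character spans (0.0 if disjoint)."""
    inter = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = max(a.end, b.end) - min(a.start, b.start)
    return inter / union if union else 0.0

def match_counts(gold, pred, threshold: Optional[float] = None):
    """Return (TP, FP, FN) with greedy one-to-one matching.
    threshold=None means strict matching (same boundaries and label)."""
    unused = list(gold)
    tp = 0
    for p in pred:
        for g in unused:
            if g.label != p.label:
                continue
            if threshold is None:
                ok = (g.start, g.end) == (p.start, p.end)
            else:
                ok = overlap_ratio(g, p) >= threshold
            if ok:
                unused.remove(g)  # each gold entity can be matched once
                tp += 1
                break
    return tp, len(pred) - tp, len(gold) - tp

def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Invented example: one exact hit, one boundary error, one spurious entity.
gold = [Span(0, 20, "LOC"), Span(30, 40, "TIME")]
pred = [Span(0, 20, "LOC"), Span(28, 40, "TIME"), Span(50, 55, "PER")]
print("strict F1:", f1(*match_counts(gold, pred)))
print("lenient F1:", f1(*match_counts(gold, pred, threshold=0.5)))
```

In the example, the boundary error on the time expression counts against the model under strict matching but is accepted under lenient matching, which is the kind of gap between the two F1 scores reported in Table 7.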

However, minor boundary inconsistencies were observed—e.g., merging multi-word expressions (“Liao reign 210 years”) or splitting entity boundaries. Nonetheless, ChatGPT achieved notably higher recall, suggesting stronger adaptability in weakly supervised or unstructured text scenarios.

Furthermore, ChatGPT showed semantic awareness in entity-type inference, such as categorizing “Liao reign” as a dynasty (ORG) and “Shengzong” as a person (PER), highlighting its inherent knowledge transfer capabilities.

In summary, ChatGPT outperforms the traditional NER models in this historical text processing task, producing readily inspectable output and adapting well to the domain. It can serve as an essential component or semi-automatic annotation assistant in knowledge graph construction for cultural heritage research.

5.4. Visualization Interface and Example Display

The data from the knowledge graph is visually displayed through a front-end interface, allowing users to intuitively browse and understand the knowledge structure (Figure 4).

In this example, the central node represents a particular artifact entity, with various related entity nodes and relationship edges radiating around it. Nodes in different colors represent different types of entities. For example, the red nodes represent immovable Liao dynasty cultural heritage in Chifeng, purple nodes represent periods/dynasties, orange nodes represent places, and blue nodes represent artifact attributes (such as shape, decoration, etc.). Each edge is labeled with a relationship name (e.g., “period affiliation”, “original discovery location”, “design”, etc.), thus clearly showcasing the semantic relationship network centered around the artifact. Users can click on any node to expand and explore more related relationships, enabling gradual exploration. The right side of the interface provides a legend and control panel, allowing users to search, filter, and expand/collapse hierarchical content. For example, when users enter “Lingyan Temple Pagoda” in the search box, the corresponding node is highlighted, and related information nodes surrounding the “Lingyan Temple Pagoda” are displayed, such as its construction date, creator, location, and other relationships.

More importantly, by using the visualization and graph exploration capabilities of the Neo4j graph database, this study has made substantial progress in the exploratory application of the knowledge graph. By visually analyzing the multiple relationships among nodes such as artifacts, places, periods, and people within the graph, researchers can not only quickly access basic information about a specific artifact but also discover knowledge connections that are difficult to identify in traditional tabular formats.

For example, when exploring the “Lingyan Temple Pagoda” as a central node, its related nodes include “fifth year of Liao Tianqing” (time), “Balin Left Banner, Chifeng City” (place), and “construction event”, which are semantically connected. These further indirectly link to people nodes like “Empress Dowager Li”, indicating the political and cultural significance of this building within the Liao dynasty royal family context. Similarly, through cluster analysis of architectural nodes such as “Huguo Temple”, “Chuiqing Temple”, and “Qing’an Zen Temple”, it is found that these buildings are primarily made of “stone” or “wood” and are spatially concentrated in the Balin Left Banner area of Chifeng. This suggests that the region had a high density of Buddhist architecture during the Liao dynasty, possibly reflecting the presence of a political or religious center.
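Indirect links of this kind, such as the path from the pagoda through the construction event to Empress Dowager Li, amount to shortest-path queries. In Neo4j they could be written with Cypher's `shortestPath` function; the standard-library sketch below performs the equivalent breadth-first search over an in-memory adjacency list. The edges are illustrative, phrased after the relation labels in Appendix A, and are not taken from the constructed graph.

```python
from collections import deque

# Illustrative (subject, relation, object) edges; not data from the constructed graph.
EDGES = [
    ("Lingyan Temple Pagoda", "constructed during", "fifth year of Liao Tianqing"),
    ("Lingyan Temple Pagoda", "located at", "Balin Left Banner, Chifeng City"),
    ("Lingyan Temple Pagoda", "involved in", "construction event"),
    ("Empress Dowager Li", "performed in", "construction event"),
]

def build_adjacency(edges):
    """Adjacency list that can be walked in both directions; reverse
    traversals are labelled '(inverse)'."""
    adj = {}
    for s, rel, o in edges:
        adj.setdefault(s, []).append((rel, o))
        adj.setdefault(o, []).append((rel + " (inverse)", s))
    return adj

def shortest_path(adj, start, goal):
    """Breadth-first search. Returns [node, relation, node, ..., goal],
    or None if goal cannot be reached from start."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for rel, nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, rel)
                queue.append(nxt)
    if goal not in prev:
        return None
    hops, node = [goal], goal
    while prev[node] is not None:
        parent, rel = prev[node]
        hops += [rel, parent]
        node = parent
    return hops[::-1]

adj = build_adjacency(EDGES)
print(shortest_path(adj, "Lingyan Temple Pagoda", "Empress Dowager Li"))
```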

Moreover, extending the “discovery location—place—time” triple relationship, the distribution timeline of major Liao dynasty sites can be constructed. Nodes such as “1058 AD” and “1077 AD” cluster around the construction records of several artifacts, indicating that the mid-to-late Liao dynasty was a period of active construction activity. The discovery of these knowledge connections is a key demonstration of how knowledge graphs exceed the capabilities of traditional databases.
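A simple way to turn time nodes into such a timeline is to bin construction years into fixed-width intervals and count the records in each bin. The sketch below assumes a 25-year bin width and uses hypothetical records; it shows the grouping step only and does not reproduce the paper's data.

```python
from collections import Counter

# Hypothetical (site, construction year AD) pairs.
RECORDS = [("site A", 1058), ("site B", 1077), ("site C", 1062), ("site D", 960)]

def construction_timeline(records, bin_years=25):
    """Return sorted (bin start year, record count) pairs."""
    counts = Counter((year // bin_years) * bin_years for _, year in records)
    return sorted(counts.items())

for start, n in construction_timeline(RECORDS):
    print(f"{start}-{start + 24} AD: {n} record(s)")
```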

5.5. Generation of Value Themes

By extracting graph structural features and computing centrality indicators such as node degree and betweenness centrality, key heritage sites and events are identified. The degree of palace architecture nodes (such as “Kaihuang Hall” and “Ande Hall”) is markedly higher than that of general sites, indicating that these nodes are associated with many historical events and figures. Local subgraph analysis of nodes such as “Lingyan Temple Pagoda” likewise reveals strong connections with “fifth year of Liao Tianqing” (time), “Balin Left Banner, Chifeng City” (place), and “construction event”, suggesting the political and religious significance of this building within the royal family context.
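Both centrality indicators can be computed without external libraries. The sketch below uses Brandes' algorithm for betweenness on an unweighted, undirected graph and returns unnormalized scores. Graph libraries such as NetworkX provide equivalent functions, and which tool the authors used is not stated. The toy edges are invented for illustration.

```python
from collections import deque

def build_undirected(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # BFS from s, counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends

# Invented toy graph: a hall node linked to several events and figures.
adj = build_undirected([("Kaihuang Hall", "event 1"), ("Kaihuang Hall", "event 2"),
                        ("Kaihuang Hall", "figure 1"), ("event 1", "figure 1")])
bc = betweenness(adj)
for node in sorted(adj, key=lambda v: (-degree(adj)[v], v)):
    print(node, degree(adj)[node], round(bc[node], 2))
```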

To avoid subjective classification, this study combines the semantic descriptions of each artifact (including the object values of its triples and its summary text) into a text vector. Features are represented with TF-IDF, and the K-means clustering algorithm is applied for theme discovery. Based on the silhouette coefficient, the optimal number of clusters was determined to be 5 (k = 5). In the clustering results, one cluster centers on terms such as “palace architecture”, “noble bed”, and “Yelü Abaoji”, while another centers on “tomb clusters”, “ancestral temples”, and “imperial tombs”. The remaining clusters group terms related to “Buddhist temples”, “ordination platforms”, and “monk inscriptions”; to “mountain ranges” and “hunting camps”; and to the “multi-capital system” and “regional and military governance institutions”.
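The pipeline of TF-IDF vectors, k-means, and silhouette-based choice of k can be illustrated with a self-contained sketch. It makes several assumptions that may differ from the authors' configuration. The descriptions are assumed to be already tokenized; Chinese text would need word segmentation first. The clustering uses cosine distance on L2-normalized vectors, a spherical k-means variant, rather than Euclidean k-means. Initialization uses deterministic farthest-point selection. The toy documents are invented. A production version would more likely use a library such as scikit-learn.

```python
import math
from collections import Counter

def normalise(v):
    norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
    return {t: x / norm for t, x in v.items()}

def tfidf(docs):
    """docs: list of token lists -> list of L2-normalized sparse TF-IDF vectors."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # smoothed idf, ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default
    return [normalise({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                       for t, c in Counter(d).items()}) for d in docs]

def cos_dist(a, b):
    # vectors are unit length, so the dot product is the cosine similarity
    return 1.0 - sum(x * b.get(t, 0.0) for t, x in a.items())

def centroid(vs):
    acc = Counter()
    for v in vs:
        acc.update(v)  # adds the float weights term by term
    return normalise(dict(acc))

def kmeans(vecs, k, iters=50):
    """Spherical k-means with deterministic farthest-point initialization."""
    centers = [vecs[0]]
    while len(centers) < k:
        centers.append(max(vecs, key=lambda v: min(cos_dist(v, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: cos_dist(v, centers[j])) for v in vecs]
        new = [centroid([v for v, lab in zip(vecs, labels) if lab == j])
               if j in labels else centers[j] for j in range(k)]
        if new == centers:
            break
        centers = new
    return labels

def silhouette(vecs, labels):
    """Mean silhouette coefficient under cosine distance (singletons score 0)."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, v in enumerate(vecs):
        def mean_dist(c):
            ds = [cos_dist(v, w) for j, w in enumerate(vecs) if labels[j] == c and j != i]
            return sum(ds) / len(ds) if ds else None
        a = mean_dist(labels[i])
        if a is None:
            continue  # singleton cluster contributes 0
        b = min(mean_dist(c) for c in clusters if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(vecs)

def best_k(vecs, k_range):
    return max(k_range, key=lambda k: silhouette(vecs, kmeans(vecs, k)))

# Invented, pre-tokenized artifact descriptions.
DOCS = [d.split() for d in [
    "palace hall abaoji", "palace hall throne",
    "tomb imperial ancestral", "tomb imperial mausoleum",
    "buddhist temple monk", "buddhist monk inscription",
]]
vecs = tfidf(DOCS)
k = best_k(vecs, range(2, 5))
print(k, kmeans(vecs, k))
```

Choosing k by the highest mean silhouette mirrors the selection procedure described in the text. On real descriptions the scores are usually much closer together than in this toy corpus, so inspecting the top terms of each cluster, as done for the five themes, remains necessary.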

By observing the keywords and representative entities in each cluster and combining them with the historical context of the Liao dynasty in Chifeng, the five clusters are named as follows: “Nomadic Imperial Power and the Oruoduo Administrative System”, “Multi-Capital System and Territorial Governance Network”, “Ritual Order and Ancestor Worship”, “Buddhism Dissemination and Cultural Fusion”, and “Sacred Mountains and Frontier Landscape Order.” These themes reflect the differences and complementarities in their historical development and spatial patterns (Table 8).

5.6. Geospatial Mapping

To study the spatial associations among heritage sites of different value themes and to identify areas of Chifeng City where historical sites are concentrated, this study standardizes and stores the geographic information of the immovable cultural heritage and maps it onto a GIS platform. The location of each immovable artifact is determined or estimated from multiple sources, including literature records, local chronicles, archives, and collection records, and each estimate carries a quantified uncertainty (in meters). For artifacts with uncertain coordinates, a buffer zone (e.g., ±500 m) is applied in the GIS analysis. The coordinates and their metadata (source, confidence, estimation method, uncertainty, etc.) are exported in standard formats such as GeoJSON or CSV and projected into a suitable local coordinate system before being imported into GIS for hotspot analysis, cluster identification, and spatiotemporal association studies (Table 9).
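A record in the GeoJSON export described above might look like the output of the sketch below. The property names are hypothetical, and so is the rule that the analysis buffer is at least 500 m or the stated uncertainty, whichever is larger. The coordinates are placeholders, not surveyed positions. GeoJSON (RFC 7946) stores positions as [longitude, latitude] in WGS 84, so projection to a local coordinate system happens afterwards in the GIS.

```python
import json

def to_feature(name, lon, lat, source, confidence, method, uncertainty_m):
    """One GeoJSON Point feature with provenance metadata (hypothetical schema)."""
    return {
        "type": "Feature",
        # RFC 7946: positions are [longitude, latitude] in WGS 84
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "name": name,
            "source": source,
            "confidence": confidence,
            "estimation_method": method,
            "uncertainty_m": uncertainty_m,
            # assumed rule: buffer is the larger of 500 m and the uncertainty
            "buffer_m": max(500, uncertainty_m),
        },
    }

def feature_collection(features):
    return {"type": "FeatureCollection", "features": features}

fc = feature_collection([
    to_feature("Site A", 119.4, 44.0, "local chronicle", "medium", "text-based estimate", 800),
])
print(json.dumps(fc, ensure_ascii=False, indent=2))
```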

For immovable cultural heritage records without precise geographical coordinates in the literature or archives, we adopted a pragmatic georeferencing strategy. Following the practice of China’s Third National Cultural Relics Survey, these records were anchored to the centroid of the nearest township-level administrative unit. The centroid coordinates of each township were derived from the national administrative boundary dataset, ensuring that every record has a spatial proxy for analysis. This approach prevents the exclusion of valuable records from GIS-based hotspot and clustering analyses, while explicitly flagging these points as approximations. In subsequent analyses we assigned wider buffer zones to these approximated locations to account for their greater locational uncertainty and clearly documented the data source (the Third National Cultural Relics Survey) and estimation method in the metadata.
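The fallback rule can be sketched as a small function. The centroid table, the township name, and both buffer radii are hypothetical: 500 m echoes the example value given earlier, and 5000 m merely stands in for the "wider buffer" mentioned in the text. Checking `is not None` ensures that a legitimate coordinate of 0.0 is not treated as missing.

```python
# Hypothetical township centroids (lon, lat in WGS 84); real values would be
# derived from the national administrative boundary dataset.
TOWNSHIP_CENTROIDS = {"Township X": (119.40, 43.95)}

PRECISE_BUFFER_M = 500   # example buffer for recorded coordinates
PROXY_BUFFER_M = 5000    # hypothetical wider buffer for centroid proxies

def georeference(record, centroids=TOWNSHIP_CENTROIDS):
    """Return coordinates, buffer radius and provenance flags for one record."""
    if record.get("lon") is not None and record.get("lat") is not None:
        return {"lon": record["lon"], "lat": record["lat"],
                "buffer_m": PRECISE_BUFFER_M, "approximate": False,
                "method": "recorded coordinates"}
    lon, lat = centroids[record["township"]]
    return {"lon": lon, "lat": lat, "buffer_m": PROXY_BUFFER_M,
            "approximate": True, "method": "township centroid",
            "source": "Third National Cultural Relics Survey practice"}

print(georeference({"name": "Site B", "township": "Township X"}))
```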

After all artifact coordinates are standardized, mapped to the coordinates of the corresponding CIDOC-CRM entities, and imported into GIS, the spatial distribution of Chifeng’s Liao dynasty historical artifacts shows a mixed “dual-core–belt” aggregation pattern (Figure 5). The first core is formed by the Upper Capital (Balin Left Banner) and the surrounding complex of ancestral temples, tombs, and palaces; the second core is formed by the Middle Capital (Ningcheng County) and its surrounding religious architectural complex. Between and around these two cores lie several secondary aggregation zones centered on sacred mountains, rivers, or ancient roads. This pattern reflects the spatial overlap and division of functions between Liao political and ritual institutions (palaces, ancestral temples) and religious and folk culture (temples, stone inscriptions), and it suggests that Chifeng served as both a political governance center and a hub of religious culture and transportation during the Liao dynasty. Based on this distribution, research and protection strategies can be divided into three tiers: (1) strict protection control and priority archaeological excavation in the core zones; (2) monitoring and digital management of the belt zones and corridors; and (3) designation of the secondary aggregation points as candidate sites for cultural display and public education. These conclusions are supported by hotspot analysis (kernel density estimation) and spatial clustering (DBSCAN/k-means), with stratified statistical checks by temporal node (Time-Span) and artifact type, and they offer actionable support for protection prioritization, research site selection, and cultural tourism planning.
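Of the clustering methods named above, DBSCAN is the one that separates dense cores from isolated sites (noise) without fixing the number of clusters in advance. The sketch below is a textbook DBSCAN with a brute-force neighbourhood search. It assumes the coordinates have already been projected to metres; the `eps` and `min_pts` values and the sample points are illustrative only. Kernel density estimation is not shown.

```python
import math

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on projected (x, y) coordinates in metres.
    Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        # brute-force neighbourhood query; includes the point itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1              # noise for now; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point: join the cluster, do not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                seeds.extend(j_neighbours)
    return labels

# Illustrative projected points: two tight groups and one isolated site.
pts = [(0, 0), (100, 0), (0, 100), (5000, 5000), (5100, 5000), (5000, 5100), (20000, 0)]
print(dbscan(pts, eps=200, min_pts=3))
```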

6. Conclusions

This study focuses on immovable cultural heritage from the Liao dynasty in Chifeng. Using the CIDOC-CRM international standard ontology framework, it combines the knowledge extraction capabilities of large language models with the storage and visualization strengths of the Neo4j graph database to construct the “Chifeng Liao Dynasty Cultural Heritage Knowledge Graph”. The study proposes a new method for organizing knowledge about local cultural heritage: it introduces CIDOC-CRM into the semantic modeling of Chifeng Liao dynasty artifacts and establishes a cross-source, cross-format data integration mechanism that helps address the problem of “data silos” in artifact information. By combining large language models with ontology modeling, it designs a GPT-based process for entity-relationship extraction and ontology alignment that mitigates challenges such as complex entity nesting and the scarcity of labeled samples in the cultural heritage domain, demonstrating the feasibility of large models for semantic extraction in this field.

The study constructs a multi-dimensional artifact knowledge graph. Through ontology modeling and triple extraction, it forms a semantic network covering multiple knowledge categories such as people, places, events, time, and materials, and offers a first view of the spatial distribution and cultural value structure of Chifeng Liao dynasty artifacts. Mapping artifact entities to GIS reveals a “dual-core–belt” aggregation pattern, which provides spatial evidence for archaeological surveys, heritage protection zoning, and cultural exhibitions.

However, the study has several limitations that warrant acknowledgment. First, the accuracy and consistency of knowledge extraction require further verification, especially when dealing with complex sentence structures and ambiguous expressions, which still involve some errors. The current prompt engineering approach, while effective, lacks standardized quality control mechanisms for ensuring extraction reliability across diverse text types. Second, some artifacts lack precise temporal and spatial coordinates, which limits the accuracy of spatial clustering analysis and may affect the reliability of distribution pattern identification. Third, the sample scope is geographically and temporally constrained, focusing exclusively on Liao dynasty heritage in Chifeng, which limits the generalizability of the proposed methodology to other historical periods or regions. Fourth, the interpretability of the large language model’s decision-making process remains a challenge, as the “black box” nature of LLMs makes it difficult to trace the reasoning behind specific entity-relationship extractions. Fifth, the knowledge graph lacks a real-time update mechanism, meaning that newly discovered artifacts or updated research findings cannot be automatically integrated into the system. Finally, the evaluation framework employed in this study relies primarily on manual verification by domain experts, and there is a need for more systematic, domain-specific quality assessment metrics tailored to cultural heritage knowledge graphs.

Despite these limitations, this research demonstrates promising directions for future improvements. First, incorporating active learning and human-in-the-loop mechanisms could reduce manual intervention while maintaining quality control, gradually improving the model’s autonomous extraction capabilities through iterative expert feedback. Second, expanding the model’s application scope to other historical dynasties and geographical regions would validate its transferability and establish a more comprehensive Chinese cultural heritage knowledge system. Third, developing explainable AI techniques specifically for entity-relationship extraction could enhance the transparency and trustworthiness of the automated knowledge construction process. Fourth, establishing standardized evaluation metrics for cultural heritage knowledge graphs, incorporating both computational measures (precision, recall, F1-score) and domain-specific quality indicators (historical accuracy, semantic richness, ontological consistency), would provide more rigorous assessment frameworks. Fifth, implementing dynamic update mechanisms through continuous monitoring of academic publications, archaeological reports, and museum cataloging systems would ensure the knowledge graph remains current and comprehensive. Sixth, extending the system’s functionality beyond visualization to include intelligent question-answering, personalized recommendation systems, and virtual heritage experiences could enhance its practical value for researchers, educators, and the general public. Finally, promoting cross-institutional data sharing and collaboration through standardized CIDOC-CRM mappings could facilitate the construction of a national or even international cultural heritage knowledge network, advancing the field toward truly interoperable digital heritage ecosystems. 
These future directions not only address the current limitations but also point toward the broader potential of integrating international semantic standards with AI technologies for intelligent cultural heritage management.

Author Contributions

Conceptualization, Y.W. and M.Z.; methodology, Y.W.; software, Y.W.; validation, Y.W. and M.Z.; formal analysis, Y.W.; investigation, Y.W.; resources, M.Z.; data curation, Y.W.; writing—original draft preparation, Y.W.; writing—review and editing, M.Z.; visualization, Y.W.; supervision, M.Z.; project administration, M.Z.; funding acquisition, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Natural Resources of the People’s Republic of China, grant number 2023YFC3803900-4-3.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CIDOC-CRM	CIDOC Conceptual Reference Model
GPT	Generative Pre-trained Transformer
KG	Knowledge Graph
LLM	Large Language Model
NER	Named Entity Recognition
RDF	Resource Description Framework
SHACL	Shapes Constraint Language
SPARQL	SPARQL Protocol and RDF Query Language

Appendix A

Please extract the following entities from the historical, archaeological, or geographical literature related to the Liao Dynasty in Chifeng:

1. Immovable cultural heritage artifacts of the Liao Dynasty in Chifeng.
2. Builders of these immovable cultural heritage artifacts.
3. Restorers of these immovable cultural heritage artifacts.
4. Construction dates of these immovable cultural heritage artifacts.
5. Restoration dates of these immovable cultural heritage artifacts.
6. Geographical locations (mountain and water names) related to these immovable cultural heritage artifacts.
7. Administrative place names related to these immovable cultural heritage artifacts.
8. Activity place names associated with these immovable cultural heritage artifacts (e.g., places like ‘Naba’).
9. Events related to these immovable cultural heritage artifacts.
10. Reigning years (or dynastic years) during the construction or restoration of these immovable cultural heritage artifacts.
11. Materials used in constructing these immovable cultural heritage artifacts.
12. Construction activities related to these immovable cultural heritage artifacts.
13. Restoration activities related to these immovable cultural heritage artifacts.

Extraction Process:

Entity Extraction: First, extract the entities and assign them to appropriate subclasses.

The top-level concept should be named Chifeng Liao Dynasty Cultural Heritage Resources, with the subclass E27 Site for immovable cultural heritage artifacts.

The subclass E27 Site includes six sub-classes from CIDOC CRM: E53 Place, E57 Material, E55 Type, E52 Time-Span, E21 Person, and E7 Activity.

Entity Classification:

E21 Person includes sub-classes for Builder and Restorer.
E52 Time-Span includes sub-classes for Construction Time and Restoration Time, with a corresponding time reference table mapping extracted dates and reign periods to three historical periods:
907–969 AD (Liao Taizu Yelü Abaoji’s founding to Liao Muzong),
969–1031 AD (Liao Jingzong to Liao Shengzong),
1031–1125 AD (Liao Xingzong to the fall of the Liao Dynasty).
E53 Place includes three sub-classes: E53 Administrative Place Name, E53 Mountain/Water Place Name, and E53 Activity Place Name. Match the extracted administrative place names to the following keywords: Linhuang County, Changtai County, Dingba County, Baohua County, and other historical locations.
E55 Type includes Artifact Categories (e.g., Ancient Buildings, Tombs, Inscriptions, and others) and Person Categories (e.g., Emperors, Nobles, Servants, Priests, Tomb Owners).
E57 Material maps extracted artifact materials to keywords such as stone, wood, clay, and ceramics.
E7 Activity includes sub-classes for Construction Activities and Restoration Activities.
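The time reference table can be applied mechanically once a date has been converted to a year AD. The sketch below assumes that conversion has already happened. Because the table's ranges share their endpoints (969 and 1031), assigning each boundary year to the later period is an assumption made here; the table itself does not settle it.

```python
import bisect
from typing import Optional

# Period starts from the time reference table above; the final period ends in 1125.
PERIODS = [
    (907, "Liao Taizu Yelü Abaoji's founding to Liao Muzong"),
    (969, "Liao Jingzong to Liao Shengzong"),
    (1031, "Liao Xingzong to the fall of the Liao Dynasty"),
]
END_YEAR = 1125
STARTS = [start for start, _ in PERIODS]

def period_for(year: int) -> Optional[str]:
    """Map a year AD to one of the three periods; None if outside 907-1125.
    Boundary years (969, 1031) go to the later period by assumption."""
    if year < STARTS[0] or year > END_YEAR:
        return None
    return PERIODS[bisect.bisect_right(STARTS, year) - 1][1]

print(period_for(947))
```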

Extract Knowledge Triples: Then, identify the relationship types and extract the knowledge triples (Entity 1—Relationship—Entity 2). The relationship types include:

E27-P45-E57: Artifact Material (e.g., “Lingyan Temple Pagoda—made of—Stone”).
E27-P2-E55: Artifact Type (e.g., “Lingyan Temple Pagoda—type of—Ancient Building”).
E27-P4-E52: Construction Time (e.g., “Lingyan Temple Pagoda—constructed during—1115 AD”).
E27-P14-E21: Builder (e.g., “Lingyan Temple Pagoda—built by—Empress Dowager Li”).
E27-P12-E7: Construction Activity (e.g., “Lingyan Temple Pagoda—involved in—Construction”).
E27-P53-E53: Artifact Location (e.g., “Lingyan Temple Pagoda—located at—Balin Left Banner, Chifeng City”).
E7-P7-E53: Event Location (e.g., “Restoration of the Temple—took place at—Balin Left Banner”).
E7-P4-E52: Restoration Time (e.g., “Restoration of Lingyan Temple Pagoda—occurred during—1120 AD”).
E21-P21-E7: Role in Activity (e.g., “Empress Dowager Li—performed in—Restoration”).
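LLM output can be screened against the relationship patterns listed above before import. The sketch below checks each triple's (subject class, property, object class) combination against the nine patterns in the prompt. It is deliberately limited to that list and is not a full check of CIDOC CRM property domains and ranges; such constraints could instead be expressed as SHACL shapes. The function name and sample data are hypothetical.

```python
# The nine (domain class, property, range class) patterns listed in the prompt.
ALLOWED_PATTERNS = {
    ("E27", "P45", "E57"), ("E27", "P2", "E55"), ("E27", "P4", "E52"),
    ("E27", "P14", "E21"), ("E27", "P12", "E7"), ("E27", "P53", "E53"),
    ("E7", "P7", "E53"), ("E7", "P4", "E52"), ("E21", "P21", "E7"),
}

def filter_triples(triples, entity_class):
    """Split (subject, property, object) triples into accepted and rejected
    lists, using the class assigned to each entity during classification."""
    accepted, rejected = [], []
    for s, p, o in triples:
        pattern = (entity_class.get(s), p, entity_class.get(o))
        (accepted if pattern in ALLOWED_PATTERNS else rejected).append((s, p, o))
    return accepted, rejected

classes = {"Lingyan Temple Pagoda": "E27", "Stone": "E57", "Balin Left Banner": "E53"}
ok, bad = filter_triples([("Lingyan Temple Pagoda", "P45", "Stone"),
                          ("Stone", "P53", "Balin Left Banner")], classes)
print("accepted:", ok)
print("rejected for review:", bad)
```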

Output Format: The extracted entities and triples should be output in the following format:

Entity 1—Relationship—Entity 2 (e.g., “Liao Shangjing Site—P2—Ancient Site”).
Entity Classification (e.g., “Liao Shangjing Site—E27 Ancient Site—Artifact Type Classification”; “Qingzhou—Administrative Place Name”; “947 AD—Construction Time”).
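Because the output format separates fields with em dashes, model responses can be parsed line by line. The hypothetical helper below accepts a line only if it splits into exactly three non-empty parts, after stripping straight or curly quotation marks. Any other line is set aside for manual review. Entity names that themselves contain an em dash would be rejected by this rule.

```python
SEP = "\u2014"            # the em dash used as separator in the output format
QUOTES = '"\u201c\u201d'  # straight and curly double quotation marks

def parse_triple_line(line):
    """Return (entity1, relationship, entity2), or None when the line does not
    split into exactly three non-empty parts."""
    parts = [p.strip() for p in line.strip().strip(QUOTES).split(SEP)]
    if len(parts) != 3 or not all(parts):
        return None
    return tuple(parts)

def parse_output(text):
    """Split raw model output into parsed triples and lines needing review."""
    triples, rejected = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        t = parse_triple_line(line)
        (triples if t else rejected).append(t or line)
    return triples, rejected

print(parse_triple_line("\u201cLiao Shangjing Site\u2014P2\u2014Ancient Site\u201d"))
```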

Time Reference Table: Include mappings such as “947 AD—Period from Liao Taizu Yelü Abaoji’s founding to Liao Muzong.”

References

Barzaghi, S.; Moretti, A.; Heibi, I.; Peroni, S. CHAD-KG: A Knowledge Graph for Representing Cultural Heritage Objects and Digitisation Paradata. arXiv 2025, arXiv:2505.13276.
Felicetti, A.; Himmiche, A.; Somenzi, M. Knowledge Graphs and Artificial Intelligence for the Implementation of Cognitive Heritage Digital Twins. Appl. Sci. 2025, 15, 10061.
The State Council of the People’s Republic of China. The 14th Five-Year Plan for Cultural Relics Protection and Technological Innovation. China Cultural Relics News, 9 November 2021; p. 003.
ISO 21127:2023; Information and Documentation—A Reference Ontology for the Interchange of Cultural Heritage Information. International Organization for Standardization (ISO): Geneva, Switzerland, 2023.
GB/T 37965-2019; Information and Documentation—Reference Ontology for Cultural Heritage Information Exchange. Standardization Administration of China: Beijing, China, 2019.
Mountantonakis, M.; Koumakis, M.; Tzitzikas, Y. Combining LLMs and Hundreds of Knowledge Graphs for Data Enrichment, Validation and Integration: Case Study: Cultural Heritage Domain. In Proceedings of the International Conference on Museum Big Data (MBD2024), Athens, Greece, 18–19 November 2024; Volume 4021, p. 3.
Fan, T.; Wang, H.; Hodel, T. CICHMKG: A Large-Scale and Comprehensive Chinese Intangible Cultural Heritage Multimodal Knowledge Graph. NPJ Herit. Sci. 2023, 11, 115.
Doerr, M. The CIDOC Conceptual Reference Module: An Ontological Approach to Semantic Interoperability of Metadata. AI Mag. 2003, 24, 75–92.
Wang, Y. Research on the Construction and Application of Knowledge Ontology in the Field of Intangible Cultural Heritage Architecture. Master’s Thesis, Anhui Jianzhu University, Hefei, China, 2023.
Zhang, J.; Ren, T. A Conceptual Model for Ancient Chinese Ceramics Based on Metadata and Ontology: A Case Study of Collections in the Nankai University Museum. J. Cult. Herit. 2024, 66, 20–36.
Wang, Y.; Shi, D. Using CIDOC CRM to Construct Knowledge Ontology of Architectural Intangible Cultural Heritage. Comput. Eng. Appl. 2023, 59, 317–326.
Ouyang, J.; Liang, Z.F.; Ren, S.H. Research on the Construction of Knowledge Graph of Large-Scale Chinese Ancient Books. Libr. Inf. Serv. 2021, 65, 126–135.
Yaman, B.; Randles, A.; McKenna, L.; Kilgallon, L.; Rincón-Yáñez, D.; Johnston, N.; Crooks, P.; O’Sullivan, D. Expanding the Virtual Record Treasury of Ireland Knowledge Graph. Semant. Web J. 2024. in press. Available online: https://www.semantic-web-journal.net/content/expanding-virtual-record-treasury-ireland-knowledge-graph (accessed on 29 October 2025).
Zhuang, Y. Knowledge Organization of Museum Collections for Artificial Intelligence: The Case of the Palace Museum’s Conceptual Reference Model for Ancient Chinese Movable Cultural Relics. Palace Mus. J. 2023, 11, 126–136+150.
Xia, Y.; Yao, X.; Wang, J.; Hu, M. Leveraging Knowledge Graphs for Renaissance Costume Matching and Cultural Transmission. NPJ Herit. Sci. 2025, 13, 219.
Linked Art Editorial Board. Linked Art Data Model, Version 1.0.0. Available online: https://linked.art/model/ (accessed on 29 October 2025).
Europeana Foundation. Europeana Data Model (EDM): Documentation. Available online: https://pro.europeana.eu/page/edm-documentation (accessed on 29 October 2025).
Dagdelen, J.; Dunn, A.; Lee, S.; Walker, N.; Rosen, A.S.; Ceder, G.; Persson, K.A.; Jain, A. Structured Information Extraction from Scientific Text with Large Language Models. Nat. Commun. 2024, 15, 1418.
Foppiano, L.; Lambard, G.; Amagasa, T.; Ishii, M. Mining Experimental Data from Materials Science Literature with Large Language Models: An Evaluation Study. Sci. Technol. Adv. Mater. Methods 2024, 4, 2356506.
Wang, Y.; Liu, J.; Wang, W.; Chen, J.; Yang, X.; Sang, L.; Wen, Z.; Peng, Q. Construction of Cultural Heritage Knowledge Graph Based on Graph Attention Neural Network. Appl. Sci. 2024, 14, 8231.
Wu, M.C.; Liu, C.; Meng, K.; Wang, D.B. Construction of a Classical-Modern Chinese Translation Model Based on Pre-Trained Language Models. J. Inf. Resour. Manag. 2024, 14, 143–155.
Huang, Y.; Yu, S.; Chu, J.; Fan, H.; Du, B. Using Knowledge Graphs and Deep Learning Algorithms to Enhance Digital Cultural Heritage Management. Herit. Sci. 2023, 11, 204.
Yuan, H.; Li, Y.; Wang, B.; Liu, K.; Zhang, J. Knowledge Graph-Based Intelligent Question Answering System for Ancient Chinese Costume Heritage. NPJ Herit. Sci. 2025, 13, 198.
Maree, M. Quantifying Relational Exploration in Cultural Heritage Knowledge Graphs with LLMs: A Neuro-Symbolic Approach for Enhanced Knowledge Discovery. Data 2025, 10, 52.
Zhang, J.; Chan, J.C.F.; Zhao, Z.; Cheng, J.C.P. Heritage Building Information Management and Intelligent Querying by Multimodal Large Language Models and Knowledge Graph. In Proceedings of the Sixth International Conference on Civil and Building Engineering Informatics (ICCBEI 2025), Hong Kong, China, 8–11 January 2025.
Câmara, A.; Almeida, A.; Oliveira, J. Transforming the CIDOC-CRM Model Into a Megalithic Monument Property Graph. J. Comput. Appl. Archaeol. 2024, 7, 213–224.
Gao, Q.; Li, M.; Yang, Q.; Zhu, W.; Wei, S. Knowledge-Graph Construction for Historic Architectural Complexes: A Case Study of the Grand View Garden in The Story of the Stone. In Proceedings of the Fourth International Conference on Remote Sensing, Surveying, and Mapping (RSSM 2025), Xi’an, China, 10–12 January 2025; SPIE: Xi’an, China, 2025; p. 136422C.
Zhao, Z.; Wang, D. Evaluation of Large Language Models for the Intangible Cultural Heritage Domain. NPJ Herit. Sci. 2025, 13, 439.
Matic, R.; Kabiljo, M.; Zivkovic, M.; Cabarkapa, M. Extensible Chatbot Architecture Using Metamodels of Natural Language Understanding. Electronics 2021, 10, 2300.
Patil, R.; Gudivada, V.N. A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs). Appl. Sci. 2024, 14, 2074.
Zhong, G. Research on the Path of Intangible Cultural Heritage Knowledge Graph Empowering Art Design Education in Colleges and Universities. J. Contemp. Educ. Res. 2025, 9, 9.
Liu, S.; Tan, N.; Yang, H. Research on the Construction of Knowledge Graph for Liao Dynasty Historical and Cultural Resources. J. Dalian Minzu Univ. 2021, 23, 73–80.
Li, Y. Research on Cultural Relic Knowledge Extraction Method and Its Knowledge Graph Construction. Master’s Thesis, Chongqing University of Technology, Chongqing, China, 2024.
Li, C.; Hou, X.; Qiao, X. A Low-Resource Named Entity Recognition Method for the Cultural Heritage Field Incorporating Knowledge Fusion. Acta Sci. Nat. Univ. Pekin. 2024, 60, 13–22. [Google Scholar] [CrossRef]
Li, L. Research on the Construction of a Digital System for Cultural Relics Based on Knowledge Graph. Master’s Thesis, Beijing Jiaotong University, Beijing, China, 2022. [Google Scholar] [CrossRef]

Figure 1. Technical roadmap for the knowledge graph of Chifeng Liao dynasty cultural relics based on CIDOC-CRM.

Figure 2. CIDOC-CRM entity-attribute relationships of “Chifeng Liao Dynasty Cultural Relics”.

Figure 3. The 13 text passages selected for the study.

Figure 4. Chifeng Liao dynasty cultural heritage knowledge graph (partial).

Figure 5. Enrichment zones of Chifeng Liao dynasty cultural heritage.

Table 1. Properties defined for concept classes (partial).

No.	Relation	Explanation
P1	is identified by	Identifies artifact name, number, etc.
P4	has time-span	Describes the historical period of existence or restoration of the artifact
P7	took place at	Location where the event occurred
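
Readers who want to map Table 1 onto RDF could serialize triples that use these properties as N-Triples. The sketch below assumes the CIDOC CRM RDFS namespace http://www.cidoc-crm.org/cidoc-crm/ and its standard local names (for example P4_has_time-span). The instance base URI and the resource names are hypothetical placeholders, not identifiers from the study.

```python
# Sketch: serialize triples that use the Table 1 properties as N-Triples.
# Assumes the CIDOC CRM RDFS namespace; EX and all instance names are hypothetical.

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
EX = "http://example.org/chifeng-liao/"  # hypothetical base URI for instance data

# Table 1 property codes -> CIDOC CRM RDFS local names.
# Domain -> range in the CRM: P1 E1 CRM Entity -> E41 Appellation;
# P4 E2 Temporal Entity -> E52 Time-Span; P7 E4 Period -> E53 Place.
PROPERTIES = {
    "P1": "P1_is_identified_by",
    "P4": "P4_has_time-span",
    "P7": "P7_took_place_at",
}


def iri(base: str, local: str) -> str:
    """Return an N-Triples IRI. Only spaces are handled (replaced by underscores);
    real data would need full IRI escaping."""
    return f"<{base}{local.replace(' ', '_')}>"


def to_ntriple(subject: str, code: str, obj: str) -> str:
    """Build one N-Triples statement: instance --CRM property--> instance."""
    return f"{iri(EX, subject)} {iri(CRM, PROPERTIES[code])} {iri(EX, obj)} ."


if __name__ == "__main__":
    # Hypothetical resources: a restoration event and its time-span.
    print(to_ntriple("Restoration event 1077", "P4", "Time-Span 1077 AD"))
```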

Table 2. Extraction of Liao dynasty cultural heritage resource triples in Chifeng (partial).

Entity	Relationship	Concept Class
Duer Mountain	belongs to	E27 Site
Lingyan Temple	belongs to	E27 Site
Qing’an Zen Temple	belongs to	E27 Site
Chuiqing Temple	belongs to	E27 Site
Xiuliang Valley	belongs to	E27 Site
Jishan Mountain	belongs to	E27 Site
Huguo Temple	belongs to	E27 Site
Mingxiu Pavilion	belongs to	E27 Site
Wang Ji	belongs to	E21 Person
Empress Dowager Li	belongs to	E21 Person
Liaoyang	belongs to	E53 Place
Tangchi County	belongs to	E53 Place
Gaizhou	belongs to	E53 Place
1077 AD	belongs to	E52 Time-Span
1058 AD	belongs to	E52 Time-Span
Stone	belongs to	E57 Material
Wood	belongs to	E57 Material
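
The graph is implemented in Neo4j (Section 4.3). One way to load the class assignments in Table 2 is to turn each row into a Cypher MERGE statement. The sketch below assumes two conventions that are not taken from the paper: the CRM class code becomes the node label, and the entity name is stored in a `name` property. The label is interpolated into the query text and quoted with backticks so that a hyphen, as in `E52_Time-Span`, stays legal. The entity name is passed as a query parameter.

```python
# Sketch: Table 2 rows -> (Cypher query, parameters) pairs.
# The label convention and the `name` property are illustrative assumptions.

ROWS = [  # subset of Table 2: (entity, concept class)
    ("Duer Mountain", "E27 Site"),
    ("Wang Ji", "E21 Person"),
    ("Liaoyang", "E53 Place"),
    ("1077 AD", "E52 Time-Span"),
    ("Stone", "E57 Material"),
]


def label_for(concept_class: str) -> str:
    """Turn 'E52 Time-Span' into the backtick-quoted label `E52_Time-Span`.
    Any backtick inside the label is doubled, which is Cypher's escape rule."""
    safe = concept_class.replace(" ", "_").replace("`", "``")
    return f"`{safe}`"


def merge_statement(entity: str, concept_class: str) -> tuple[str, dict]:
    """Return a MERGE query plus its parameter map (entity name as $name)."""
    query = f"MERGE (n:{label_for(concept_class)} {{name: $name}})"
    return query, {"name": entity}


if __name__ == "__main__":
    for entity, cls in ROWS:
        print(merge_statement(entity, cls))
```

With the official neo4j Python driver, each pair could then be executed as `session.run(query, params)`. MERGE rather than CREATE keeps repeated mentions of the same entity from producing duplicate nodes.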

Table 3. Relationship-wise evaluation results.

Relationship	Gold Standard Triples	Predicted Triples	Correct Predictions	Precision	Recall
P2	657	600	460	0.80	0.760
P14	177	164	123	0.75	0.695
P4	141	125	119	0.952	0.844
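
Because Table 3 gives the counts alongside the ratios, the ratios can be recomputed. Precision is correct predictions divided by predicted triples, and recall is correct predictions divided by gold-standard triples. The Python sketch below does this and also derives F1, which the table does not report. For P14 and P4 it reproduces the listed values (123/164 = 0.75 and 123/177 ≈ 0.695; 119/125 = 0.952 and 119/141 ≈ 0.844). Applied to the P2 counts as printed, the same formulas give about 0.767 and 0.700 rather than 0.80 and 0.760.

```python
# Per-relationship precision, recall and F1 from the counts in Table 3.

COUNTS = {  # relationship -> (gold-standard triples, predicted triples, correct predictions)
    "P2": (657, 600, 460),
    "P14": (177, 164, 123),
    "P4": (141, 125, 119),
}


def prf(gold: int, predicted: int, correct: int) -> tuple[float, float, float]:
    """Precision = correct/predicted, recall = correct/gold, F1 = harmonic mean."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    for rel, (g, p, c) in COUNTS.items():
        precision, recall, f1 = prf(g, p, c)
        print(f"{rel}: P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```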

Table 4. Example passages from “The Study of Song-Liao Diplomatic Relations”.

Passage Excerpt	Manually Annotated Triples (Example)	GPT Extraction Result
In the seventh year of the Kaibao era of Emperor Taizu of the Song Dynasty… Prefect of Zhuozhou, Liao.	Zunyi Prefecture—P2—Ancient Ruins; Zunyi Prefecture—P4—Seven Years	No output
…Tomb of the Cavalry Commandant Teng Gong with inscriptions and Li Tao’s military service and official duties…	Lusi Kongteng Public Cemetery—P2—Ancient Tombs; Lu Ming and Li Lin Tombstones—P2—Ancient Ruins	Both types of cultural relics are classified as ancient ruins
…In the battle of Gaoliang River, the Song army was heavily defeated. The Liao forces took advantage of the situation to advance and attack, while the Green Border Prefecture troops, due to their scarcity, had little respite.	Xi You Prefecture—P2—Ancient Ruins; Gaoliang River—P2—Mountain Water Land Names; Green Border Prefecture—P2—Ancient Ruins	No output
…The western attack on Daizhou, where Zhang Qixian was defeated… The two nations then resumed peace negotiations.	Xi Zheng Dai Prefecture—P2—Ancient Ruins; Xi Zheng Dai Prefecture—P4—Twenty-Seven Years	Misread the time as “Twenty-Seven Years”
…The northern march to Chanzhou, to strengthen the military force… Sent envoys with a letter to the Song governor of Mozhou, Shi Pu.	Zhu Zheng Yang City—P2—Ancient Ruins; Bei Xin Jiang Prefecture—P2—Ancient Ruins; Ren’s Book Exemptions—P2—Ancient Ruins	Consistent with the manual annotation
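
Tables 4 to 6 write triples as subject, property code, and object joined by dashes, with several triples per cell separated by semicolons. The sketch below parses that notation and labels a GPT extraction against the manual annotation of the same passage. In the tables, the GPT column mostly holds evaluator remarks. The sketch therefore assumes, hypothetically, that the raw GPT output is available in the same triple notation. Chunks that do not split into exactly three fields, such as four-part entries, are skipped.

```python
# Sketch: parse "subject<dash>Pxx<dash>object" cells and compare GPT output
# with the manual annotation. Raw GPT output in this notation is an assumption.

SEP = "\u2014"  # the dash character used as the field separator in Tables 4 to 6

Triple = tuple[str, str, str]


def parse_triples(cell: str) -> set[Triple]:
    """Split a cell on ';' and keep chunks with exactly three non-empty fields."""
    triples: set[Triple] = set()
    for chunk in cell.split(";"):
        parts = [part.strip() for part in chunk.split(SEP)]
        if len(parts) == 3 and all(parts):
            triples.add((parts[0], parts[1], parts[2]))
    return triples


def compare(manual_cell: str, gpt_cell: str) -> str:
    """Label a GPT extraction relative to the manual annotation of one passage."""
    gold, predicted = parse_triples(manual_cell), parse_triples(gpt_cell)
    if not predicted:
        return "no output"
    if predicted == gold:
        return "consistent with manual annotation"
    hits = len(gold & predicted)
    return f"partial: {hits} of {len(gold)} manual triples matched"
```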

Table 5. Example passages from “Geographical Notes on the Liao History”.

Passage Excerpt	Manually Annotated Triples (Example)	GPT Extraction Result
	No actual cultural relics	Extracted more than ten spurious items, such as ‘counties and districts’ and ‘evolutions’, as cultural relics
Tan Buzheng states: “The * History of Khitan Kingdom—Four Capitals says: ‘Shangjing Linhuangfu was the site of the large tribe.’ Furthermore, in the text of the Records, it says: ‘Emperor Taizu pacified Bohai and relocated its people to the large tribe.’ The large tribe likely refers to the commonly used term of that time.” … In fact, whether this arrangement has a certain institutional basis cannot be conclusively determined at this time…	Liao Shangjing Site—P2—Ancient City Site (E55) Liao Shangjing Site—P53—Baarin Left Banner, Chifeng. Liao Shangjing Site—P14—built by—Yelü Abaoji	District evolution and related—P2—Ancient architecture, etc.
……One is the Urjimuren River (some maps label it as Wolejimuren River or Uligimuren River, also formerly known as Erchimu River), which originates from Wulan Peak in the northern part of present-day Balin Left Banner. The upper reaches also have a source, called Haolete Gaole. The two sources converge at Xieli Hada, then flow southward, passing east of the Liao Shangjing city site, gradually heading southeast and entering the southwest part of Alukerqin Banner.”……	“U’erjimulun River flow”—P7—took place at—passes east of the Shangjing city site and then toward the southwest into Ar Horqin Banner	No output

Note: * indicates historical texts cited in the original source.

Table 6. Example passages from “History of Khitan Art”.

Passage Excerpt	Manually Annotated Triples (Example)	GPT Extraction Result
Khitan Jiaofang music… Established a music and dance performance and management institution, primarily used for the “Jiaofang-type music” in various court activities…	Khitan Palace—P2—Ancient architecture; Khitan Palace—P14—Khitan	Missed the ‘mainly used for palaces’ P2 relationship
Guoyue is the traditional primitive music of the Khitan ethnic group… Generally used in court settings, palace banquets, and other such occasions.	No output	Generally used for palaces—P2—Others
…“History of Liao” contains a record of “Chunnabo Duck River.” In the first month, the emperor held a “Nabo” ceremony by the large lake near Duck River…	Chunnabo Duck River—P2—Mountain-water place names; Chunnabo Duck River—P14—Khitan	Consistent with the manual annotation
…Taihe Festival for ascending and descending, Shuhe Festival for entering and exiting, Zhahe Festival for raising wine, Xiuhe Festival for offering food, Zhenghe for the empress receiving the imperial seal to proceed, Chenghe for the crown prince to proceed.	Xiangzong Temple—P2—Ancient architecture; Xiangzong Temple—P14—Khitan	The type was classified as ancient ruins, but the builder was written incorrectly.

Table 7. Named entity recognition performance of spaCy_zh, CKIP_bert_base_cn, and ChatGPT.

Method	Strict	Lenient@0.5	P	R	F1
spaCy_zh	0.3	0.375	0.214	0.268	0.25
CKIP_bert_base_cn	0.009	0.264	0.018	0.5	0.346
ChatGPT	0.452	0.59	0.41	0.642	0.505
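
This part of the article does not define the Strict and Lenient@0.5 criteria. The sketch below assumes one common reading, which is not confirmed by the source. Under strict matching, a predicted entity counts only if its character span and type exactly match a gold entity. Under lenient matching, a same-type prediction counts when its character-level intersection over union with a gold span is at least 0.5. Matching is greedy and one-to-one.

```python
# Sketch: strict vs. lenient (IoU >= 0.5) entity matching. The criteria are an
# assumed reading of Table 7's column names, not the paper's stated definition.

Span = tuple[int, int, str]  # (start, end, entity_type); end offset is exclusive


def span_iou(a: Span, b: Span) -> float:
    """Character-level intersection over union of two spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(gold: list[Span], pred: list[Span], threshold: float) -> int:
    """Greedy one-to-one matching of predictions to unused gold spans of the same type."""
    used: set[int] = set()
    matched = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i not in used and g[2] == p[2] and span_iou(g, p) >= threshold:
                used.add(i)
                matched += 1
                break
    return matched


def ner_scores(gold: list[Span], pred: list[Span], threshold: float) -> tuple[float, float, float]:
    """Precision, recall and F1; threshold=1.0 gives strict (exact-span) matching."""
    m = count_matches(gold, pred, threshold)
    precision = m / len(pred) if pred else 0.0
    recall = m / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With a threshold of 1.0, only identical spans qualify, because IoU equals 1 exactly when the two spans coincide. Lowering the threshold to 0.5 credits boundary errors such as a truncated place name.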

Table 8. Classification of value themes for Chifeng Liao dynasty cultural heritage resources.

Value Theme	Representative Relics in Chifeng Area	Historical and Theoretical Significance
Nomadic Imperial Power and the Oruoduo Administrative System	Upper Capital Imperial Palace Complex; Hongyi Palace, Jiqing Palace, Yanchang Palace, Zhangmin Palace, Xingsheng Palace, Yanqing Palace, Changning Palace, Chongde Palace, Dunmu Palace; Naobao ritual architecture: Shengfang Hall, Shouning Hall, Qingliang Hall, Hanliang Hall, Tianxiang Hall	Abaoji established the Oruoduo system, forming a “mobile central government”; material (felt tents) and system (palace—military governance) are isomorphic, witnessing the governance innovation of the steppe empire.
Multi-Capital System and Territorial Governance Network	Upper Capital City Site (Balin Left Banner, Lindong Town); Middle Capital City Site (Ningcheng County, Heicheng); Gangwa Kiln Site (Songshan District, Gangwa Kiln Village); Carving Inkstone Site (Songshan District); Longhua Prefecture Site (Han settler settlement); Xinzhu Prefecture Site (Han captive settlement); An De County and Dingba County-related remains	Chifeng as the “Upper Capital—Middle Capital” dual-core zone, institutionalized integration of Liao, Xian, Han, Bohai, and other ethnic groups and territorial resources.
Ritual Order and Ancestor Worship	Muyu Mountain Ancestral Temple and Chai Ce Ritual Site (Balin Left Banner); Taizu Tomb of Liao (Hada Yingge Valley, Balin Left Banner); Huai Tomb (Balin Right Banner); Qing Tomb (Qingyun Mountain, Balin Right Banner); Wei Guo Fuma Tomb (Songshan District); Yelü Cong Tomb (Karakqin Banner); Fengshui Mountain Khitan Tomb Group (Balin Left Banner)	Imperial tomb—ancestral temple—Chai Ce rituals form the legitimacy support of the Liao dynasty; the Chifeng area became the core region for ritual and burial systems.
Buddhism Dissemination and Cultural Fusion	Upper Capital Buddhist Temple Group: Anguo Temple, Tianxiong Temple, Hongfu Temple, Kaijiao Temple (all within the Upper Capital Inner City, Balin Left Banner); Puan Temple Site (Balin Left Banner, Suoboyin Gao Town); Guanyin Pavilion (Ningcheng County, Dachengzi Village)	Buddhist architecture and ancestral spirit worship coexist, reflecting the dual-track operation of the Liao dynasty’s system and belief.
Sacred Mountains and Frontier Landscape Order	Muyu Mountain (ancestral temple and Chai Ce rituals); Bieli Mountain, Tianti Mountain, Weidian (natural landmarks surrounding Upper Capital, Balin Left Banner); Heishan (Oruoduo gathering and hunting site, Karakqin); Bafangbei Wetland (ritual activity zone, within Chifeng); Huangshui River, Tieluochuan, and other frontier water areas	Chifeng forms a “capital—sacred mountain—imperial tomb” spatial organization, highlighting the governance aesthetics of the Liao dynasty, where “landscape is the stage for power.”
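
The abstract states that K-means clustering produced the five value themes in Table 8. This part of the article does not specify the feature representation for each heritage record. The sketch below therefore assumes plain numeric feature vectors, for example hypothetical per-record counts of relationship types, and implements Lloyd's algorithm in pure Python. Naming each resulting cluster as a theme remains an interpretive step outside the algorithm.

```python
# Sketch of K-means (Lloyd's algorithm). Feature vectors are hypothetical;
# the study used k = 5, while the toy data below uses k = 2.
import random

Point = tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Alternate nearest-centroid assignment and centroid (mean) update until stable."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # illustrative only
    labels, _ = kmeans(toy, k=2)
    print(labels)
```

In practice, scikit-learn's `KMeans(n_clusters=5)` with several initializations (`n_init`) would be the usual choice, because Lloyd's algorithm only reaches a local optimum that depends on the starting centroids.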

Table 9. Longitude and latitude of Chifeng Liao dynasty immovable cultural heritage (partial).

Artifact Name	Latitude (°N)	Longitude (°E)	Location	Value Theme	Historical Event
Xianzhou Old City	43.96	119.39	Northwest of Lindong Town, Balin Left Banner	Multi-Capital System and Territorial Governance Network	An important provincial city during the early Liao dynasty, showcasing the establishment of the early Khitan regime.
Songshan Guan Site	42.2	118.6	Longwangmiao Village, Chengzi Township, Songshan District, Chifeng City	Multi-Capital System and Territorial Governance Network	A postal station on the Liao emperor’s route, reflecting the transportation network of the steppe Silk Road.
Xuedian Site	42.3	119.3	Northeast of Tao’an County, Chifeng City	/	/
Kaihuang Hall	43.96	119.39	South Suburb of Upper Capital Imperial Palace Site, Lindong Town, Balin Left Banner	Ritual Order and Ancestor Worship, Multi-Capital System and Territorial Governance Network	Built in the style of Central Plains palace architecture, using rammed earth techniques; it has three main halls: Kaihuang, Ande, and Wulun. Received envoys from Jin and accepted the records of the Sixteen Prefectures of Youyun (during the second year of Huitong).
Ande Hall	43.96	119.39	South Suburb of Upper Capital Imperial Palace Site, Lindong Town, Balin Left Banner	Ritual Order and Ancestor Worship, Multi-Capital System and Territorial Governance Network	Built in the style of Central Plains palace architecture, using rammed earth techniques; it has three main halls: Kaihuang, Ande, and Wulun.
Wulun Hall	43.96	119.39	South Suburb of Upper Capital Imperial Palace Site, Lindong Town, Balin Left Banner	Ritual Order and Ancestor Worship, Multi-Capital System and Territorial Governance Network	Built in the style of Central Plains palace architecture, using rammed earth techniques; it has three main halls: Kaihuang, Ande, and Wulun. Hosted celebrations with officials and envoys (during the third year of Tianxian).
Eryi Hall	43.96	119.39	Northwest Corner of Upper Capital Imperial Palace Site, Lindong Town, Balin Left Banner	Ritual Order and Ancestor Worship	The emperor visited the hall (during the fourth year of Tianxian and the tenth year of Xianyong).
Yizheng Hall	43.96	119.39	East Side of the Central Hall Area, Upper Capital Imperial Palace Site, Lindong Town, Balin Left Banner	Ritual Order and Ancestor Worship	Hosted grand ceremonial rites (during the first year of Huitong).
Guande Hall	41.57	119.16	Southwest of the Imperial Palace Site in Zhongjing, Ningcheng County	Ritual Order and Ancestor Worship	The emperor visited the hall (during the tenth year of Qingning).
Tianxiong Temple	43.96	119.39	Southeast Corner of Upper Capital Inner City, Balin Left Banner	Buddhism Dissemination and Cultural Fusion, Ritual Order and Ancestor Worship	Established in 926 during the expansion of Upper Capital; enshrined a portrait of Liao Taizu Abaoji; one of the early Buddhist temples of the Khitan dynasty (art history).
Anguo Temple	43.97892	119.38618	Northern District of Han City, Upper Capital, Balin Left Banner	Buddhism Dissemination and Cultural Fusion	Liao Taizu visited this temple (during the fourth year of Tianzan).
Taizu Temple	43.88	119.11	West Side of Ancestral City, Balin Left Banner, Hada Yingge Township	Ritual Order and Ancestor Worship	Used for worshipping the ancestors of the Khitan royal family; in the fifth year of Tianqing (1115 AD), Yelü Zhangnu was sacrificed to the ancestral temple.
Kaijiao Temple	43.96	119.39	Southeast Corner of Upper Capital Inner City, Balin Left Banner	Buddhism Dissemination and Cultural Fusion	Founded by Khitan Taizu Yelü Abaoji in 902 AD; marks the beginning of the Khitan acceptance of Buddhism.
Liao Confucian Temple	43.96	119.39	Confucian District of Han City, Upper Capital, Lindong Town, Balin Left Banner	Ritual Order and Ancestor Worship	/
Puan Temple Site	43.97892	119.38618	3 km Southeast of Suoboyinggao Town, Balin Left Banner	Buddhism Dissemination and Cultural Fusion	An important Buddhist temple during the Liao dynasty, with clear remains reflecting the deep roots of Buddhism in the steppe.
Guanyin Pavilion	41.70736	118.89524	Dachengzi Village, Ningcheng County, Liao Zhongjing Dadingfu Site	Buddhism Dissemination and Cultural Fusion	An important building in the spread of Buddhism in the Liao dynasty, housing a statue of Guanyin, reflecting the Buddhist faith of the Khitan people.
Beida Temple	41.70736	118.89524	Northwest of Dachengzi Village, Ningcheng County, Liao Zhongjing Site	Buddhism Dissemination and Cultural Fusion	One of the important religious buildings of the Liao dynasty, used to worship Buddhist deities.
Qingzhou White Pagoda	43.5247	118.67202	3 km Southeast of Suoboyingga Town, Balin Right Banner	Buddhism Dissemination and Cultural Fusion	Built during the period of flourishing Buddhism, with wooden Buddhist statues unearthed in the tomb, showcasing Khitan Buddhist faith and sculpture art.
Songshan District Carving Inkstone Site	42.28527	118.92726	Chengzi Township, Songshan District, Chifeng City	Ritual Order and Ancestor Worship	/
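
The coordinates in Table 9 allow a simple check of the "dual-core" reading of the distribution: compute each site's great-circle distance to the two capitals and assign it to the nearer one. The sketch below uses the haversine formula on a spherical Earth. As proxies for the cores it assumes the Upper Capital palace-hall coordinates (43.96, 119.39) and the Guande Hall coordinates at the Middle Capital (41.57, 119.16), both taken from the table. The choice of proxies is this sketch's assumption, not the paper's method.

```python
# Sketch: haversine distance from each site to the nearer of two cores.
# Core coordinates are proxies taken from Table 9 rows (an assumption).
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

CORES = {
    "Upper Capital (Shangjing)": (43.96, 119.39),
    "Middle Capital (Zhongjing)": (41.57, 119.16),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two decimal-degree points, Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_core(lat: float, lon: float) -> tuple[str, float]:
    """Return the closest core and the distance to it in kilometres."""
    return min(
        ((name, haversine_km(lat, lon, clat, clon)) for name, (clat, clon) in CORES.items()),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    sites = {"Songshan Guan Site": (42.2, 118.6), "Qingzhou White Pagoda": (43.5247, 118.67202)}
    for name, (lat, lon) in sites.items():
        core, km = nearest_core(lat, lon)
        print(f"{name}: nearest core {core}, {km:.1f} km")
```

Binning these distances into bands around each core would give a rough, hypothetical proxy for the "ring-belt" zoning. A kernel density surface, as suggested by the enrichment zones in Figure 5, would be the more usual tool.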

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

