Abstract
The cultural heritage of the Liao dynasty in Chifeng encompasses significant historical and cultural information that requires systematic digital preservation and management. However, heterogeneous data sources across museums, archives, and research institutions lack semantic interoperability, creating barriers for cross-system integration and knowledge discovery. This study proposes a standardized knowledge graph construction method by integrating the CIDOC Conceptual Reference Model version 7.2 with large language models. A unified ontology framework enables semantic consistency across diverse heritage data, while Generative Pre-trained Transformer-based models automatically extract structured triples from unstructured texts through prompt engineering and entity disambiguation. The resulting knowledge graph is implemented in the Neo4j graph database. The constructed knowledge graph integrates 106 immovable cultural heritage records from Chifeng City with approximately 20 types of semantic relationships, forming a comprehensive semantic network covering people, places, events, time, and materials. K-means clustering reveals five cultural value themes, including “Nomadic Imperial Power System” and “Multi-Capital Governance Network”, while geospatial mapping identifies a “dual-core and ring-belt” distribution pattern for heritage protection zoning. This research demonstrates how international semantic standards can be integrated with artificial intelligence technologies to enable interoperable cultural heritage knowledge systems, providing practical implications for cross-institutional heritage management and archaeological survey planning.
1. Introduction
With the development of digital technologies, historical and cultural heritage data has become increasingly diversified and heterogeneous. In traditional cultural resource management models, issues such as data silos in institutions like museums and archives have been pervasive, making it difficult to achieve information sharing and knowledge discovery. To address this challenge, both international and domestic policies have been proposed in recent years to promote the digital transformation of cultural heritage. The digitalization of cultural heritage information involves the construction of knowledge graphs, dynamic cataloging of artifact information, and data exchange.
UNESCO has established comprehensive guidelines for cultural heritage preservation and digital management, emphasizing global knowledge sharing and cross-cultural communication. Among these initiatives, CIDOC-CRM has emerged as the international standard for cultural heritage data interoperability, playing a pivotal role in advancing digital preservation and information exchange across institutions [,]. In parallel, China’s “14th Five-Year Plan for Cultural Heritage Protection and Technological Innovation” has accelerated national efforts toward heritage digitalization and informatization. The plan outlines the necessity of creating a national cultural heritage resource catalog, integrating cultural heritage resource information with national spatial planning, improving the national archaeological excavation management system, and establishing a standardized system for cultural heritage digitalization. These policies provide clear direction for the integration, preservation, and dissemination of cultural heritage information resources. Cultural heritage informatization faces challenges in exchanging information between heterogeneous sources. The mapping of metadata to standardized ontologies is critical, and ontologies play a mediating role in this process, enabling consensus on conceptual definitions and their semantic relationships []. CIDOC-CRM, as the core ontology framework for semantic interoperability in cultural heritage information, systematically constructs formal representations of both tangible cultural entities and intangible cultural phenomena and their interrelationships, thereby establishing standardized semantic interfaces for cross-system data exchange. 
This ontology model was officially adopted by the International Organization for Standardization (ISO) in 2006 as ISO 21127 [] and was equivalently adopted in China in 2019 as GB/T 37965-2019 [], “Information and Documentation—Reference Ontology for Cultural Heritage Information Exchange.” Knowledge graphs, as a semantic data fusion technology, utilize triples to represent factual relationships, offering new pathways for the integration and retrieval of cultural heritage information [,]. The artifact knowledge graph built based on the CIDOC-CRM conceptual reference model can uniformly represent events, entities, and attributes in the cultural heritage domain, thereby enhancing interoperability between different data sources.
Three critical gaps limit current cultural heritage digitalization research. First, while CIDOC-CRM provides a robust ontological framework for semantic interoperability, its application to specific regional heritage resources, particularly for culturally distinct dynasties such as the Liao in China, remains limited, with few studies demonstrating practical implementation and validation. Second, although large language models have shown promising capabilities in general-domain knowledge extraction, their integration with domain-specific ontology frameworks like CIDOC-CRM for automated cultural heritage knowledge graph construction has been insufficiently explored. Existing approaches often rely on manual annotation and expert curation, which are time-consuming and difficult to scale for heterogeneous multi-source heritage data. Third, most cultural heritage knowledge graphs focus primarily on descriptive metadata without systematically integrating spatial distribution analysis and cultural value theme extraction, limiting their utility for heritage protection planning and archaeological research. To address these gaps, this study proposes an integrated approach that combines CIDOC-CRM standardization with large language model-based automated extraction to construct a comprehensive knowledge graph for Liao dynasty cultural heritage in Chifeng.
This study focuses on the intelligent information management of Liao dynasty artifacts in Chifeng City. Using CIDOC-CRM as the ontology, we construct a knowledge graph of cultural heritage resources. By integrating multi-source information, such as literature, books, and census data, and using large language models for automated entity and relationship extraction, the study integrates cultural resource information in depth and makes it available for application. Furthermore, through value theme induction and geospatial analysis, the study reveals the cultural logic and protection priorities of artifact distribution.
The main contributions of this study are as follows:
- We propose a standardized knowledge graph construction method that integrates CIDOC Conceptual Reference Model version 7.2 with large language models, enabling automated extraction of structured knowledge from heterogeneous cultural heritage texts while maintaining semantic consistency across diverse data sources.
- We design and implement a CIDOC-CRM-based ontology framework specifically tailored for Liao dynasty cultural heritage in Chifeng, mapping 106 immovable cultural heritage records with approximately 20 types of semantic relationships into a comprehensive knowledge network covering people, places, events, time, and materials.
- We demonstrate the practical application of Generative Pre-trained Transformer-based models for domain-specific knowledge extraction in cultural heritage, addressing challenges such as entity nesting and the scarcity of annotated samples through prompt engineering and entity disambiguation techniques.
- We reveal five distinct cultural value themes through K-means clustering analysis—“Nomadic Imperial Power System”, “Multi-Capital Governance Network”, “Ritual Order and Ancestor Worship”, “Buddhism Dissemination and Cultural Fusion”, and “Sacred Mountains and Frontier Landscape Order”—providing new insights into the cultural logic of Liao dynasty heritage.
- We identify a “dual-core and ring-belt” spatial distribution pattern through geospatial mapping, offering evidence-based recommendations for heritage protection zoning, archaeological survey prioritization, and cultural tourism planning.
The remainder of this paper is organized as follows: Section 2 reviews related literature on cultural heritage knowledge graphs, CIDOC-CRM ontology applications, and large language model-based information extraction. Section 3 describes the research approach, including the research methodology, research object, and technical roadmap. Section 4 presents the detailed construction process of the knowledge graph based on CIDOC-CRM, covering ontology model construction, RDF triple extraction, and Neo4j implementation. Section 5 presents the results, including graph scale and coverage, visualization interface, value theme generation, and geospatial mapping analysis. Section 6 concludes the paper with a summary of findings, discussion of limitations, and directions for future research.
2. Literature Review
In recent years, substantial progress has been made worldwide in cultural-heritage digitization, particularly in applying the CIDOC-CRM ontology and integrating large language models (LLMs). The 2023 edition of ISO 21127 reinforces CIDOC-CRM’s role as the lingua franca for semantic interoperability across institutions [,]. In China, scholarship has increasingly centered on local resource characteristics to form a CIDOC-CRM-based ontology construction paradigm and to operationalize knowledge graphs across literature, museum holdings, and archaeological sources [,,]. Representative domestic efforts include the large-scale ancient-text knowledge graph by Ouyang Jian et al., which covers more than 650,000 book titles, more than 220,000 authors, about 1.5 million editions, and more than 13,000 toponyms, providing a comprehensive, interconnected description of bibliographic knowledge [,]. Zhuang Ying’s CRM-ACA leverages expert cataloging and semantic modeling of museum collections to build the Palace Museum’s heritage KG and to furnish an extensible intelligent framework for heritage information organization and use [,]. In parallel, deployment-oriented international profiles facilitate implementation: Linked Art engineers CRM-style event patterns in JSON-LD, while the Europeana Data Model (EDM) supplies aggregation-scale specifications and mapping guidelines, both emphasizing compatibility with CRM [,].
Nested entity recognition and limited training data pose unique challenges for heritage text processing. Researchers have addressed these issues by combining domain NER with knowledge-enhanced strategies [,]. For artifact corpora, Wang et al. created the FewRlicsData dataset and proposed RelicsNER, which supports robust span detection and subsequent CRM property alignment []. On relation/triple extraction, domestic work has applied ChatGPT-4 to classical-Chinese corpora, comparing prompt templates and light-tuning strategies; reported F1 scores of 56.07% and 30.50% on two datasets highlight domain-adaptation potential under limited supervision [].
Knowledge graph and deep learning integration has advanced intelligent heritage data management globally. Huang et al. show that cultural heritage is a leading application area for KGs and the semantic web; using Palace Museum ceramics as a case, they report improvements in visualization, interconnectivity, and retrieval [,,,,]. On standard-based integration, Câmara et al. mapped EU “COURAGE” data to CIDOC-CRM and validated quality via SPARQL/SHACL, demonstrating practical unification and sharing across heterogeneous sources []. Project-level alignment with Linked Art and EDM continues to grow, positioning CIDOC-CRM as the “semantic glue” among diverse resources [,].
LLM-based extraction has evolved from general text to domain literature. Evidence suggests that, under few-shot prompting, GPT-3 can approach supervised models on relation extraction [], while modest fine-tuning of GPT-3/Llama-2 enables joint NER-RE to capture complex scientific records []. Broader evaluations find that zero/few-shot LLMs are inconsistent for NER but can surpass baselines on relation extraction with small prompt/tuning budgets—underscoring the need to report precision/recall/F1 and experimental configurations for reproducibility []. Architecturally, a metamodel-based extensible NLU design frames prompts, extraction schemas and post-processing rules as first-class artifacts, enabling provider-agnostic interfaces and maintainable scaling—principles directly applicable to end-to-end LLM+CRM pipelines []. However, recent comprehensive reviews highlight persistent challenges in deploying LLMs for domain-specific tasks, including data bias, prompt engineering complexity, and the need for robust evaluation frameworks tailored to specialized domains such as cultural heritage [].
Current research exhibits complementary regional strengths: Chinese scholars prioritize ontology construction for museum collections and classical texts using few-shot learning [], whereas international efforts focus on standardization and knowledge graph integration with deep learning models. Work at the intersection, building internationally compliant heritage KGs in China and using LLMs to enhance cultural-text processing, will further accelerate the digitization and informatization of heritage scholarship.
3. Research Approach
3.1. Research Methodology
This study adopts a combined approach of ontology modeling and automated information extraction. Firstly, based on the CIDOC-CRM conceptual reference model, semantic definitions and structural modeling are applied to the cultural heritage objects and related entities from the Liao dynasty period in Chifeng. Elements such as people, organizations, events, places, and objects are mapped to the corresponding categories within the CIDOC-CRM model, providing a unified description of the various attributes and relationships of the cultural artifacts. Following this, knowledge is extracted from multi-source heterogeneous texts, including literature, census tables, local chronicles, and related books. Large language models, such as GPT, are employed to automatically identify entities and relationships within the text and represent the facts in the form of subject-predicate-object triples. After normalizing the extracted triples, they are imported into a Neo4j graph database for data storage and querying. The entire methodological process covers key stages such as data preprocessing, conceptual modeling, automated extraction, and knowledge graph construction, ensuring the scientific rigor and practical applicability of the research results.
3.2. Research Object
Chifeng, located in the Inner Mongolia Autonomous Region, lies in the agricultural-pastoral zone and enjoys a favorable geographical position. Its nearly ten thousand years of cultural development have fostered a profound cultural heritage, with its cultural history traceable to the prehistoric Hongshan culture. By the early 10th century, the Khitan Liao culture, which originated in this region, influenced the Central Plains. Through the construction of a cultural heritage resource model based on the CIDOC-CRM ontology, this study integrates entities related to Chifeng’s Liao dynasty period, including artifacts, excavation sites, historical figures, artifact attributes, and the discovery processes, into a unified framework []. This approach facilitates the structured representation of cultural heritage and its associated information. The data sources include various forms, such as archaeological census records, historical literature, local chronicles, and academic works. These data, originally dispersed in documents and tables, are extracted and integrated through this methodology, ultimately resulting in a knowledge graph that provides a comprehensive overview of the Liao dynasty cultural heritage in Chifeng, serving as a data foundation for historical research and heritage conservation.
3.3. Technical Approach
The research constructs a knowledge graph for Chifeng’s Liao dynasty cultural heritage resources based on the CIDOC-CRM reference model. The primary goal is to build a historical and cultural network centered around the heritage sites, enabling a holistic display of Liao dynasty artifacts and related historical and cultural information in Chifeng (Figure 1). The main data sources for cultural heritage resources include literature, archaeological census records, local chronicles, and books. After initial data collection, preprocessing is performed, including standardization, deduplication, and cleaning, to construct a corpus suitable for subsequent knowledge extraction. In the ontology modeling stage, domain-specific ontologies are designed based on CIDOC-CRM, mapping artifact types, places, people, events, and their attributes to the ontology, thus constructing a semantic framework for future extraction and integration. GPT and other large language models are then utilized for text analysis, enabling the automatic identification of entities and the extraction of subject-predicate-object triples []. Finally, the extracted triples are imported into the Neo4j graph database, where nodes and relationships are created, integrated, and queried to achieve a networked semantic representation of Chifeng’s Liao dynasty cultural heritage resources.
Figure 1. Technical roadmap for the knowledge graph of Chifeng Liao dynasty cultural relics based on CIDOC-CRM.
4. Construction of the Knowledge Graph Based on CIDOC CRM
4.1. Ontology Model Construction Approach and Methodology
The “Seven-Step Method” is an effective approach for constructing domain-specific ontologies. This study focuses on immovable cultural heritage from the Liao dynasty in Chifeng, integrating the CIDOC CRM ontology model to build a semantic knowledge graph for this specific domain.
The ontology construction process is divided into seven steps:
- 1. Defining the Domain and Objectives of Ontology Construction
The ontology is constructed for the domain of “Immovable Cultural Heritage of the Liao Dynasty in Chifeng”, focusing on material cultural resources such as historical sites, ancient buildings, inscriptions, and city ruins in the Chifeng region during the Liao period. The objective is to systematically integrate the spatiotemporal information, construction and restoration processes, cultural attributes, and current conditions of the related heritage resources through ontology modeling, forming a structured, computable, and visualizable knowledge system that supports the subsequent construction of the knowledge graph and digital management of cultural heritage.
- 2. Selecting and Expanding the Applicable Ontology Framework
In the field of cultural heritage semantic modeling, the internationally recognized CIDOC CRM (Conceptual Reference Model) is widely adopted. This study takes the core architecture of CIDOC CRM v7.2 and its system of classes and properties as the basis, then trims and extends the model to account for the specific characteristics of Chifeng’s Liao dynasty heritage resources. Local semantic subclasses, such as Liao dynasty place names, Liao dynasty architectural types, and inscription information, are added to ensure the completeness of the semantic expressions within the project’s domain.
- 3. Organizing the Terminology System of the Domain
Drawing on sources such as the Chifeng Cultural Heritage Records, “History of Liao”, local chronicles, and archaeological survey materials, a list of terms related to immovable Liao dynasty cultural heritage in Chifeng is compiled. These terms include, but are not limited to, Liao dynasty palace site ruins, Khitan religious buildings, stone inscriptions, moat site ruins, Liao dynasty city walls, and cliff inscriptions. These terms will serve as extensions or subclasses of the E55 Type class in the ontology.
- 4. Defining Core Concept Classes and Hierarchical Structure
The top-level concept in the ontology is “Immovable Cultural Heritage Resources” (inherited from CIDOC’s E24 Physical Man-Made Thing), which includes subclasses such as historical sites, ancient buildings, city ruins, stone inscriptions, and monuments. Other core classes include: E53 Place (geographical location), E52 Time-Span (time range), E21 Person (builder), etc. A reasonable hierarchy and semantic network are established by integrating the temporal, spatial, person, and event information related to Liao dynasty artifacts.
- 5. Defining Ontological Concept Properties
Based on the clarified class structure, corresponding properties are defined for each concept class, as exemplified in Table 1.
Table 1. Properties defined for concept classes (partial).
- 6. Setting Constraints on Properties
Strict domain and range definitions are set for the properties defined in the ontology. Such constraints ensure data consistency and semantic integrity. For example, the domain of P4 “has time-span” is set to E22 Man-Made Object or E11 Modification, and the range is E52 Time-Span; the domain of P7 “took place at” is set to E5 Event, and the range is E53 Place; the domain of P108 “has produced” is E12 Production, and the range is E22 Man-Made Object.
- 7. Ontology Instantiation, Data Storage, and Graph Visualization
After completing the ontology modeling, methods such as text mining, entity recognition, and rule extraction are used to extract triples conforming to the CIDOC CRM structure from historical texts, which are then mapped to the ontology model for instantiation. Finally, the structured data is imported into the Neo4j graph database as RDF data (for example, in Turtle/TTL serialization), creating the knowledge graph for immovable cultural heritage of the Liao dynasty in Chifeng. This graph supports visualization of entities, relationships, and temporal and spatial information, along with semantic queries []. The construction of the Chifeng Liao dynasty cultural heritage ontology is shown in Figure 2.
Figure 2. CIDOC-CRM entity-attribute relationships of “Chifeng Liao Dynasty Cultural Relics”.
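The domain and range constraints described in step 6 can be checked mechanically before triples are instantiated. The following is a minimal sketch, assuming each triple already carries CRM class identifiers for its subject and object. The function name `check_triple` and the dictionary layout are illustrative and not part of the study's implementation, and the sketch does not perform subclass reasoning.

```python
# Domain/range constraints quoted from step 6; class and property IDs follow the text.
# The data layout and function name are illustrative assumptions.
CONSTRAINTS = {
    "P4": ({"E22", "E11"}, {"E52"}),   # has time-span
    "P7": ({"E5"}, {"E53"}),           # took place at
    "P108": ({"E12"}, {"E22"}),        # has produced
}


def check_triple(subject_class, prop, object_class):
    """Return a list of violations for one typed triple (empty list means valid)."""
    if prop not in CONSTRAINTS:
        return [f"unknown property {prop}"]
    domain, range_ = CONSTRAINTS[prop]
    problems = []
    if subject_class not in domain:
        problems.append(f"{prop}: subject {subject_class} not in domain {sorted(domain)}")
    if object_class not in range_:
        problems.append(f"{prop}: object {object_class} not in range {sorted(range_)}")
    return problems


if __name__ == "__main__":
    print(check_triple("E12", "P108", "E22"))  # satisfies the stated constraint
    print(check_triple("E21", "P7", "E53"))    # a person is not an event
```

Because only exact class matches are accepted, a subclass of E5 such as E7 would be rejected here; a fuller check would walk the class hierarchy.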
4.2. RDF Triple Extraction
A critical step in constructing the knowledge graph is the automatic extraction of RDF triples (subject-predicate-object) from texts related to Liao dynasty cultural heritage. This study designs a method based on large language models and domain-specific named entity recognition (NER), transforming unstructured text into triples that conform to the CIDOC CRM ontology semantics.
First, we collected multi-source textual data on Liao dynasty cultural heritage, including encyclopedia entries, archaeological reports, and historical research articles, covering structured, semi-structured, and unstructured formats []. The raw corpus is cleaned, segmented into sentences, and standardized, removing irrelevant information and splitting it into manageable segments for processing. Given the complexity of terminology and the nested nature of entities in the cultural heritage domain, domain-specific Named Entity Recognition (NER) models, such as the RelicsNER method [], are often used. In this study, NER is combined with NLP techniques, using OpenAI’s GPT-5 model for entity recognition. For example, in the sentence, “The Lingyan Temple Pagoda was built in the fifth year of Liao Tianqing, located in Balin Left Banner, Chifeng City”, the GPT model can identify “Lingyan Temple Pagoda” as the artifact entity, “fifth year of Liao Tianqing” as the time entity (which can be standardized to 1115 AD), and “Balin Left Banner, Chifeng City” as the location entity [].
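The standardization of a reign-year expression to a calendar year, as in the Tianqing example above, can be sketched as a small lookup. This is an illustration under stated assumptions: the era table is a deliberately tiny subset, the English phrasing pattern and the function name `reign_year_to_ce` are hypothetical, and era changes that fall mid-year are ignored.

```python
import re

# First calendar year of a few Liao reign eras (illustrative subset, not a full table).
ERA_START = {"Tonghe": 983, "Chongxi": 1032, "Tianqing": 1111}

ORDINALS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
}

# Matches phrases such as "fifth year of Liao Tianqing" (assumed English phrasing).
PATTERN = re.compile(r"(\w+) year of (?:Liao )?(\w+)", re.IGNORECASE)


def reign_year_to_ce(expression):
    """Convert a reign-year phrase to a calendar year, or None if it is not recognized."""
    match = PATTERN.search(expression)
    if not match:
        return None
    ordinal = match.group(1).lower()
    era = match.group(2).capitalize()
    if ordinal not in ORDINALS or era not in ERA_START:
        return None
    # The first year of an era is the era's start year, hence the -1.
    return ERA_START[era] + ORDINALS[ordinal] - 1


if __name__ == "__main__":
    print(reign_year_to_ce("built in the fifth year of Liao Tianqing"))
```

Returning `None` for unknown eras keeps unrecognized dates visible for manual checking instead of silently guessing a year.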
Based on the identified entities, the study uses a combination of rule-based methods and large language models to extract the semantic relationships between entities, forming triples. To fully leverage the semantic understanding capabilities of the large model, a CIDOC-CRM-defined prompt is designed: a predefined list of entity and relationship types is provided to GPT, instructing it to extract the corresponding subject-predicate-object triples from the text and output them in CSV format. Extraction patterns are constructed based on the CIDOC CRM ontology, defining entity types (such as “artifact”, “person”, “place”, “time”, etc.) and relationship types (such as “creator”, “discovery location”, “period”, “material”, etc.), along with the subject and object types connected by each relationship, which serve as constraints for GPT’s information extraction. This pattern functions as a guide for the model. For instance, in the above example, under our prompt, the large model will extract the following candidate triples: (Lingyan Temple Pagoda, Creator, XX), (Lingyan Temple Pagoda, Construction Time, 1115 AD), (Lingyan Temple Pagoda, Location, Balin Left Banner, Chifeng City), and so on. Prompt details are provided in Appendix A. After entity extraction and triple construction, the extracted entities and relationships undergo semantic standardization to align with the class and property systems of the CIDOC CRM model. This process includes entity disambiguation and category mapping: entities that refer to the same concept (e.g., “Liao dynasty” and “Khitan”) are merged and linked to the unique entity node in the knowledge base. Entities are then categorized according to the CIDOC CRM, such as mapping person names to E21 “Person”, locations to E53 “Place”, time periods to E52 “Time-Span”, and events (such as construction or excavation) to E5/E7 “Event/Activity,” etc. 
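A schema-constrained prompt and a defensive parser for the CSV reply might look like the sketch below. It is not the prompt from Appendix A: the schema entries, the prompt wording, and the function names are assumptions made for illustration, and the model call itself is omitted so that only the surrounding logic is shown.

```python
import csv
import io

# Hypothetical schema: relation -> (subject type, object type), modeled on the
# entity and relationship types named in the text.
SCHEMA = {
    "creator": ("artifact", "person"),
    "construction time": ("artifact", "time"),
    "location": ("artifact", "place"),
    "material": ("artifact", "material"),
}


def build_prompt(text):
    """Assemble a prompt that lists the allowed relation types as constraints."""
    lines = [f"- {rel}: {s} -> {o}" for rel, (s, o) in SCHEMA.items()]
    return (
        "Extract subject-predicate-object triples from the text below.\n"
        "Use only these relation types (subject type -> object type):\n"
        + "\n".join(lines)
        + "\nReturn CSV rows with columns: subject,predicate,object\n\n"
        + f"Text: {text}"
    )


def parse_triples(csv_text):
    """Keep well-formed rows whose predicate is in the schema; drop everything else."""
    triples = []
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row]
        if len(cells) != 3 or not all(cells):
            continue
        subject, predicate, obj = cells
        if predicate.lower() in SCHEMA:
            triples.append((subject, predicate.lower(), obj))
    return triples


# A made-up model reply, including a header row and one out-of-schema relation.
SAMPLE_REPLY = (
    "subject,predicate,object\n"
    "Lingyan Temple Pagoda,Construction Time,1115 AD\n"
    'Lingyan Temple Pagoda,Location,"Balin Left Banner, Chifeng City"\n'
    "Lingyan Temple Pagoda,built by,unknown\n"
)

if __name__ == "__main__":
    for triple in parse_triples(SAMPLE_REPLY):
        print(triple)
```

Filtering on the schema after generation means that a reply containing unexpected relation names degrades to fewer triples and not to malformed graph data.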
Relationship predicates are likewise mapped to the properties defined by CIDOC CRM. For example, “creator” corresponds to an E12 “Production” event that is linked to an E21 “Person” through property P14, indicating that the person carried out the construction activity. “Discovery location” corresponds to an E7 “Activity” linked to an E53 “Place” via property P7, indicating that the discovery site was the location of an archaeological excavation activity. “Period” corresponds to the production or usage time of the artifact and is connected to the time entity via property P4. Through this semantic alignment of entity types and relationships, the extracted knowledge fits into the CIDOC CRM ontology structure, ensuring consistent representation across data sources. The triples extracted in this study are stored in a unified <subject, predicate, object> format, with each subject and object tagged with a type label for later import into a graph database or conversion to RDF format [].
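The predicate mapping can be pictured as rewriting each flat triple into a short event-centered path. The sketch below follows the three mappings named in the text. The event node naming, the function name `expand`, and the use of P12 to attach the artifact to the discovery activity are assumptions, since the text does not state which property makes that link.

```python
def expand(triple, event_id):
    """Rewrite one flat triple as CRM-style triples, routed through an event where needed."""
    subject, predicate, obj = triple
    if predicate == "creator":
        event = f"production_{event_id}"  # hypothetical node identifier
        return [
            (event, "rdf:type", "E12"),
            (event, "P108_has_produced", subject),
            (event, "P14_carried_out_by", obj),
        ]
    if predicate == "discovery location":
        event = f"activity_{event_id}"  # hypothetical node identifier
        return [
            (event, "rdf:type", "E7"),
            # Assumption: P12 links the activity to the artifact; the text leaves this open.
            (event, "P12_occurred_in_the_presence_of", subject),
            (event, "P7_took_place_at", obj),
        ]
    if predicate == "period":
        return [(subject, "P4_has_time-span", obj)]
    raise ValueError(f"no CRM mapping for predicate {predicate!r}")


if __name__ == "__main__":
    flat = ("Lingyan Temple Pagoda", "discovery location", "Balin Left Banner, Chifeng City")
    for crm_triple in expand(flat, 1):
        print(crm_triple)
```

Raising an error for unmapped predicates forces every new relation type to receive an explicit CRM mapping before it can enter the graph.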
Despite automation of extraction and transformation, certain ontology alignment and entity-merging operations remain semi-manual. For example, automatically extracted entity names often have multiple homonyms or ambiguous contexts; a single label like “Hu Guo Temple” may refer to distinct cultural objects or administrative units across texts. These ambiguous matches require human expertise to select the correct CIDOC-CRM class and unify duplicates. In addition, entity deduplication and linking across heterogeneous sources (e.g., merging “Upper Capital” from chronicles with “Shangjing City Site” from archaeological reports) are manually validated to preserve historical accuracy. These semi-manual processes currently limit scalability, as human experts must check ambiguous matches and adjust mapping rules. To enhance scalability, future work will incorporate ontology alignment tools (e.g., OpenRefine RDF Extension, Silk, or Alignment APIs) and active-learning workflows to progressively automate entity reconciliation while retaining expert oversight. Incorporating such semi-automated alignment can reduce manual burden, ensure semantic consistency, and enable knowledge-graph construction at larger scale.
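One way to picture the semi-manual reconciliation step is a routine that merges only on expert-approved aliases and routes everything uncertain to a review queue. The sketch below is an assumption about how such a step could be organized, not the workflow used in the study. The alias table, the ambiguity list, the similarity cutoff, and the function name are all hypothetical.

```python
import difflib

# Hypothetical expert-curated alias table: variant label -> canonical node name.
ALIASES = {"Upper Capital": "Shangjing City Site"}

# Labels known to be ambiguous across texts are always sent to an expert.
AMBIGUOUS = {"Hu Guo Temple"}


def reconcile(label, known_nodes, cutoff=0.85):
    """Return (status, candidate) with status 'merged', 'review', or 'new'."""
    if label in AMBIGUOUS:
        return ("review", None)
    if label in ALIASES:
        return ("merged", ALIASES[label])
    if label in known_nodes:
        return ("merged", label)
    # A near-match on spelling is only a suggestion for the expert, never an automatic merge.
    close = difflib.get_close_matches(label, known_nodes, n=1, cutoff=cutoff)
    if close:
        return ("review", close[0])
    return ("new", None)


if __name__ == "__main__":
    nodes = ["Shangjing City Site", "Lingyan Temple Pagoda"]
    for name in ["Upper Capital", "Hu Guo Temple", "Shangjing City Sites", "Zuzhou City"]:
        print(name, "->", reconcile(name, nodes))
```

Keeping string similarity as a suggestion only reflects the concern in the text that automatic merging of similar names can damage historical accuracy.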
4.3. Node and Relationship Design for Knowledge Graph Construction Using Neo4j
To instantiate the extracted RDF triples into a knowledge graph, this study uses the Neo4j graph database to store and manage the knowledge. As a native graph database, Neo4j supports a flexible node-relationship data model, making it highly suitable for constructing complex artifact knowledge graphs according to the CIDOC CRM ontology structure [].
In Neo4j, unique indexes or constraints are created for the primary node types (such as artifacts, people, places, etc.) based on attributes like name or identifier, to avoid duplicate nodes and ensure the uniqueness and consistency of the knowledge. The edges in the knowledge graph are designed as directed relationships, with names corresponding to the semantic properties in CIDOC CRM. Each relationship type defines the node types for both the source and target entities.
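The uniqueness constraints mentioned above can be generated as Cypher statements from a small table of node labels. This is a sketch under assumptions: the labels `Artifact`, `Person`, and `Place`, the `name` key, and the constraint names are illustrative, and the statements use the `FOR ... REQUIRE` form of recent Neo4j releases (older releases use a different constraint syntax).

```python
# Hypothetical node labels and the property assumed to identify each node uniquely.
NODE_KEYS = {"Artifact": "name", "Person": "name", "Place": "name"}


def uniqueness_constraints(node_keys):
    """Build one idempotent uniqueness-constraint statement per node label."""
    return [
        f"CREATE CONSTRAINT {label.lower()}_{key}_unique IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{key} IS UNIQUE"
        for label, key in node_keys.items()
    ]


if __name__ == "__main__":
    for statement in uniqueness_constraints(NODE_KEYS):
        print(statement + ";")
```

The generated statements would be run once before the import, so that later `MERGE` operations on the key property cannot create duplicate nodes.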
These relationships are implemented in Neo4j as relationship types, ensuring that each edge’s semantics correspond directly to the definitions in the CIDOC CRM property set. For example, the custom relationship type BUILT_BY (creator) corresponds to the composite relationship E22 through E12/P14 linked to E21 in CRM; DISCOVERED_AT (discovery location) corresponds to E22 through E7/P7 linked to E53 in CRM, etc. This design leverages the efficient query capability of graph databases for direct connections, while maintaining semantic consistency at the foundational level.
The extracted and standardized RDF triples are batch-imported into the Neo4j graph database []. In practice, this is done using Neo4j’s Cypher query language or batch import tools: first, node data is organized into CSV files (listing artifacts, people, places, etc., and their attributes), and relationship data is organized into tables of subject-object pairs and relationship types. The LOAD CSV statement in Cypher is then used to create nodes and relationships one by one.
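The CSV-based import can be sketched as two helpers: one that serializes relationship rows and one that builds a `LOAD CSV` statement per relationship type. The relationship types `BUILT_BY` and `DISCOVERED_AT` come from the text, while the node labels, column names, and file URL are assumptions. One statement per type is generated because plain Cypher does not accept a relationship type as a parameter.

```python
import csv
import io
import re

# Relationship type -> (source label, target label); the labels are hypothetical.
REL_TYPES = {
    "BUILT_BY": ("Artifact", "Person"),
    "DISCOVERED_AT": ("Artifact", "Place"),
}


def relationships_csv(rows):
    """Serialize (subject, type, object) rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subject", "type", "object"])
    writer.writerows(rows)
    return buffer.getvalue()


def load_statements(file_url):
    """Build one LOAD CSV statement per relationship type."""
    statements = []
    for rel, (src, dst) in REL_TYPES.items():
        # The type is interpolated into the query text, so restrict it to safe characters.
        if not re.fullmatch(r"[A-Z_]+", rel):
            raise ValueError(f"unsafe relationship type: {rel}")
        statements.append(
            f"LOAD CSV WITH HEADERS FROM '{file_url}' AS row\n"
            f"WITH row WHERE row.type = '{rel}'\n"
            f"MATCH (s:{src} {{name: row.subject}})\n"
            f"MATCH (o:{dst} {{name: row.object}})\n"
            f"MERGE (s)-[:{rel}]->(o)"
        )
    return statements


if __name__ == "__main__":
    rows = [("Lingyan Temple Pagoda", "DISCOVERED_AT", "Balin Left Banner, Chifeng City")]
    print(relationships_csv(rows))
    print(load_statements("file:///relations.csv")[1])
```

Using `MERGE` for the relationship keeps the import repeatable: running the same file twice does not duplicate edges.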
5. Results Presentation
Through knowledge extraction and graph construction, this study collected and cleaned 106 records of immovable cultural heritage in Chifeng City. The completed “Chifeng Liao Dynasty Cultural Heritage Knowledge Graph” integrates heterogeneous information on Liao dynasty cultural heritage at the scale and coverage reported below. This section presents an overview of the Chifeng Liao dynasty cultural heritage resource graph and its visual interface.
5.1. Graph Scale and Knowledge Coverage
The knowledge graph constructed in this study integrates Liao dynasty cultural heritage knowledge from dozens of publications and other sources, achieving broad coverage of major artifact entities and their associated relationships (Table 2). Approximately 20 types of relationships are included, covering key connections between artifacts and entities such as people, places, and time; examples include “creator”, “discovery location”, “period affiliation”, “material”, and “usage”. Through these semantic relationships, the graph links the basic information, historical background, and archaeological data of the artifacts into a semantic network.
Table 2. Extraction of Liao dynasty cultural heritage resource triples in Chifeng (partial).
The knowledge graph adopts the CIDOC CRM standard, ensuring that each piece of knowledge has a clear semantic definition. This strict ontology alignment guarantees that knowledge from different sources can be semantically integrated within the graph, for example, unifying information from research studies and records from archaeological reports under the same semantic framework. Overall, the knowledge graph covers a wide range of entities and knowledge points in the field of Chifeng Liao dynasty cultural heritage, providing in-depth exploration of the multifaceted relationships of key artifacts (such as important tombs and temple buildings), thereby offering rich knowledge support for subsequent applications.
5.2. Quantitative Evaluation and Accuracy Analysis
The current evaluation is based on three historical Liao-related texts: “The Study of Song-Liao Diplomatic Relations”, “Geographical Notes on the Liao History”, and “History of Khitan Art”. Approximately 100 passages were selected from each book, totaling 300 passages as experimental data.
The reliability of information extraction must be validated through quantitative metrics. The standard metrics are precision, recall, and F1 score. Precision is the proportion of the system’s extracted triples that are correct; recall is the proportion of gold-standard triples that the system correctly extracts; F1 is the harmonic mean of precision and recall. This evaluation framework is commonly used in information extraction tasks.
In the 300 passages, 975 knowledge triples were manually annotated, and 889 triples were automatically extracted. Comparing the two sets gives the following counts:
True Positives (TP) = 702: The number of correctly extracted triples by the model.
False Positives (FP) = 162: The number of triples extracted by the model but not found in the manual annotations.
False Negatives (FN) = 221: The number of triples present in the manual annotations but not extracted by the model.
From these counts, precision is TP ÷ (TP + FP), recall is TP ÷ (TP + FN), and F1 is the harmonic mean of the two. The overall values are:
Precision = 702 ÷ (702 + 162) ≈ 0.812
Recall = 702 ÷ (702 + 221) ≈ 0.761
F1 score = 2 × 0.812 × 0.761 ÷ (0.812 + 0.761) ≈ 0.786
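The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch for checking the reported figures; the function name `prf` is an illustrative choice, and only the TP, FP, and FN counts come from the text.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw match counts."""
    precision = tp / (tp + fp)   # share of extracted triples that are correct
    recall = tp / (tp + fn)      # share of gold-standard triples that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Counts reported in Section 5.2
precision, recall, f1 = prf(tp=702, fp=162, fn=221)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The same function can be applied per relationship type to obtain a breakdown of the kind summarized in Table 3, given the per-relationship counts.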
Table 3 presents the relationship-wise evaluation results, showing performance variations across different CIDOC-CRM properties.
Table 3.
Relationship-wise evaluation results.
Overall, the automatic extraction performs best in identifying time expressions, but makes more errors in distinguishing artifact types and identifying builders. This suggests that the model handles explicit years well but tends to miss or mis-extract triples when dealing with polysemy and complex names.
To facilitate understanding of the sources of errors in automatic extraction, we present 10 typical passages from each of the three historical texts (Table 4, Table 5 and Table 6), showing the differences between manual annotations and model extraction results. For simplicity, we only list core triples and omit irrelevant relationships.
Table 4.
Example passages from “The Study of Song-Liao Diplomatic Relations”.
Table 5.
Example passages from “Geographical Notes on the Liao History”.
Table 6.
Example passages from “History of Khitan Art”.
Table 4 presents examples from “The Study of Song-Liao Diplomatic Relations”. This text largely provides an overview of historical and geographical literature, lacking specific artifact descriptions. The automatic extraction algorithm struggles to distinguish between the main text and book reviews, leading to a significant number of mis-extractions.
Table 5 presents examples from “Geographical Notes on the Liao History”, which contains detailed geographical descriptions and administrative evolution records. The extraction results show that the model performs better on place names but struggles with complex geographical relationships and historical administrative changes.
Table 6 demonstrates examples from “History of Khitan Art”. This book primarily discusses Khitan music, dance, rituals, and court systems, with little focus on specific architecture. However, the algorithm tends to extract artifacts based on keywords like “palace” or “hall”, resulting in false positives for artifact entities.
Using manual annotation of the three Liao history-related texts, we constructed a gold standard dataset comprising 975 triples and evaluated the performance of a GPT-based model in extracting knowledge about Chifeng Liao Dynasty artifact resources. The results show an overall precision of approximately 0.81 and a recall of approximately 0.76, indicating that the model is usable for structured extraction, although missed and incorrect extractions remain. The main error types are:
Entity boundary errors: Incorrectly classifying non-artifact phrases (such as “district” or “sixteen departments”) as artifacts;
Type confusion: Unstable classification of suffixes like “tomb” or “state”, often misclassifying ancient tombs as ruins or architecture;
Time errors: Occurrence of numerical substitution or omissions in time expressions;
Missing recognition: Frequently missing all triples in sentences containing short place names.
5.3. Performance of ChatGPT in Named Entity Recognition
To further evaluate the capability of large language models (LLMs) in extracting structured knowledge from historical texts, ChatGPT was applied to the same corpus from The Geography Section of the Liao History (Liaoshi·Dilizhi) for Named Entity Recognition (NER). The model outputs were compared against the manually annotated gold standard using standard metrics—Precision, Recall, and F1-score. The study selected 13 passages of text (as shown in Figure 3).
Figure 3.
The study selected 13 passages of text.
As shown in Table 7, under strict matching, ChatGPT achieved an F1-score of 0.41, outperforming spaCy (0.214) and CKIP-BERT (0.018). Under lenient matching (0.5 threshold), ChatGPT achieved an F1-score of 0.505, significantly outperforming spaCy (0.25) and CKIP-BERT (0.346). This indicates ChatGPT’s superior contextual understanding and generalization ability in identifying complex entities. The model demonstrated particularly strong performance in recognizing temporal and geographical entities, correctly identifying examples such as “Shangjing Linhuang Prefecture”, “Song Jingde Fourth Year”, and “Tonghe Twenty-Fifth Year”.
Table 7.
Conclusion.
However, minor boundary inconsistencies were observed—e.g., merging multi-word expressions (“Liao reign 210 years”) or splitting entity boundaries. Nonetheless, ChatGPT achieved notably higher recall, suggesting stronger adaptability in weakly supervised or unstructured text scenarios.
Furthermore, ChatGPT showed semantic awareness in entity-type inference, such as categorizing “Liao reign” as a dynasty (ORG) and “Shengzong” as a person (PER), highlighting its inherent knowledge transfer capabilities.
In summary, ChatGPT outperforms traditional NER models in historical text processing, offering high interpretability and domain adaptability. It can serve as an essential component or semi-automatic annotation assistant in knowledge graph construction for cultural heritage research.
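The difference between strict and lenient matching can be illustrated with a small scoring routine. This is a sketch under stated assumptions, not the evaluation script used in the study: entities are represented as character spans with a type label, strict matching requires identical boundaries, and the “0.5 threshold” is interpreted here as a minimum overlap ratio (intersection over union) between a predicted and a gold span. The spans in the example are made up.

```python
from typing import Optional

Span = tuple[int, int, str]  # (start, end_exclusive, entity_type)


def overlap_ratio(a: Span, b: Span) -> float:
    """Intersection over union of two character spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union else 0.0


def score(gold: list[Span], pred: list[Span], threshold: Optional[float] = None):
    """Precision, recall, F1. threshold=None means strict (exact boundary) matching."""
    matched: set[int] = set()
    tp = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i in matched or p[2] != g[2]:
                continue  # each gold span is matched at most once, types must agree
            if threshold is None:
                ok = p[:2] == g[:2]
            else:
                ok = overlap_ratio(p, g) >= threshold
            if ok:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Made-up spans: one exact hit, one boundary error, one miss
gold = [(0, 10, "LOC"), (15, 22, "TIME"), (30, 34, "PER")]
pred = [(0, 10, "LOC"), (15, 20, "TIME"), (40, 45, "PER")]

strict = score(gold, pred)
lenient = score(gold, pred, threshold=0.5)
print("strict F1:", round(strict[2], 3), "lenient F1:", round(lenient[2], 3))
```

In this toy case the boundary error on the time expression counts as a miss under strict matching but as a hit under lenient matching, which is the kind of effect that separates the two rows of scores in Table 7.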
5.4. Visualization Interface and Example Display
The data from the knowledge graph is visually displayed through a front-end interface, allowing users to intuitively browse and understand the knowledge structure (Figure 4).
Figure 4.
Chifeng Liao dynasty cultural heritage knowledge graph (partial).
In this example, the central node represents a particular artifact entity, with various related entity nodes and relationship edges radiating around it. Nodes in different colors represent different types of entities. For example, the red nodes represent immovable Liao dynasty cultural heritage in Chifeng, purple nodes represent periods/dynasties, orange nodes represent places, and blue nodes represent artifact attributes (such as shape, decoration, etc.). Each edge is labeled with a relationship name (e.g., “period affiliation”, “original discovery location”, “design”, etc.), thus clearly showcasing the semantic relationship network centered around the artifact. Users can click on any node to expand and explore more related relationships, enabling gradual exploration. The right side of the interface provides a legend and control panel, allowing users to search, filter, and expand/collapse hierarchical content. For example, when users enter “Lingyan Temple Pagoda” in the search box, the corresponding node is highlighted, and related information nodes surrounding the “Lingyan Temple Pagoda” are displayed, such as its construction date, creator, location, and other relationships.
More importantly, the visualization and graph exploration capabilities of the Neo4j graph database support an “exploratory application” of the knowledge graph. By visually analyzing the multiple relationships among nodes such as artifacts, places, periods, and people, researchers can not only quickly access basic information about a specific artifact but also discover knowledge connections that are difficult to identify in traditional tabular formats.
For example, when exploring the “Lingyan Temple Pagoda” as a central node, its related nodes include “fifth year of Liao Tianqing” (time), “Balin Left Banner, Chifeng City” (place), and “construction event”, which are semantically connected. These further indirectly link to people nodes like “Empress Dowager Li”, indicating the political and cultural significance of this building within the Liao dynasty royal family context. Similarly, through cluster analysis of architectural nodes such as “Huguo Temple”, “Chuiqing Temple”, and “Qing’an Zen Temple”, it is found that these buildings are primarily made of “stone” or “wood” and are spatially concentrated in the Balin Left Banner area of Chifeng. This suggests that the region had a high density of Buddhist architecture during the Liao dynasty, possibly reflecting the presence of a political or religious center.
Moreover, extending the “discovery location—place—time” triple relationship, the distribution timeline of major Liao dynasty sites can be constructed. Nodes such as “1058 AD” and “1077 AD” cluster around the construction records of several artifacts, indicating that the mid-to-late Liao dynasty was a period of active construction activity. The discovery of these knowledge connections is a key demonstration of how knowledge graphs exceed the capabilities of traditional databases.
5.5. Generation of Value Themes
By extracting graph structural features and calculating centrality indicators such as node degree and betweenness centrality, key heritage sites and events are identified. The degree of palace architecture nodes (such as “Kaihuang Hall” and “Ande Hall”) is significantly higher than that of general sites, indicating that these nodes are associated with multiple historical events and figures. Local subgraph analysis of nodes like “Lingyan Temple Pagoda” also reveals its strong connections with nodes such as “fifth year of Liao Tianqing” (time), “Balin Left Banner, Chifeng City” (place), and “construction event”, suggesting the political-religious significance of this building within the royal family context.
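As an illustration of the degree indicator, the sketch below computes normalized degree centrality directly from a list of triples. It assumes an undirected view of the graph and uses a handful of triples assembled from the examples in this section; it is not the full graph, and betweenness centrality is not covered here.

```python
from collections import defaultdict


def degree_centrality(triples: list[tuple[str, str, str]]) -> dict[str, float]:
    """Normalized degree: distinct neighbors divided by (number of nodes - 1)."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for subject, _relation, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)
    n = len(neighbors)
    if n < 2:
        return {}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Illustrative triples based on the examples in the text
triples = [
    ("Lingyan Temple Pagoda", "constructed during", "fifth year of Liao Tianqing"),
    ("Lingyan Temple Pagoda", "located at", "Balin Left Banner, Chifeng City"),
    ("Lingyan Temple Pagoda", "involved in", "construction event"),
    ("Huguo Temple", "located at", "Balin Left Banner, Chifeng City"),
    ("Chuiqing Temple", "located at", "Balin Left Banner, Chifeng City"),
]

centrality = degree_centrality(triples)
for node, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{value:.2f}  {node}")
```

Nodes that many artifacts share, such as a place or a period, rise to the top of this ranking alongside richly described artifacts, so in practice the ranking is usually filtered by entity type before it is interpreted.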
To avoid subjective classification, this study concatenates the semantic descriptions of each artifact (the attribute objects from its triples and its summary text) and represents them as TF-IDF vectors, to which the K-means clustering algorithm is applied for theme discovery. Evaluation with the silhouette coefficient gives an optimal number of clusters of 5 (k = 5). One cluster focuses on terms such as “palace architecture”, “noble bed”, and “Yelü Abaoji”, and another centers on “tomb clusters”, “ancestral temples”, and “imperial tombs”. The remaining clusters are characterized by “Buddhist temples”, “ordination platforms”, and “monk inscriptions”; by “mountain ranges” and “hunting camps”; and by “multi-capital system” and “regional and military governance institutions”.
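The pipeline of TF-IDF vectorization, K-means clustering, and silhouette-based selection of k can be sketched in plain Python as follows. This is a simplified stand-in rather than the implementation used in the study: the six keyword lists are toy data built from terms mentioned above, the smoothed IDF formula and the deterministic farthest-point initialization are assumptions made so that the example is reproducible, and a real run would use the full artifact descriptions.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[list[float]]:
    """L2-normalized TF-IDF vectors with a smoothed IDF (an assumed variant)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def dist(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors: list[list[float]], k: int, iters: int = 20) -> list[int]:
    # Deterministic farthest-point initialization instead of random seeding
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(dist(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(vectors: list[list[float]], labels: list[int]) -> float:
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    scores = []
    for i, v in enumerate(vectors):
        own = [dist(v, w) for j, w in enumerate(vectors) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(v, w) for w, l in zip(vectors, labels) if l == other) / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


# Toy keyword lists (illustrative only)
docs = [
    ["palace", "architecture", "Yelü", "Abaoji"],
    ["palace", "architecture", "hall"],
    ["tomb", "ancestral", "temple", "imperial"],
    ["tomb", "imperial", "mausoleum"],
    ["Buddhist", "ordination", "platform"],
    ["Buddhist", "monk", "inscription"],
]

vectors = tfidf(docs)
scores = {k: silhouette(vectors, kmeans(vectors, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
labels = kmeans(vectors, best_k)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("best k:", best_k, "labels:", labels)
```

On the toy data the silhouette coefficient peaks at three clusters, one per keyword group; on the full set of artifact descriptions the same selection procedure is what yields k = 5 in the text.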
By observing the keywords and representative entities in each cluster and combining them with the historical context of the Liao dynasty in Chifeng, the five clusters are named as follows: “Nomadic Imperial Power and the Oruoduo Administrative System”, “Multi-Capital System and Territorial Governance Network”, “Ritual Order and Ancestor Worship”, “Buddhism Dissemination and Cultural Fusion”, and “Sacred Mountains and Frontier Landscape Order.” These themes reflect the differences and complementarities in their historical development and spatial patterns (Table 8).
Table 8.
Classification of value themes for Chifeng Liao dynasty cultural heritage resources.
5.6. Geospatial Mapping
To study the spatial associations between cultural heritage artifacts of different value themes and to identify the historical artifact enrichment zones in Chifeng City, this study standardizes and stores the geographic information of immovable cultural heritage and maps it onto a GIS platform. Specifically, the location of each immovable artifact is determined, or estimated where necessary, from multiple sources: literature records, local chronicles, archives, and collection records. Each estimate is recorded with a quantified uncertainty (in meters). For artifacts with uncertain coordinates, buffer zone processing (e.g., ±500 m) is applied in GIS analysis. The coordinates and their metadata (source, confidence, estimation method, uncertainty, etc.) are exported in standard formats such as GeoJSON/CSV and projected into a suitable local coordinate system before being imported into GIS for subsequent hotspot analysis, clustering identification, and spatiotemporal association studies (Table 9).
Table 9.
Longitude and latitude of Chifeng Liao dynasty immovable cultural heritage (partial).
For immovable cultural heritage records without precise geographical coordinates in the literature or archives, we adopted a pragmatic georeferencing strategy. Following the practice of China’s Third National Cultural Relics Survey, these records were anchored to the centroid of the nearest township-level administrative unit. The centroid coordinates of each township were derived from the national administrative boundary dataset, ensuring that every record has a spatial proxy for analysis. This approach prevents the exclusion of valuable records from GIS-based hotspot and clustering analyses, while explicitly flagging these points as approximations. In subsequent analyses we assigned wider buffer zones to these approximated locations to account for their greater locational uncertainty and clearly documented the data source (the Third National Cultural Relics Survey) and estimation method in the metadata.
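The georeferencing rule described above, documented coordinates where available and the township centroid otherwise, can be expressed as a small export routine. The sketch below is illustrative only: the record names, township name, and coordinates are placeholders rather than survey data, the property names in the output are hypothetical, and the 5000 m buffer for centroid-based points is an assumed value standing in for the “wider buffer zones” mentioned in the text.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteRecord:
    name: str
    township: str
    lon: Optional[float] = None  # None when the sources give no coordinates
    lat: Optional[float] = None


def to_feature(rec: SiteRecord, centroids: dict[str, tuple[float, float]],
               documented_buffer_m: float = 500.0,
               centroid_buffer_m: float = 5000.0) -> dict:
    """Build a GeoJSON Feature, falling back to the township centroid."""
    if rec.lon is not None and rec.lat is not None:
        lon, lat = rec.lon, rec.lat
        method, buffer_m = "documented", documented_buffer_m
    else:
        lon, lat = centroids[rec.township]
        method, buffer_m = "township_centroid", centroid_buffer_m
    return {
        "type": "Feature",
        # GeoJSON positions are ordered [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "name": rec.name,
            "method": method,
            "approximate": method != "documented",  # explicit flag for proxies
            "buffer_m": buffer_m,
        },
    }


# Placeholder data, not real survey coordinates
centroids = {"Township X": (119.10, 43.90)}
records = [
    SiteRecord("Site A", "Township X", lon=119.00, lat=44.00),
    SiteRecord("Site B", "Township X"),  # no coordinates in the sources
]

collection = {
    "type": "FeatureCollection",
    "features": [to_feature(r, centroids) for r in records],
}
geojson_text = json.dumps(collection, ensure_ascii=False, indent=2)
print(geojson_text)
```

Keeping the estimation method and buffer width in the feature properties lets later hotspot and clustering steps weight or exclude approximated points without going back to the source records.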
After the coordinates of all artifacts are standardized, attached to their CIDOC-CRM entities, and imported into GIS, the spatial distribution of Chifeng’s Liao dynasty historical artifacts shows a “dual-core–belt” mixed aggregation pattern (Figure 5). The Upper Capital (Balin Left Banner) and its surrounding complex of ancestral temples, tombs, and palaces form the first core; the Middle Capital (Ningcheng County) and its surrounding religious architectural complex form the second core. Between these two cores and on their outer edges, several dense aggregation zones centered on sacred mountains, rivers, or ancient roads are observed. This pattern reflects how the political and ritual systems of the Liao dynasty (palaces, ancestral temples) and its religious and folk culture (temples, stone inscriptions) overlapped and divided functions in space, and it indicates that Chifeng served as both a political governance center and a hub for religious culture and transportation during the Liao dynasty. Based on this distribution, research and protection strategies can be divided into three tiers: (1) strict protection and priority archaeological excavation in the core protection zones; (2) monitoring and digital management of the belt zones and corridors; (3) selection of several secondary enrichment points as candidate sites for cultural display and public education. These conclusions are supported by hotspot analysis (kernel density estimation) and spatial clustering (DBSCAN/k-means), with layered statistical verification by temporal node (Time-Span) and artifact type, and they provide decision-making support for protection prioritization, research site selection, and cultural tourism planning.
Figure 5.
Enrichment zones of Chifeng Liao dynasty cultural heritage.
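To show how density-based clustering separates cores from isolated points, the following sketch implements DBSCAN over great-circle distances. It is a minimal illustration under stated assumptions: the coordinates are made-up placeholders, not the values in Table 9, and the parameters `eps_km` and `min_pts` are arbitrary choices, since the text does not report the settings used.

```python
import math


def haversine_km(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Great-circle distance between two (lon, lat) points in kilometers."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def dbscan(points: list[tuple[float, float]], eps_km: float, min_pts: int) -> list[int]:
    """Return a cluster id per point; -1 marks noise."""
    labels: list = [None] * len(points)  # None = not yet visited

    def region(i: int) -> list[int]:
        return [j for j, q in enumerate(points) if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


# Placeholder (lon, lat) points: two dense groups and one isolated site
points = [
    (119.00, 44.00), (119.02, 44.01), (119.01, 43.99),
    (119.30, 41.60), (119.32, 41.61), (119.31, 41.59),
    (121.00, 43.00),
]
labels = dbscan(points, eps_km=10.0, min_pts=3)
print(labels)
```

With these placeholder points the two dense groups come out as separate clusters and the isolated site is labeled as noise, which mirrors how core zones and scattered secondary points are distinguished in the pattern described above.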
6. Conclusions
This study focuses on immovable cultural heritage from the Liao dynasty in Chifeng and, using the CIDOC-CRM international standard ontology framework, combines the knowledge extraction capabilities of large language models with the storage and visualization advantages of the Neo4j graph database to construct the “Chifeng Liao Dynasty Cultural Heritage Knowledge Graph”. A new method for knowledge organization of local cultural heritage is proposed: introducing CIDOC-CRM into the semantic modeling of Chifeng Liao dynasty artifacts and establishing a cross-source, cross-format data integration mechanism, effectively solving the problem of “data silos” in artifact information. The study combines large language models with ontology modeling, designing a process for entity relationship extraction and ontology alignment based on GPT, which effectively addresses challenges such as complex entity nesting and the scarcity of labeled samples in the cultural heritage domain, thus verifying the feasibility of large models in semantic extraction for cultural heritage.
The study constructs a multi-dimensional artifact knowledge graph. Through ontology modeling and triple extraction, a semantic network covering multiple knowledge categories such as people, places, events, time, and materials is formed, giving a first picture of the spatial distribution and cultural value structure of Chifeng Liao dynasty artifacts. By mapping artifact entities to GIS, a “dual-core–belt” aggregation pattern of Chifeng Liao dynasty artifacts is identified, providing spatial evidence for archaeological surveys, heritage protection zoning, and cultural exhibitions.
However, the study has several limitations that warrant acknowledgment. First, the accuracy and consistency of knowledge extraction require further verification, especially when dealing with complex sentence structures and ambiguous expressions, which still involve some errors. The current prompt engineering approach, while effective, lacks standardized quality control mechanisms for ensuring extraction reliability across diverse text types. Second, some artifacts lack precise temporal and spatial coordinates, which limits the accuracy of spatial clustering analysis and may affect the reliability of distribution pattern identification. Third, the sample scope is geographically and temporally constrained, focusing exclusively on Liao dynasty heritage in Chifeng, which limits the generalizability of the proposed methodology to other historical periods or regions. Fourth, the interpretability of the large language model’s decision-making process remains a challenge, as the “black box” nature of LLMs makes it difficult to trace the reasoning behind specific entity-relationship extractions. Fifth, the knowledge graph lacks a real-time update mechanism, meaning that newly discovered artifacts or updated research findings cannot be automatically integrated into the system. Finally, the evaluation framework employed in this study relies primarily on manual verification by domain experts, and there is a need for more systematic, domain-specific quality assessment metrics tailored to cultural heritage knowledge graphs.
Despite these limitations, this research demonstrates promising directions for future improvements. First, incorporating active learning and human-in-the-loop mechanisms could reduce manual intervention while maintaining quality control, gradually improving the model’s autonomous extraction capabilities through iterative expert feedback. Second, expanding the model’s application scope to other historical dynasties and geographical regions would validate its transferability and establish a more comprehensive Chinese cultural heritage knowledge system. Third, developing explainable AI techniques specifically for entity-relationship extraction could enhance the transparency and trustworthiness of the automated knowledge construction process. Fourth, establishing standardized evaluation metrics for cultural heritage knowledge graphs, incorporating both computational measures (precision, recall, F1-score) and domain-specific quality indicators (historical accuracy, semantic richness, ontological consistency), would provide more rigorous assessment frameworks. Fifth, implementing dynamic update mechanisms through continuous monitoring of academic publications, archaeological reports, and museum cataloging systems would ensure the knowledge graph remains current and comprehensive. Sixth, extending the system’s functionality beyond visualization to include intelligent question-answering, personalized recommendation systems, and virtual heritage experiences could enhance its practical value for researchers, educators, and the general public. Finally, promoting cross-institutional data sharing and collaboration through standardized CIDOC-CRM mappings could facilitate the construction of a national or even international cultural heritage knowledge network, advancing the field toward truly interoperable digital heritage ecosystems. 
These future directions not only address the current limitations but also point toward the broader potential of integrating international semantic standards with AI technologies for intelligent cultural heritage management.
Author Contributions
Conceptualization, Y.W. and M.Z.; methodology, Y.W.; software, Y.W.; validation, Y.W. and M.Z.; formal analysis, Y.W.; investigation, Y.W.; resources, M.Z.; data curation, Y.W.; writing—original draft preparation, Y.W.; writing—review and editing, M.Z.; visualization, Y.W.; supervision, M.Z.; project administration, M.Z.; funding acquisition, M.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Ministry of Natural Resources of the People’s Republic of China, grant number 2023YFC3803900-4-3.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| CIDOC-CRM | CIDOC Conceptual Reference Model |
| GPT | Generative Pre-trained Transformer |
| KG | Knowledge Graph |
| LLM | Large Language Model |
| NER | Named Entity Recognition |
| RDF | Resource Description Framework |
| SHACL | Shapes Constraint Language |
| SPARQL | SPARQL Protocol and RDF Query Language |
Appendix A
Please extract the following entities from the historical, archaeological, or geographical literature related to the Liao Dynasty in Chifeng:
1. Immovable cultural heritage artifacts of the Liao Dynasty in Chifeng.
2. Builders of these immovable cultural heritage artifacts.
3. Restorers of these immovable cultural heritage artifacts.
4. Construction dates of these immovable cultural heritage artifacts.
5. Restoration dates of these immovable cultural heritage artifacts.
6. Geographical locations (mountain and water names) related to these immovable cultural heritage artifacts.
7. Administrative place names related to these immovable cultural heritage artifacts.
8. Activity place names associated with these immovable cultural heritage artifacts (e.g., places like ‘Naba’).
9. Events related to these immovable cultural heritage artifacts.
10. Reigning years (or dynastic years) during the construction or restoration of these immovable cultural heritage artifacts.
11. Materials used in constructing these immovable cultural heritage artifacts.
12. Construction activities related to these immovable cultural heritage artifacts.
13. Restoration activities related to these immovable cultural heritage artifacts.
Extraction Process:
Entity Extraction: First, extract the entities and assign them to appropriate subclasses.
The top-level concept should be named Chifeng Liao Dynasty Cultural Heritage Resources, with the subclass E27 Site for immovable cultural heritage artifacts.
The subclass E27 Site includes six sub-classes from CIDOC CRM: E53 Place, E57 Material, E55 Type, E52 Time-Span, E21 Person, and E7 Activity.
Entity Classification:
- E21 Person includes sub-classes for Builder and Restorer.
- E52 Time-Span includes sub-classes for Construction Time and Restoration Time, with a corresponding time reference table mapping extracted dates and reign periods to three historical periods:
- 907–969 AD (Liao Taizu Yelü Abaoji’s founding to Liao Muzong),
- 969–1031 AD (Liao Jingzong to Liao Shengzong),
- 1031–1125 AD (Liao Xingzong to the fall of the Liao Dynasty).
- E53 Place includes three sub-classes: E53 Administrative Place Name, E53 Mountain/Water Place Name, and E53 Activity Place Name. Match the extracted administrative place names to the following keywords: Linhuang County, Changtai County, Dingba County, Baohua County, and other historical locations.
- E55 Type includes Artifact Categories (e.g., Ancient Buildings, Tombs, Inscriptions, and others) and Person Categories (e.g., Emperors, Nobles, Servants, Priests, Tomb Owners).
- E57 Material maps extracted artifact materials to keywords such as stone, wood, clay, and ceramics.
- E7 Activity includes sub-classes for Construction Activities and Restoration Activities.
Extract Knowledge Triples: Then, identify the relationship types and extract the knowledge triples (Entity 1—Relationship—Entity 2). The relationship types include:
- E27-P45-E57: Artifact Material (e.g., “Lingyan Temple Pagoda—made of—Stone”).
- E27-P2-E55: Artifact Type (e.g., “Lingyan Temple Pagoda—type of—Ancient Building”).
- E27-P4-E52: Construction Time (e.g., “Lingyan Temple Pagoda—constructed during—1115 AD”).
- E27-P14-E21: Builder (e.g., “Lingyan Temple Pagoda—built by—Empress Dowager Li”).
- E27-P12-E7: Construction Activity (e.g., “Lingyan Temple Pagoda—involved in—Construction”).
- E27-P53-E53: Artifact Location (e.g., “Lingyan Temple Pagoda—located at—Balin Left Banner, Chifeng City”).
- E7-P7-E53: Event Location (e.g., “Restoration of the Temple—took place at—Balin Left Banner”).
- E7-P4-E52: Restoration Time (e.g., “Restoration of Lingyan Temple Pagoda—occurred during—1120 AD”).
- E21-P21-E7: Role in Activity (e.g., “Empress Dowager Li—performed in—Restoration”).
Output Format: The extracted entities and triples should be output in the following format:
- Entity 1—Relationship—Entity 2 (e.g., “Liao Shangjing Site—P2—Ancient Site”).
- Entity Classification (e.g., “Liao Shangjing Site—E27 Ancient Site—Artifact Type Classification”; “Qingzhou—Administrative Place Name”; “947 AD—Construction Time”).
Place Reference Table: Include mappings such as “947 AD—Period from Liao Taizu Yelü Abaoji’s founding to Liao Muzong.”
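Model output in the format requested by this prompt still has to be parsed before it can be loaded into the graph. The sketch below shows one possible post-processing step, offered as an assumption rather than the authors’ code: it splits a line of the form “Entity 1, dash, Relationship, dash, Entity 2” on the long dash used in the prompt and maps a year to one of the three periods listed above. Because the listed periods share their boundary years (969 and 1031), the sketch assigns a boundary year to the later period; that tie-breaking rule is an assumption.

```python
from typing import Optional

SEP = "\u2014"  # the long dash used as separator in the prompt's output format

# Period table taken from the prompt: (start year, end year, label)
PERIODS = [
    (907, 969, "Liao Taizu Yelü Abaoji's founding to Liao Muzong"),
    (969, 1031, "Liao Jingzong to Liao Shengzong"),
    (1031, 1125, "Liao Xingzong to the fall of the Liao Dynasty"),
]


def parse_triple(line: str) -> Optional[tuple[str, str, str]]:
    """Split one output line into (entity 1, relationship, entity 2)."""
    parts = [p.strip() for p in line.strip().strip('"“”').split(SEP)]
    if len(parts) != 3 or not all(parts):
        return None  # malformed line: leave it for manual review
    return parts[0], parts[1], parts[2]


def period_for(year: int) -> Optional[str]:
    """Map a year to a period; shared boundary years go to the later period."""
    for index, (start, end, label) in enumerate(PERIODS):
        is_last = index == len(PERIODS) - 1
        if start <= year < end or (is_last and year == end):
            return label
    return None  # outside the Liao dynasty range


print(parse_triple("Liao Shangjing Site\u2014P2\u2014Ancient Site"))
print(period_for(947))
print(period_for(1115))
```

Lines that do not split into exactly three non-empty parts are returned as `None` so that they can be reviewed by hand instead of being silently loaded as malformed triples.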
References
- Barzaghi, S.; Moretti, A.; Heibi, I.; Peroni, S. CHAD-KG: A Knowledge Graph for Representing Cultural Heritage Objects and Digitisation Paradata. arXiv 2025, arXiv:2505.13276. [Google Scholar] [CrossRef]
- Felicetti, A.; Himmiche, A.; Somenzi, M. Knowledge Graphs and Artificial Intelligence for the Implementation of Cognitive Heritage Digital Twins. Appl. Sci. 2025, 15, 10061. [Google Scholar] [CrossRef]
- The State Council of the People’s Republic of China. The 14th Five-Year Plan for Cultural Relics Protection and Technological Innovation. China Cultural Relics News, 9 November 2021; p. 003. [Google Scholar]
- ISO 21127:2023; Information and Documentation—A Reference Ontology for the Interchange of Cultural Heritage Information. International Organization for Standardization (ISO): Geneva, Switzerland, 2023.
- GB/T 37965-2019; Information and Documentation—Reference Ontology for Cultural Heritage Information Exchange. Standardization Administration of China: Beijing, China, 2019.
- Mountantonakis, M.; Koumakis, M.; Tzitzikas, Y. Combining LLMs and Hundreds of Knowledge Graphs for Data Enrichment, Validation and Integration: Case Study: Cultural Heritage Domain. In Proceedings of the International Conference on Museum Big Data (MBD2024), Athens, Greece, 18–19 November 2024; Volume 4021, p. 3. [Google Scholar]
- Fan, T.; Wang, H.; Hodel, T. CICHMKG: A Large-Scale and Comprehensive Chinese Intangible Cultural Heritage Multimodal Knowledge Graph. NPJ Herit. Sci. 2023, 11, 115. [Google Scholar] [CrossRef]
- Doerr, M. The CIDOC Conceptual Reference Module: An Ontological Approach to Semantic Interoperability of Metadata. AI Mag. 2003, 24, 75–92. [Google Scholar] [CrossRef]
- Wang, Y. Research on the Construction and Application of Knowledge Ontology in the Field of Intangible Cultural Heritage Architecture. Master’s Thesis, Anhui Jianzhu University, Hefei, China, 2023. [Google Scholar] [CrossRef]
- Zhang, J.; Ren, T. A Conceptual Model for Ancient Chinese Ceramics Based on Metadata and Ontology: A Case Study of Collections in the Nankai University Museum. J. Cult. Herit. 2024, 66, 20–36. [Google Scholar] [CrossRef]
- Wang, Y.; Shi, D. Using CIDOC CRM to Construct Knowledge Ontology of Architectural Intangible Cultural Heritage. Comput. Eng. Appl. 2023, 59, 317–326. [Google Scholar]
- Ouyang, J.; Liang, Z.F.; Ren, S.H. Research on the Construction of Knowledge Graph of Large-Scale Chinese Ancient Books. Libr. Inf. Serv. 2021, 65, 126–135. [Google Scholar] [CrossRef]
- Yaman, B.; Randles, A.; McKenna, L.; Kilgallon, L.; Rincón-Yáñez, D.; Johnston, N.; Crooks, P.; O’Sullivan, D. Expanding the Virtual Record Treasury of Ireland Knowledge Graph. Semant. Web J. 2024. in press. Available online: https://www.semantic-web-journal.net/content/expanding-virtual-record-treasury-ireland-knowledge-graph (accessed on 29 October 2025).
- Zhuang, Y. Knowledge Organization of Museum Collections for Artificial Intelligence: The Case of the Palace Museum’s Conceptual Reference Model for Ancient Chinese Movable Cultural Relics. Palace Mus. J. 2023, 11, 126–136+150. [Google Scholar] [CrossRef]
- Xia, Y.; Yao, X.; Wang, J.; Hu, M. Leveraging Knowledge Graphs for Renaissance Costume Matching and Cultural Transmission. NPJ Herit. Sci. 2025, 13, 219. [Google Scholar] [CrossRef]
- Linked Art Editorial Board. Linked Art Data Model, Version 1.0.0. Available online: https://linked.art/model/ (accessed on 29 October 2025).
- Europeana Foundation. Europeana Data Model (EDM): Documentation. Available online: https://pro.europeana.eu/page/edm-documentation (accessed on 29 October 2025).
- Dagdelen, J.; Dunn, A.; Lee, S.; Walker, N.; Rosen, A.S.; Ceder, G.; Persson, K.A.; Jain, A. Structured Information Extraction from Scientific Text with Large Language Models. Nat. Commun. 2024, 15, 1418.
- Foppiano, L.; Lambard, G.; Amagasa, T.; Ishii, M. Mining Experimental Data from Materials Science Literature with Large Language Models: An Evaluation Study. Sci. Technol. Adv. Mater. Methods 2024, 4, 2356506.
- Wang, Y.; Liu, J.; Wang, W.; Chen, J.; Yang, X.; Sang, L.; Wen, Z.; Peng, Q. Construction of Cultural Heritage Knowledge Graph Based on Graph Attention Neural Network. Appl. Sci. 2024, 14, 8231.
- Wu, M.C.; Liu, C.; Meng, K.; Wang, D.B. Construction of a Classical-Modern Chinese Translation Model Based on Pre-Trained Language Models. J. Inf. Resour. Manag. 2024, 14, 143–155.
- Huang, Y.; Yu, S.; Chu, J.; Fan, H.; Du, B. Using Knowledge Graphs and Deep Learning Algorithms to Enhance Digital Cultural Heritage Management. Herit. Sci. 2023, 11, 204.
- Yuan, H.; Li, Y.; Wang, B.; Liu, K.; Zhang, J. Knowledge Graph-Based Intelligent Question Answering System for Ancient Chinese Costume Heritage. NPJ Herit. Sci. 2025, 13, 198.
- Maree, M. Quantifying Relational Exploration in Cultural Heritage Knowledge Graphs with LLMs: A Neuro-Symbolic Approach for Enhanced Knowledge Discovery. Data 2025, 10, 52.
- Zhang, J.; Chan, J.C.F.; Zhao, Z.; Cheng, J.C.P. Heritage Building Information Management and Intelligent Querying by Multimodal Large Language Models and Knowledge Graph. In Proceedings of the Sixth International Conference on Civil and Building Engineering Informatics (ICCBEI 2025), Hong Kong, China, 8–11 January 2025.
- Câmara, A.; Almeida, A.; Oliveira, J. Transforming the CIDOC-CRM Model Into a Megalithic Monument Property Graph. J. Comput. Appl. Archaeol. 2024, 7, 213–224.
- Gao, Q.; Li, M.; Yang, Q.; Zhu, W.; Wei, S. Knowledge-Graph Construction for Historic Architectural Complexes: A Case Study of the Grand View Garden in The Story of the Stone. In Proceedings of the Fourth International Conference on Remote Sensing, Surveying, and Mapping (RSSM 2025), Xi’an, China, 10–12 January 2025; SPIE: Xi’an, China, 2025; p. 136422C.
- Zhao, Z.; Wang, D. Evaluation of Large Language Models for the Intangible Cultural Heritage Domain. NPJ Herit. Sci. 2025, 13, 439.
- Matic, R.; Kabiljo, M.; Zivkovic, M.; Cabarkapa, M. Extensible Chatbot Architecture Using Metamodels of Natural Language Understanding. Electronics 2021, 10, 2300.
- Patil, R.; Gudivada, V.N. A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs). Appl. Sci. 2024, 14, 2074.
- Zhong, G. Research on the Path of Intangible Cultural Heritage Knowledge Graph Empowering Art Design Education in Colleges and Universities. J. Contemp. Educ. Res. 2025, 9, 9.
- Liu, S.; Tan, N.; Yang, H. Research on the Construction of Knowledge Graph for Liao Dynasty Historical and Cultural Resources. J. Dalian Minzu Univ. 2021, 23, 73–80.
- Li, Y. Research on Cultural Relic Knowledge Extraction Method and Its Knowledge Graph Construction. Master’s Thesis, Chongqing University of Technology, Chongqing, China, 2024.
- Li, C.; Hou, X.; Qiao, X. A Low-Resource Named Entity Recognition Method for the Cultural Heritage Field Incorporating Knowledge Fusion. Acta Sci. Nat. Univ. Pekin. 2024, 60, 13–22.
- Li, L. Research on the Construction of a Digital System for Cultural Relics Based on Knowledge Graph. Master’s Thesis, Beijing Jiaotong University, Beijing, China, 2022.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).