Article

Enhancing Ancient Ceramic Knowledge Services: A Question Answering System Using Fine-Tuned Models and GraphRAG

School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
* Author to whom correspondence should be addressed.
Information 2025, 16(9), 792; https://doi.org/10.3390/info16090792
Submission received: 31 July 2025 / Revised: 26 August 2025 / Accepted: 9 September 2025 / Published: 11 September 2025

Abstract

To address the challenges of extensive domain expertise and deficient semantic comprehension in the digital preservation of ancient ceramics, this paper proposes a knowledge question answering (QA) system integrating Low-Rank Adaptation (LoRA) fine-tuning and Graph Retrieval-Augmented Generation (GraphRAG). First, textual descriptions of ceramic images are generated using the GLM-4V-9B model. These texts are then enriched with domain literature to produce ancient ceramic QA pairs via ERNIE 4.0 Turbo, culminating in a high-quality dataset of 2143 curated question–answer pairs after manual refinement. Second, LoRA fine-tuning is applied to the Qwen2.5-7B-Instruct foundation model, significantly enhancing its question-answering proficiency in the ancient ceramics domain. Finally, the GraphRAG framework is integrated, combining the fine-tuned large language model with knowledge graph path analysis to augment multi-hop reasoning capabilities for complex queries. Experimental results demonstrate performance improvements of 24.08% in ROUGE-1, 34.75% in ROUGE-2, 29.78% in ROUGE-L, and 4.52% in BERTScore_F1 over the baseline model. This evidence shows that the synergistic implementation of LoRA fine-tuning and GraphRAG delivers significant performance enhancements for ceramic knowledge systems, establishing a replicable technical framework for intelligent cultural heritage knowledge services.

1. Introduction

At the intersection of contemporary digital civilization and traditional cultural preservation, the modernization of ancient ceramic knowledge systems faces unprecedented opportunities and challenges [1]. As vital material carriers of cultural DNA, the embodied artisanal wisdom and aesthetic values within ancient ceramics necessitate creative transformation through digital means [2]. The core challenges in current knowledge management lie in three critical areas: the cognitive gap between highly specialized ceramic terminology and public understanding; the difficulty of establishing systematic linkages among globally dispersed physical specimens and research findings; and the lack of effective integration mechanisms for unstructured data derived from historical texts and modern scientific analyses [3,4,5]. This fragmented information landscape results in frequent academic redundancies, intergenerational discontinuities in the transmission of traditional ceramic craftsmanship, and significantly diminished efficacy in the cultural dissemination of ancient ceramics [6].
In recent years, Large Language Models (LLMs) have demonstrated powerful text processing and knowledge reasoning capabilities across multiple domains [7]. Wang et al. [8] proposed an autonomous driving decision-making model integrating LLMs with attention mechanisms, leveraging cognitive learning to enhance scene understanding; simulation experiments demonstrated its superiority over traditional methods in high-speed driving scenarios, thereby enhancing system reliability. Zou et al. [9] proposed a Chinese medical dialogue system based on LLMs, which improves diagnostic and treatment accuracy while reducing carbon footprint through innovations such as knowledge retrieval enhancement, supporting sustainable smart city development. Xiong et al. [10] proposed an agricultural intelligent QA system based on InternLM-20B-LoRA, integrating RAG technology and a local plant protection knowledge base; it significantly improves answer accuracy and effectively suppresses LLM hallucinations, providing reliable knowledge services for agricultural practitioners. However, when handling knowledge-intensive tasks in highly specialized fields like ancient ceramics, LLMs face significant limitations. They lack domain-specific knowledge depth [11], which makes it difficult to accurately understand the precise meanings of professional terminology and often results in conceptual confusion and factual errors in generated content. More critically, LLMs lack the capability for structured knowledge organization [12]: they cannot establish the multidimensional correlation networks linking factors such as body material composition, firing temperature, and glaze color manifestation, resulting in fragmented knowledge output. When confronted with complex queries, LLMs are prone to generating “hallucinatory” responses, fabricating non-existent literature sources or misinterpreting academic viewpoints [13]. These technical limitations collectively constrain the deep application value of traditional LLMs in the field of ancient ceramics.
To address the challenges of knowledge transfer in large language models for the ancient ceramics domain, Low-Rank Adaptation fine-tuning offers an efficient and flexible solution [14]. This method leverages low-rank matrix decomposition to selectively adapt only a small subset of critical parameters while freezing the main body of the pre-trained model. This strategy preserves the model’s core capabilities while enabling precise adaptation to specific domain knowledge. Furthermore, GraphRAG technology facilitates complex relational reasoning by converting diverse elements from external knowledge bases into nodes within a knowledge graph, enabling graph-based inference. Its dynamic subgraph retrieval mechanism allows for precise localization of fine-grained knowledge elements [15]. Accordingly, this paper proposes an ancient ceramic knowledge question-answering system based on LoRA fine-tuning and GraphRAG technology, designed to effectively address the challenges encountered in transferring ancient ceramic knowledge. The key contributions of this paper are summarized as follows:
  • To address the prevalence of ceramic imagery in ancient ceramic literature, this study employs the GLM-4V-9B vision-language model. By providing contextually informed prompts, the model generates textual descriptions of images, substantially preserving the original information from ancient ceramic pictures. This approach furnishes a more comprehensive knowledge source for research in the ancient ceramics domain.
  • This study utilizes ERNIE 4.0 Turbo, which supports multi-document input, to generate QA pairs via global and local prompt engineering. Following the manual removal of duplicate and low-quality content, a final set of 2143 representative QA pairs is generated. This curated dataset effectively encapsulates knowledge within the ancient ceramics field, providing robust data support for subsequent research.
  • The Qwen2.5-7B-Instruct model serves as the base architecture in this research. Leveraging LoRA fine-tuning technology, the model is trained on the generated QA pair dataset. The resultant fine-tuned model, Qwen2.5-LoRA, is deployed for answer generation. Experimental results demonstrate that the fine-tuned model significantly enhances question-answering performance specifically within the ancient ceramics domain.
  • This study integrates the GraphRAG framework with external knowledge bases to structurally process heterogeneous knowledge. For questions requiring multi-hop reasoning, graph path analysis provides the large language model with precise, structured contextual information. This enhanced retrieval architecture markedly improves the accuracy of the question-answering system when handling complex queries.

2. Related Work

Artificial intelligence is profoundly transforming the field of cultural heritage preservation. In the semantic mining and utilization of cultural heritage images, Abgaz et al. [16] employed computer vision and semantic web technologies to analyze the implicit socioeconomic and cultural values embedded in digital cultural heritage images, with the Europeana platform as a case study. This approach enables end-to-end intelligent processing of cultural resources, addressing the challenge of capturing latent information that traditional technologies struggle with. In the restoration of historical buildings, Fang et al. [17] achieved millimeter-level precision by combining 3D point cloud reconstruction with generative adversarial networks (GANs). This technology not only accelerates the restoration process but also ensures long-term stability, providing intelligent solutions for cultural heritage preservation.
As a crucial form of human–computer interaction, question answering systems have evolved from simple rule-based approaches to sophisticated knowledge-enhanced paradigms [18]. In specialized domains like ancient ceramics, these systems are advancing toward greater knowledge depth while maintaining natural interaction capabilities. The foundational stage began with rule-based systems: the ELIZA system in the 1960s pioneered human–computer dialogue through pattern matching but relied on manually crafted rules [19], proving inadequate for addressing specialized queries in ancient ceramics. The 1990s witnessed the rise of search-engine-style QA, which retrieved pre-stored question–answer pairs to expand knowledge coverage [20]. Nevertheless, such systems struggled with domain-specific reasoning tasks. The statistical learning era brought significant breakthroughs: IBM Watson’s 2011 demonstration on Jeopardy! proved statistical methods could handle cross-domain knowledge associations [21]. Systems during this period integrated professional resources such as ceramic kiln lineage databases and artifact feature libraries, enabling responses to basic inquiries, though comprehension gaps persisted for complex queries.
Recently, the deep learning revolution has fundamentally transformed the technical paradigm. The emergence of Transformer architectures allows models to grasp intricate linguistic patterns through pretraining [22]. Fine-tuning techniques adapt large language models to vertical domains like ancient ceramics, empowering systems to comprehend specialized expressions such as ceramic craft characteristics. The incorporation of knowledge graphs resolves challenges in modeling relationships among technical terminology, substantially enhancing answer precision.

3. Research Methods

3.1. Applications of Vision-Language Models

The prevalence of ceramic imagery within literature resources in the ancient ceramic knowledge domain presents a significant challenge [23]. Vision-Language Models (VLMs) offer a novel solution to this issue. VLMs are artificial intelligence (AI) systems trained via joint learning to achieve cross-modal understanding between images and text; their core capability lies in establishing mapping relationships between visual features and semantic space [24]. Such models typically employ a dual-stream encoder architecture, comprising a visual encoder and a text encoder, integrated with attention-based fusion mechanisms. They support tasks such as image caption generation and Visual Question Answering (VQA), finding wide application in fields like medical image analysis and autonomous driving. This study adopts the GLM-4V-9B model for generating textual descriptions of ancient ceramic images. The framework structure of this model is illustrated in Figure 1.
Compared to traditional OCR which extracts only textual information from images, GLM-4V-9B integrates visual comprehension with linguistic understanding to generate descriptive text containing semantic elements [25]. Having acquired generic visual interpretation capabilities through pre-training, this model can be directly deployed via online inference to analyze ancient ceramic imagery. The processing workflow involves inputting porcelain images alongside prompt terms providing foundational descriptors such as vessel morphology, glaze coloration, and decorative motifs, which undergo GLM-4V-9B computation during online inference, ultimately yielding descriptive textual outputs for ceramic images.
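To make this workflow concrete, the following is a minimal sketch of the image-description step, following the public usage pattern of the open-source GLM-4V-9B checkpoint on Hugging Face; the image path and prompt wording are illustrative stand-ins, not the study’s actual inputs.

```python
# A minimal sketch of generating a ceramic image description with GLM-4V-9B,
# following the model card's published usage pattern; "ceramic_sample.jpg"
# and the prompt text are hypothetical examples.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/glm-4v-9b", torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True, trust_remote_code=True,
).to(device).eval()

image = Image.open("ceramic_sample.jpg").convert("RGB")
prompt = ("Describe this ancient ceramic piece, covering vessel morphology, "
          "glaze coloration, and decorative motifs.")

# The chat template packs the image and the guiding prompt into a single turn.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": image, "content": prompt}],
    add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True,
).to(device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    outputs = outputs[:, inputs["input_ids"].shape[1]:]  # strip the prompt
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The prompt deliberately names the descriptor categories (morphology, glaze, motifs) so the generated text aligns with the knowledge dimensions used later in dataset construction.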

3.2. Question–Answer Pair Dataset Construction

The construction of a high-quality knowledge base constitutes foundational work for the ancient ceramic question-answering system. Initially, systematic collection of over 90 authoritative references, including specialized publications such as History of Chinese Ceramics, archeological reports, and scholarly papers, formed a primary knowledge repository covering dimensions such as kiln lineage development, craft characteristics, chronological periods, and decorative techniques. To capture the unique visual information in ancient ceramics, the GLM-4V-9B vision-language model described in Section 3.1 was employed to generate descriptive texts for artifact images, and these outputs were integrated with the existing textual resources to establish a raw corpus. Subsequently, ERNIE 4.0 Turbo, with its multi-document input capability, generated question–answer pairs encompassing both holistic-synthesis and locally fine-grained categories, using the prompt templates illustrated in Table 1. Following generation of the initial question–answer pair dataset, data deduplication methods were applied to filter similar content, appropriate character-length thresholds were set to enhance data quality, and manual curation was conducted to select fact-based question–answer pairs with authoritative content, ultimately yielding a high-quality dataset comprising 2143 question–answer pairs. Representative examples of the generated pairs are shown in Figure 2.
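As an illustration of the deduplication and length-filtering pass, the sketch below shows one simple way such cleaning might be implemented; the similarity threshold and character-length bounds are assumptions for demonstration, not the paper’s reported settings, and manual curation still follows this automated step.

```python
# A simplified sketch of the automated cleaning pass over generated QA pairs.
# The 0.9 similarity threshold and 10-512 character window are illustrative.
from difflib import SequenceMatcher

def clean_qa_pairs(pairs, sim_threshold=0.9, min_len=10, max_len=512):
    kept = []
    for pair in pairs:
        q, a = pair["question"].strip(), pair["answer"].strip()
        # Drop pairs whose answers fall outside the character-length window.
        if not (min_len <= len(a) <= max_len):
            continue
        # Drop near-duplicate questions via pairwise string similarity.
        if any(SequenceMatcher(None, q, k["question"]).ratio() > sim_threshold
               for k in kept):
            continue
        kept.append({"question": q, "answer": a})
    return kept

raw = [
    {"question": "What defines Ru kiln glaze?", "answer": "A sky-blue glaze..."},
    {"question": "What defines Ru kiln glaze ?", "answer": "A sky-blue glaze..."},
]
print(len(clean_qa_pairs(raw)))  # 1: the near-duplicate question is filtered out
```

Pairwise comparison is quadratic in the number of pairs, which is acceptable at the scale of a few thousand candidates such as the 2143-pair dataset here.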

3.3. Large Language Model Fine-Tuning

3.3.1. LLaMA-Factory Framework

The LLaMA-Factory open-source framework [26] is a PyTorch-based deep learning framework that supports multiple parameter-efficient fine-tuning methods, including LoRA, Prefix Tuning, and QLoRA. Prefix Tuning [27] guides model behavior by prepending trainable continuous prefix vectors, avoiding modification of the original parameters; however, its implicit prompting demonstrates limited capability in capturing domain-specific terminology. QLoRA [28] employs 4-bit quantization to achieve ultra-low resource consumption, making it particularly effective for deployment on memory-constrained devices; nevertheless, precision loss during quantization impacts the accurate generation of specialized terms. LoRA innovatively incorporates low-rank adapter matrices alongside the original model parameters, maintaining the expressiveness of full-parameter fine-tuning while substantially reducing trainable parameters. Consequently, for the QA dataset in the ancient ceramics knowledge domain, this study employs LoRA to fine-tune large language models.

3.3.2. LoRA Fine-Tuning

Proposed by Microsoft Research in 2021, LoRA [29] has become a mainstream solution for parameter-efficient fine-tuning of large language models. This technique maintains frozen original weights of pre-trained models while injecting trainable low-rank matrices to capture domain-specific knowledge, with its technical framework illustrated in Figure 3. The design preserves the knowledge integrity of pre-trained models while enabling precision augmentation of domain capabilities through plug-and-play adapter modules. The low-rank characteristic ensures that the newly introduced parameters are substantially fewer than those of the original model, significantly reducing computational and storage demands. Concurrently, the linear composition properties of matrix decomposition retain strong representational capacity. Crucially, semantic relationships of specialized vocabulary can be precisely modeled through the low-rank matrices, making LoRA particularly suited for handling the complex terminology systems characteristic of the ancient ceramics domain.
For the original weight matrix $W_0 \in \mathbb{R}^{d \times k}$, LoRA introduces two compact matrices to construct a low-rank decomposition of the parameter update, with the update formalized in Equation (1) and the forward propagation defined in Equation (2). The implementation integrates a trainable bypass pathway parallel to the original pre-trained language model for efficient parameter fine-tuning: this pathway first projects high-dimensional features into a low-rank space through a dimensionality-reduction transformation and then restores the original dimensionality through an up-projection. Matrix $A$ is initialized from a random Gaussian distribution while matrix $B$ is zero-initialized. Throughout training, the pre-trained model parameters remain frozen, with optimization exclusively targeting matrices $A$ and $B$. Upon completion, the product $BA$ is merged with the original pre-trained parameters, preserving the fundamental architecture, to constitute the operational weights of the fine-tuned model.
$$\Delta W = BA \tag{1}$$

$$h = W_0 x + \Delta W x = W_0 x + BAx \tag{2}$$
where $A \in \mathbb{R}^{r \times k}$ and $B \in \mathbb{R}^{d \times r}$ are trainable low-rank matrices, $\Delta W \in \mathbb{R}^{d \times k}$ denotes the weight update matrix, $r$ denotes the rank, which is orders of magnitude smaller than the dimensions of the original weight matrix, achieving a balance between computational efficiency and model performance, $x \in \mathbb{R}^{k}$ denotes the input feature vector, and $h \in \mathbb{R}^{d}$ denotes the output feature vector.
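To ground Equations (1) and (2), the following is a minimal PyTorch sketch of a LoRA-augmented linear layer with a frozen base weight, Gaussian-initialized $A$, and zero-initialized $B$; the module name and the conventional alpha/r scaling are illustrative additions, not details taken from the paper.

```python
# A minimal sketch of the LoRA update from Equations (1) and (2) on a single
# frozen linear layer; names and the alpha/r scaling are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d: int, k: int, r: int = 8, alpha: float = 32.0):
        super().__init__()
        # Frozen pre-trained weight W0 in R^{d x k}.
        self.W0 = nn.Linear(k, d, bias=False)
        self.W0.weight.requires_grad = False
        # Trainable low-rank factors: A in R^{r x k} (Gaussian init) and
        # B in R^{d x r} (zero init), so Delta W = B A starts at zero.
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)
        self.B = nn.Parameter(torch.zeros(d, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + B A x  (Equation (2)), with the usual alpha/r scaling.
        return self.W0(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(d=16, k=32, r=4)
h = layer(torch.randn(2, 32))  # x in R^k per row; output h in R^d
print(h.shape)                 # torch.Size([2, 16])
```

Because $B$ starts at zero, the adapter initially contributes nothing and the model behaves exactly like the pre-trained network, which stabilizes the start of fine-tuning.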

3.3.3. Large Language Model Selection

In AI application practice, selecting a large language model necessitates scientific decision-making. This study utilizes the SuperCLUE evaluation benchmark as its foundational reference framework, with partial rankings displayed in Table 2; this benchmark objectively reflects model capabilities through multidimensional standardized testing encompassing Hard tasks, scientific reasoning, and liberal arts competencies [30]. For the specialized task of ancient ceramics knowledge question answering, model performance on critical metrics, particularly comprehension of specialized terminology and dialogue coherence, is paramount. Parameter scale constitutes another vital consideration: current mainstream LLMs range from hundreds of millions to hundreds of billions of parameters, and models of varying scales exhibit significant differences in inference speed, GPU memory consumption, and fine-tuning costs.
This study comprehensively evaluated the model capabilities and parameter sizes, selecting Llama-3.1-8B-Instruct, Gemma-2-9b-it, GLM-4-9B-Chat, and Qwen2.5-7B-Instruct for the assessment of ancient ceramic knowledge domains. Based on the subsequent comparative experimental results, the Qwen2.5-7B-Instruct model developed by Alibaba Cloud was ultimately chosen. This model demonstrates strong applicability in ceramic-related tasks while effectively balancing performance and computational resource consumption. Its 7B parameter scale enables efficient fine-tuning and deployment on consumer-grade GPUs.

3.4. GraphRAG

GraphRAG [31] is a knowledge graph-based retrieval-augmented generation technique that utilizes graph structures to represent knowledge nodes and their relationships, integrating both structured and unstructured knowledge into generative tasks. After retrieving relevant information through knowledge retrieval, it leverages graph connectivity to expand semantic information and ultimately generates contextually relevant answers. By embedding the structured relationships of knowledge graphs into the generation process, this approach enhances large language models’ comprehension and reasoning capabilities regarding complex information, thereby addressing traditional RAG’s limitations in handling complex queries and multi-hop reasoning. This study employs GraphRAG for knowledge extraction and answer generation concerning ancient ceramic artifacts documentation, with the comprehensive framework illustrated in Figure 4.
First, the text is segmented, dividing the document into paragraphs or semantic units to form manageable chunks, balancing chunk size with information integrity. Next, the fine-tuned Qwen2.5-LoRA large model from the previous stage is invoked to extract entities such as persons, dynasties, locations, and ancient ceramics from each text chunk, followed by relationship extraction to establish connections between entities. Visualization is performed using Neo4j version 5.26.0, as shown in Figure 5, where different colors represent different entity types. Critically, the multi-hop relational chains between entities constitute the physical manifestation of multi-hop reasoning. When multiple entities connect via directed relational chains, each relational edge corresponds to a ‘hop’ in the reasoning process, collectively forming the complete multi-hop reasoning chain. Compared to generic large models, the fine-tuned model captures entities and relationships with more distinctive features specific to the ancient ceramics domain. Subsequently, an embedding model is employed to convert textual entities in the knowledge graph into vector representations. These vector embeddings are stored within specific node properties in the Neo4j graph database. Leveraging Neo4j’s native graph index and vector extension capabilities, the system supports both efficient graph traversal queries and vector similarity searches. Following this, Neo4j’s built-in graph algorithm module is utilized to perform multi-tiered community detection on entity relationships, automatically identifying clusters of tightly interconnected entities. Structured reports are generated for each community, containing core nodes, key relationships, and descriptions of community characteristics. This architectural design enables query results to simultaneously present the global knowledge network and focus on local relational details, achieving multi-layered knowledge presentation. It provides the fine-tuned large language model Qwen2.5-LoRA with contextual information characterized by explicit logical structure and rich semantic associations, significantly enhancing the performance of the question-answering system.
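As a concrete illustration of the storage and retrieval steps in this pipeline, the sketch below shows how extracted entities, their embeddings, the vector index, and a two-hop expansion query might be wired together against a local Neo4j 5.26 instance; the connection credentials, the 1024-dimension setting, and the sample entity are assumptions for demonstration, not the study’s actual implementation.

```python
# A minimal sketch of entity storage, vector indexing, and multi-hop
# retrieval in Neo4j 5.x; credentials, dimensions, and data are illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_entity(tx, name, etype, description, embedding):
    # Each extracted entity becomes a node; its text embedding is kept in a
    # node property so Neo4j's native vector index can search over it.
    tx.run(
        "MERGE (e:Entity {name: $name}) "
        "SET e.type = $etype, e.description = $desc, e.embedding = $emb",
        name=name, etype=etype, desc=description, emb=embedding,
    )

def create_vector_index(tx):
    # Native vector index over the embedding property (Neo4j 5.x syntax).
    tx.run(
        "CREATE VECTOR INDEX entity_embedding IF NOT EXISTS "
        "FOR (e:Entity) ON (e.embedding) "
        "OPTIONS {indexConfig: {`vector.dimensions`: 1024, "
        "`vector.similarity_function`: 'cosine'}}"
    )

def retrieve_context(tx, query_vector, k=5):
    # Seed entities come from vector similarity; the variable-length pattern
    # then walks up to two relational edges from each seed, materializing the
    # multi-hop chains described above.
    result = tx.run(
        "CALL db.index.vector.queryNodes('entity_embedding', $k, $vec) "
        "YIELD node, score "
        "MATCH path = (node)-[*1..2]-(:Entity) "
        "RETURN node.name AS seed, score, "
        "[n IN nodes(path) | n.name] AS hop_chain LIMIT 20",
        k=k, vec=query_vector,
    )
    return [record.data() for record in result]

with driver.session() as session:
    session.execute_write(store_entity, "Jingdezhen", "Location",
                          "Porcelain capital in Jiangxi", [0.01] * 1024)
    session.execute_write(create_vector_index)
    print(session.execute_read(retrieve_context, [0.01] * 1024))
```

In the full system, the retrieved hop chains and community reports would then be serialized into the prompt context supplied to Qwen2.5-LoRA.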

4. Results and Analysis

4.1. Evaluation Indicators

The evaluation of large language models requires comprehensive assessment across multiple dimensions including language generation quality, task performance, and logical reasoning capabilities. To holistically evaluate experimental effectiveness in question-answering tasks, this study adopts a dual evaluation framework combining the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) series metrics [32] with Bidirectional Encoder Representations from Transformers score (BERTScore).
ROUGE-$N$ is a recall-based textual evaluation metric, where $N$ denotes the matching granularity of contiguous $N$-grams ($gram_N$); it assesses content coverage and semantic consistency by measuring lexical and phrasal overlap between generated and reference texts, as defined in Equation (3). ROUGE-1 and ROUGE-2 are specialized variants thereof: ROUGE-1 evaluates unigram semantic coverage to detect query-required entity nouns within generated text, and ROUGE-2 assesses phrase-level expression accuracy through bigram matching to rigorously measure the adequacy of domain terminology and fixed collocations.
$$\mathrm{ROUGE}\text{-}N = \frac{\sum_{S \in \{\mathrm{Reference\;Summaries}\}} \sum_{gram_N \in S} \mathrm{Count}_{\mathrm{match}}(gram_N)}{\sum_{S \in \{\mathrm{Reference\;Summaries}\}} \sum_{gram_N \in S} \mathrm{Count}(gram_N)} \tag{3}$$
where $S$ represents an individual text within the reference text collection, $gram_N$ denotes a contiguous word sequence of length $N$, $\mathrm{Count}_{\mathrm{match}}(gram_N)$ signifies the minimum co-occurrence count of $gram_N$ between the generated and reference texts, $\mathrm{Count}(gram_N)$ indicates the overall occurrence count of $gram_N$ in the reference text, and $\mathrm{Reference\;Summaries}$ corresponds to the reference text collection.
ROUGE-$L$ computes the Longest Common Subsequence (LCS) between reference and generated texts to evaluate logical coherence at the sentence level, with its computational formulation expressed in Equations (4)–(6).
$$\mathrm{Recall}_{LCS} = \frac{\sum_{S \in \{\mathrm{Reference\;Summaries}\}} |LCS(S, Gen)|}{\sum_{S \in \{\mathrm{Reference\;Summaries}\}} |S|} \tag{4}$$

$$\mathrm{Precision}_{LCS} = \frac{\sum_{S \in \{\mathrm{Reference\;Summaries}\}} |LCS(S, Gen)|}{|Gen|} \tag{5}$$

$$\mathrm{ROUGE}\text{-}L = \frac{(1+\beta^2)\,\mathrm{Recall}_{LCS}\,\mathrm{Precision}_{LCS}}{\mathrm{Recall}_{LCS} + \beta^2\,\mathrm{Precision}_{LCS}} \tag{6}$$
where $\mathrm{Recall}_{LCS}$ denotes the recall rate of the generated text; $\mathrm{Precision}_{LCS}$ signifies the precision rate of the generated text; ROUGE-$L$ represents the harmonic mean of the preceding two metrics; $LCS(S, Gen)$ corresponds to the Longest Common Subsequence between reference text $S$ and generated text $Gen$; $|LCS(S, Gen)|$ indicates the length of this subsequence; $|S|$ and $|Gen|$ denote the lengths of the reference text and generated text, respectively; and $\beta$ serves as a weighting factor balancing recall and precision, with a default value of 1 to ensure equal consideration of coverage and conciseness.
BERTScore [33] leverages pre-trained models to map texts into high-dimensional semantic spaces, generating context-aware embedding vectors. It evaluates deep semantic similarity by computing token vector alignment precision, effectively identifying synonymous substitutions and logical equivalence. The computational formulation is expressed in Equations (7)–(9).
$$R_{BERT} = \frac{1}{|X|} \sum_{X_i \in X} \max_{S_j \in S} X_i^{\top} S_j \tag{7}$$

$$P_{BERT} = \frac{1}{|S|} \sum_{S_j \in S} \max_{X_i \in X} X_i^{\top} S_j \tag{8}$$

$$F_{BERT} = \frac{2\, P_{BERT} R_{BERT}}{P_{BERT} + R_{BERT}} \tag{9}$$
where $X$ denotes the token embedding set of the reference text, $S$ represents the token embedding set of the generated text, $X_i^{\top} S_j$ indicates the cosine similarity between reference token embedding $X_i$ and generated token embedding $S_j$, $R_{BERT}$ signifies the mean maximum similarity of each reference-text token against the generated text, $P_{BERT}$ corresponds to the mean maximum similarity of each generated-text token against the reference text, and $F_{BERT}$ denotes the harmonic mean of $R_{BERT}$ and $P_{BERT}$, providing a comprehensive assessment of the generated text’s coverage and conciseness.
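For concreteness, the sketch below shows how both metric families might be computed with the open-source rouge-score and bert-score packages; the example sentences are invented for illustration. Note that for the Chinese answers in this study, texts would first need word segmentation (e.g., with jieba) and a segmentation-aware ROUGE setup, since rouge-score’s default tokenizer targets English.

```python
# A brief sketch of computing ROUGE and BERTScore with the rouge-score and
# bert-score packages; the sentences below are invented examples.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Blue-and-white porcelain is decorated with cobalt pigment under the glaze."
generated = "Blue-and-white porcelain uses cobalt pigment for underglaze decoration."

# ROUGE-1/2/L measure n-gram and longest-common-subsequence overlap,
# as in Equations (3)-(6).
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)
print({name: round(s.fmeasure, 4) for name, s in scores.items()})

# BERTScore embeds both texts and aligns token vectors by maximum cosine
# similarity, as in Equations (7)-(9); lang selects the backbone model.
P, R, F1 = bert_score([generated], [reference], lang="en", verbose=False)
print(f"BERTScore_F1: {F1.item():.4f}")
```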

4.2. Experimental Environment and Results

This experiment’s hardware and software environment utilizes Linux Ubuntu 20.04 as the operating system, with an NVIDIA Tesla A100 GPU featuring 40 GB VRAM and an Intel Xeon Gold 6258R 14-core CPU operating at 2.70 GHz with multithreading support. The system incorporates 120 GB DDR4 RAM to ensure smooth large-scale data processing, running Python 3.11.10 and CUDA 12.4 to optimize computational efficiency for deep learning tasks. This study employs LoRA fine-tuning technology to adapt the Qwen2.5-7B-Instruct model, significantly reducing computational resource demands while preserving the large model’s overall performance. Key training parameters are documented in Table 3.
Based on the constructed ancient ceramic QA dataset, we divided 2143 question–answer pairs into training, validation, and test sets at an 8:1:1 ratio. Subsequently, supervised fine-tuning of Qwen2.5-7B-Instruct was conducted within the LLaMA-Factory framework, yielding the Qwen2.5-LoRA model. The loss functions during training and validation were visualized via TensorBoard as depicted in Figure 6. Overall, both training and validation losses progressively decreased with increasing training steps, ultimately converging to approximately 0.2836 and 0.3173, respectively. The stable difference of approximately 0.0337 between these metrics falls within a reasonable range, demonstrating effective fitting on the training data and robust generalization capability.
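As a standalone illustration of this configuration, the following sketch reproduces the Table 3 hyperparameters with the Hugging Face peft and transformers libraries that LLaMA-Factory builds on; the dataset file names, target modules, and preprocessing are assumptions for demonstration, not the study’s exact recipe.

```python
# A condensed sketch of the LoRA fine-tuning setup mirroring Table 3;
# file names and target modules are illustrative assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=8,                                   # lora_rank (Table 3)
    lora_alpha=32,                         # lora_alpha (Table 3)
    lora_dropout=0.05,                     # lora_dropout (Table 3)
    target_modules=["q_proj", "v_proj"],   # assumed; Table 3 does not list them
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the adapter weights train

# Hypothetical JSON files from the 8:1:1 split of the 2143 QA pairs.
data = load_dataset("json", data_files={"train": "train.json",
                                        "validation": "val.json"})

def tokenize(example):
    # Concatenate question and answer into one supervised sequence.
    text = f"Question: {example['question']}\nAnswer: {example['answer']}"
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = data.map(tokenize, remove_columns=data["train"].column_names)

args = TrainingArguments(
    output_dir="qwen2.5-lora-ceramics",
    learning_rate=1e-4,                    # learning_rate (Table 3)
    num_train_epochs=3,                    # num_train_epochs (Table 3)
    per_device_train_batch_size=1,         # per-device batch size (Table 3)
    gradient_accumulation_steps=4,         # gradient accumulation (Table 3)
    bf16=True,
    logging_steps=10,
)

Trainer(
    model=model, args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

With batch size 1 and 4 accumulation steps, the effective batch size is 4, which keeps the 7B model trainable within the 40 GB of VRAM described above.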

4.3. Comparison with Other Models

To comprehensively evaluate the effectiveness and reliability of the ancient ceramic knowledge QA system integrating Qwen2.5-LoRA and GraphRAG, this study compares the performance of multiple large language models on the ancient ceramic QA task. The models include Llama-3.1-8B-Instruct, Gemma-2-9B-it, GLM-4-9B-Chat, and the baseline Qwen2.5-7B-Instruct. The experimental results are shown in Table 4.
As evidenced in Table 4, the ancient ceramic knowledge QA system leveraging model fine-tuning and GraphRAG significantly outperforms the other models across ROUGE and BERTScore metrics. Specifically, it achieves ROUGE-1, ROUGE-2, and ROUGE-L scores of 57.92%, 48.46%, and 52.17%, respectively, alongside a BERTScore_F1 of 93.68%. Compared to the baseline Qwen2.5-7B-Instruct model, these results represent substantial improvements of 24.08%, 34.75%, 29.78%, and 4.52%, respectively, demonstrating marked enhancement in answer accuracy and reasoning capability for ancient ceramic knowledge. This research confirms that deep synergy between large language models and knowledge graphs extends the capability boundaries of domain-specific QA systems. The dual-engine architecture fully harnesses the complementary advantages of both technologies, enabling the system to simultaneously deliver profound semantic comprehension and precise knowledge localization.

4.4. Ablation Experiment

To systematically evaluate the individual contributions of the LoRA fine-tuning and GraphRAG techniques on the ancient ceramic knowledge QA task, this study conducts multiple ablation experiments using Qwen2.5-7B-Instruct as the base model, with results detailed in Table 5. The methodology employs strict variable control, incrementally modifying the baseline control group to quantify the independent value of each technical module in enhancing ancient ceramic knowledge understanding.
As demonstrated in Table 5, the synergistic application of LoRA fine-tuning and GraphRAG technology yields substantial performance enhancements for the ancient ceramic knowledge QA system. Notably, LoRA fine-tuning yields the most significant improvements: ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore_F1 scores increase by 19.77%, 30.22%, 25.46%, and 3.63%, respectively, following its implementation. This indicates that LoRA fine-tuning enables the model to maintain general linguistic comprehension capabilities while achieving deep specialization in ancient ceramics, effectively adapting to domain-specific terminology systems and knowledge association patterns, thereby resolving knowledge transfer barriers commonly encountered with conventional fine-tuning approaches in specialized domains. Correspondingly, Table 5 reveals consistent metric elevation across all indicators after GraphRAG integration. These results confirm that fine-tuned large models alone frequently fail to attain optimal performance due to limitations such as hallucinations. The GraphRAG augmentation produces responses that preserve the fluency of natural language generation while adhering to the precision standards essential for scholarly ceramic research, collectively validating the comprehensive efficacy of our methodology.

4.5. Manual Evaluation Results

To systematically validate the academic accuracy and application value of answers generated by the ancient ceramic knowledge question-answering system, this study introduced a manual evaluation mechanism to score experimental results under different research methods, ensuring comprehensive and reliable assessment conclusions. Specifically, five domain experts in ancient ceramics were invited to evaluate the results based on five metrics: factual accuracy, terminological rigor, information completeness, linguistic coherence, and logical consistency. Each metric was scored on a 10-point scale, with higher scores indicating better model performance. The final score for each metric was calculated as the average of the five experts’ ratings. The statistically analyzed evaluation results are presented in Table 6.
As evidenced in Table 6, the methodology integrating LoRA model fine-tuning and GraphRAG technology proposed in this study consistently outperforms alternative approaches across all five evaluation metrics. Specifically, relative to the baseline Qwen2.5-7B-Instruct model, it achieves an average improvement of 5.16 points. When compared to the method employing GraphRAG alone, the gain averages 3.24 points, while against the approach using only LoRA fine-tuning, the advancement averages 1.56 points. These results indicate that the synergistic integration of LoRA fine-tuning and GraphRAG yields substantial performance enhancements, with both technologies exhibiting significant advantages in optimizing knowledge representation and enhancing reasoning capabilities. Furthermore, the expert evaluation validates the robustness of the ancient ceramic QA system and confirms its high practical utility for real-world applications.

5. Discussion

Against the backdrop of rapid AI advancement, domain-specific QA systems are demonstrating divergent evolutionary trajectories. In healthcare, these systems prioritize clinical decision support, with knowledge modeling focusing on precise symptom-treatment matching constrained by medical term polysemy resolution. Conversely, ancient ceramic QA systems must handle unstructured data unique to material cultural heritage. Legal consultation systems rely on case law networks with coherent statutory interpretation logic, whereas ceramic QA systems demand causal-chain validation of material properties like clay recipes and firing techniques. These disparities demonstrate that ancient ceramics, as vital material carriers of civilization, possess knowledge systems characterized by high specialization and data heterogeneity. Addressing these challenges, this study integrates vision-language models, fine-tuned large language models, and knowledge graph enhancement technologies to construct a specialized ancient ceramics QA system, delivering an innovative solution for digital cultural heritage preservation.
This study fundamentally addresses the critical challenge of low image data utilization in ancient ceramics research. By deploying the GLM-4V-9B visual language model, the system achieves precise identification of specialized ceramic features including glaze color variations and decorative patterns, while generating semantically rich textual descriptions. This approach not only preserves crucial visual information but also accomplishes effective transformation from unstructured image data to structured knowledge representation, establishing a robust foundation for subsequent knowledge processing.
For knowledge construction, the research employs the ERNIE 4.0 Turbo model to automate question–answer pair generation. Through meticulously designed global and local prompting strategies, the system extracts 2143 high-quality QA pairs from multidisciplinary literature sources, comprehensively covering core knowledge dimensions such as kiln system evolution, craftsmanship characteristics, and chronological identification in ancient ceramics studies.
The model fine-tuning component represents another major innovation. Using Qwen2.5-7B-Instruct as the base model, domain-specific adaptation is achieved through LoRA technology, resulting in the Qwen2.5-LoRA variant that demonstrates exceptional performance in professional ceramic QA tasks. Experimental results show significant improvements across all evaluation metrics, with ROUGE series indicators exhibiting over 20% enhancement.
The implementation of GraphRAG framework constitutes a technological breakthrough. While conventional retrieval-augmented generation methods struggle with multi-hop reasoning in ceramic knowledge queries, this research effectively resolves the challenge through structured knowledge graph representation and path analysis. The system demonstrates superior capability in capturing complex interrelationships, substantially improving answer quality for queries requiring deep reasoning. Ablation studies demonstrate notable enhancement across all evaluation metrics following the application of GraphRAG technology.
From a practical application perspective, the QA system developed in this study demonstrates unique advantages across multiple scenarios. Within museum digitization initiatives, it facilitates intelligent annotation of vast collections; for ceramic-related academic programs at higher education institutions, it provides an academically informed knowledge query resource; in cultural dissemination contexts, the intelligent QA interface effectively disseminates scientific and cultural knowledge about ancient ceramics to the public.
However, critical challenges persist in the current system regarding ancient ceramic knowledge: in terms of knowledge coverage, the insufficient modeling depth for non-mainstream kiln lineages and regional craft variations hinders comprehensive representation of ceramic cultural diversity; concerning data quality, reliance on external literature databases with irregular updates and inconsistent authority compromises the timeliness and reliability of knowledge services; technically, existing architectures exhibit limited multimodal data integration capabilities, particularly lacking effective solutions for visual feature extraction of decorative motifs and digital reconstruction of three-dimensional forms. Future research will establish dynamic knowledge expansion mechanisms to enhance domain coverage, develop multi-source data quality assessment frameworks to ensure knowledge supply reliability, and investigate the potential of cross-modal large models for artifact feature extraction and spatial relationship modeling. These improvements will significantly elevate the system’s practical value in academic research and cultural communication.

6. Conclusions

This study innovatively integrates fine-tuned large language models with GraphRAG technology to construct a specialized QA system for ancient ceramics, providing novel technological pathways for digital preservation and dissemination of cultural heritage. The system achieves structured representation and intelligent services for traditional ceramic knowledge, demonstrating significant advantages in semantic understanding depth and knowledge reasoning accuracy. First, to address the comprehension challenges of professional terminology in ancient ceramics, we designed a fine-tuning strategy based on domain-specific corpora, enabling the large language model to accurately capture key semantics of craft characteristics. Second, the craft knowledge graph constructed through GraphRAG technology effectively resolves spatiotemporal correlation modeling issues in ceramic technique inheritance. Finally, the integration of fine-tuned language models with GraphRAG combines the strengths of structured knowledge and unstructured texts, markedly improving response quality to complex queries.
While achieving notable outcomes, this study still has areas for improvement. In terms of knowledge coverage, the system’s representation of regional kilns and specialized techniques remains incomplete. In terms of technical integration, real-time constraints persist: dynamically updating the knowledge graph requires re-tuning the LoRA parameters, making it difficult to meet research demands for high-frequency updates. Regarding application scalability, knowledge transfer costs are high when extending the system to new kiln knowledge, which necessitates reconstructing the knowledge graph and incurs substantial manual annotation costs. With the continuous advancement of relevant technologies, this research is expected to deliver greater value across academic research, cultural dissemination, and educational popularization.

Author Contributions

Conceptualization, Z.C. and B.L.; methodology, Z.C.; software, Z.C.; validation, Z.C. and B.L.; data curation, Z.C. and B.L.; writing—original draft preparation, Z.C.; writing—review and editing, Z.C. and B.L.; visualization, Z.C.; supervision, Z.C. and B.L.; project administration, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Acknowledgments

We would like to thank the anonymous reviewers for their constructive and valuable suggestions on the earlier drafts of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LoRA: Low-Rank Adaptation
GraphRAG: Graph Retrieval-Augmented Generation
AI: Artificial Intelligence
GANs: Generative Adversarial Networks
LLMs: Large Language Models
VLMs: Vision-Language Models
ROUGE: Recall-Oriented Understudy for Gisting Evaluation
BERTScore: Bidirectional Encoder Representations from Transformers score
LCS: Longest Common Subsequence
GPU: Graphics Processing Unit
CPU: Central Processing Unit

References

  1. Poddar, A.K. Impact of Global Digitalization on Traditional Cultures. Int. J. Interdiscip. Soc. Community Stud. 2024, 20, 209. [Google Scholar] [CrossRef]
  2. Yu, Q. Inheritance and Innovation of Cultural and Creative Design in Jingdezhen under the Background of Cultural and Tourism Integration. Design 2024, 9, 640. [Google Scholar] [CrossRef]
  3. Sun, H.; He, Y.; Wang, R.; Su, J.; Li, H.; Zhang, J.; Zheng, X. Discussion: Constructing a Scientific Theory and Methodology System Oriented Towards Empirical and Practical Aspects, Contemporary Reflections on the Construction of Cultural Heritage Disciplines. China Cult. Herit. 2025, 2, 4. [Google Scholar]
  4. Bakker, F.T.; Antonelli, A.; Clarke, J.A.; Cook, J.A.; Edwards, S.V.; Ericson, P.G.P.; Faurby, S.; Ferrand, N.; Gelang, M.; Gillespie, R.G.; et al. The Global Museum: Natural history collections and the future of evolutionary science and public education. PeerJ 2020, 8, 8225. [Google Scholar] [CrossRef] [PubMed]
  5. Girdhar, N.; Coustaty, M.; Doucet, A. Digitizing history: Transitioning historical paper documents to digital content for information retrieval and mining—A comprehensive survey. IEEE Trans. Comput. Soc. Syst. 2024, 11, 6151–6180. [Google Scholar] [CrossRef]
  6. Lin, T.; Vermol, V.V.; Yu, J.; Jiang, H. Cultural inheritance and technological innovation in modern ceramics: A historical study based on the evolution of individual practice and aesthetic consciousness of ceramic artists. Herança 2025, 8, 3. [Google Scholar]
  7. Kumar, P. Large language models (LLMs): Survey, technical frameworks, and future challenges. Artif. Intell. Rev. 2024, 57, 260. [Google Scholar] [CrossRef]
  8. Wang, X.; Tan, G. Research on Decision-making of Autonomous Driving in Highway Environment Based on Knowledge and Large Language Model. J. Syst. Simul. 2025, 37, 1246–1255. [Google Scholar]
  9. Zou, H.; Wang, Y.; Huang, A. A novel domain knowledge augmented large language model based medical conversation system for sustainable smart city development. Sustain. Cities Soc. 2025, 128, 106444. [Google Scholar] [CrossRef]
  10. Xiong, J.; Pan, L.; Liu, Y.; Zhu, L.; Zhang, L.; Tan, S. Enhancing Plant Protection Knowledge with Large Language Models: A Fine-Tuned Question-Answering System Using LoRA. Appl. Sci. 2025, 15, 3850. [Google Scholar] [CrossRef]
  11. Zheng, B.; Liu, F.; Zhang, M.; Tong, Q.; Cui, S.; Ye, Y.; Guo, Y. Image captioning for cultural artworks: A case study on ceramics. Multimed. Syst. 2023, 29, 3223–3243. [Google Scholar] [CrossRef]
  12. Chen, H. Large knowledge model: Perspectives and challenges. arXiv 2023, arXiv:2312.02706. [Google Scholar] [CrossRef]
  13. Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2025, 43, 1–55. [Google Scholar] [CrossRef]
  14. Ding, N.; Lv, X.; Wang, Q.; Chen, Y.; Zhou, B.; Liu, Z.; Sun, M. Sparse low-rank adaptation of pre-trained language models. arXiv 2023, arXiv:2311.11696. [Google Scholar]
  15. Zhang, Q.; Chen, S.; Bei, Y.; Zheng, Y.; Hua, Z.; Hong, Z.; Dong, J.; Chen, H.; Zhang, Y.; Huang, X. A survey of graph retrieval-augmented generation for customized large language models. arXiv 2025, arXiv:2501.13958. [Google Scholar]
  16. Abgaz, Y.; Rocha Souza, R.; Methuku, J.; Koch, G.; Dorn, A. A Methodology for Semantic Enrichment of Cultural Heritage Images Using Artificial Intelligence Technologies. J. Imaging 2021, 7, 121. [Google Scholar] [CrossRef] [PubMed]
  17. Fang, T.; Hui, Z.; Rey, W.P.; Yang, A.; Liu, B.; Xie, Z. Digital restoration of historical buildings by integrating 3D PC reconstruction and GAN algorithm. J. Artif. Intell. Technol. 2024, 4, 179–187. [Google Scholar] [CrossRef]
  18. Deng, Y.; Lei, W.; Lin, W.; Cai, D. A survey on proactive dialogue systems: Problems, methods, and prospects. arXiv 2023, arXiv:2305.02750. [Google Scholar] [CrossRef]
  19. Rajaraman, V. From ELIZA to ChatGPT: History of human-computer conversation. Resonance 2023, 28, 889–905. [Google Scholar] [CrossRef]
  20. Kamble, K.; Russak, M.; Mozolevskyi, D.; Ali, M.; Russak, M.; AlShikn, W. Expect the Unexpected: FailSafe Long Context QA for Finance. arXiv 2025, arXiv:2502.06329. [Google Scholar] [CrossRef]
  21. Acharya, K.; Velasquez, A.; Song, H.H. A survey on symbolic knowledge distillation of large language models. IEEE Trans. Artif. Intell. 2024, 5, 5928–5948. [Google Scholar] [CrossRef]
  22. Raiaan, M.A.K.; Mukta, M.S.H.; Fatema, K.; Fahad, M.N.; Sakib, S.; Mim, M.M.J. A review on large language models: Architectures, applications, taxonomies, open issues and challenges. IEEE Access 2024, 12, 26839–26874. [Google Scholar] [CrossRef]
  23. Ling, Z.; Delnevo, G.; Salomoni, P.; Mirri, S. Findings on machine learning for identification of archaeological ceramics-a systematic literature review. IEEE Access 2024, 12, 100167–100185. [Google Scholar] [CrossRef]
  24. Zhang, J.; Huang, J.; Sheng, J.; Lu, S. Vision-language models for vision tasks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5625–5644. [Google Scholar] [CrossRef]
  25. Hong, W.; Wang, W.; Ding, M.; Yu, W.; Lv, Q.; Wang, Y.; Cheng, Y.; Huang, S.; Ji, J.; Zho, X. Cogvlm2: Visual language models for image and video understanding. arXiv 2024, arXiv:2408.16500. [Google Scholar] [CrossRef]
  26. Zheng, Y.; Zhang, R.; Zhang, J.; Ye, Y.; Luo, Z.; Feng, Z.; Ma, Y. Llamafactory: Unified efficient fine-tuning of 100+ language models. arXiv 2024, arXiv:2403.13372. [Google Scholar]
  27. Li, X.L.; Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. arXiv 2021, arXiv:2101.00190. [Google Scholar] [CrossRef]
  28. Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. Qlora: Efficient finetuning of quantized llms. Adv. Neural Inf. Process. Syst. 2023, 36, 10088–10115. [Google Scholar]
  29. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Chen, W. Lora: Low-rank adaptation of large language models. ICLR 2022, 1, 3. [Google Scholar]
  30. Duan, X.; Liu, W.; Gao, D.; Liu, S.; Huang, Y. A Quantitative Evaluation Method Based on Consistency Metrics for Large Model Benchmarks. In Proceedings of the International Conference on Modeling, Natural Language Processing and Machine Learning, Xi’an, China, 17–19 May 2024; pp. 39–48. [Google Scholar]
  31. Edge, D.; Trinh, H.; Cheng, N.; Bradley, J.; Chao, A.; Mody, A.; Truitt, S.; Metropolitansky, D.; Ness, R.O.; Larson, J. From local to global: A graph rag approach to query-focused summarization. arXiv 2024, arXiv:2404.16130. [Google Scholar] [CrossRef]
  32. Lin, C.Y. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81. [Google Scholar]
  33. Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. Bertscore: Evaluating text generation with bert. arXiv 2019, arXiv:1904.09675. [Google Scholar]
Figure 1. The framework structure diagram of GLM-4V-9B.
Figure 2. The example of generated ancient ceramics QA pairs.
Figure 3. The framework diagram of LoRA fine-tuning technology.
Figure 4. The framework diagram of GraphRAG.
Figure 5. The visualization of partial entities and their relationships.
Figure 6. The loss function curve. (a) The training process; (b) The validation process.
Table 1. The prompt templates for two QA types.

QA Type: Global Synthesis
Prompt Template: You are an expert in large language model question–answer generation. Thoroughly review all information across {all documents}, integrate insights comprehensively, and consider from the user’s perspective: What questions would users likely pose to the AI Ancient Ceramics Technology and Culture Knowledge System? Ensure questions have definitive answers and must not fabricate inquiries. Present output in QA pair format.

QA Type: Local Fine-grained
Prompt Template: You are an expert in large language model question–answer generation. Meticulously examine the complete content within {single document}, adopt the user’s viewpoint, and deduce: What specific questions would users address to the AI Ancient Ceramics Technology and Culture Knowledge System? Ensure questions have definitive answers and must not fabricate inquiries. Present output in QA pair format.
Table 2. The SuperCLUE evaluation leaderboard for select large language models.

Ranking | Model Name | Overall Score | Hard Score | Science Score | Liberal Arts Score
1 | Qwen2.5-7B-Instruct | 60.61 | 33.92 | 74.63 | 73.28
2 | GLM-4-9B-Chat | 56.83 | 29.33 | 69.22 | 71.94
3 | Gemma-2-9b-it | 55.48 | 29.03 | 67.78 | 69.63
4 | MiniCPM3-4B | 53.16 | 26.56 | 63.04 | 69.87
5 | Llama-3.1-8B-Instruct | 51.42 | 25.67 | 63.27 | 65.30
6 | Yi-1.5-6B-Chat | 48.69 | 25.16 | 57.03 | 63.89
Table 3. The main training parameter configurations for LoRA fine-tuning.

Training Parameter | Parameter Value
learning_rate | 1 × 10−4
num_train_epochs | 3
gradient_accumulation_steps | 4
per_device_train_batch_size | 1
lora_rank | 8
lora_alpha | 32
lora_dropout | 0.05
Table 4. Comparison of our method with other models.

Method | ROUGE-1/% | ROUGE-2/% | ROUGE-L/% | BERTScore_F1/%
Llama-3.1-8B-Instruct | 34.29 | 11.64 | 19.35 | 87.61
Gemma-2-9b-it | 31.73 | 13.75 | 20.59 | 88.24
GLM-4-9B-Chat | 28.39 | 13.67 | 18.48 | 87.93
Qwen2.5-7B-Instruct | 33.84 | 13.71 | 22.39 | 89.16
Qwen2.5-LoRA + GraphRAG | 57.92 | 48.46 | 52.17 | 93.68
Table 5. The results of the ablation experiment.

Method | ROUGE-1/% | ROUGE-2/% | ROUGE-L/% | BERTScore_F1/%
Qwen2.5-7B-Instruct | 33.84 | 13.71 | 22.39 | 89.16
Qwen2.5-7B-Instruct + GraphRAG | 38.24 | 17.37 | 26.54 | 91.24
Qwen2.5-LoRA | 53.61 | 43.93 | 47.85 | 92.79
Qwen2.5-LoRA + GraphRAG | 57.92 | 48.46 | 52.17 | 93.68
Table 6. The results of manual evaluation under different research methods.

Method | Factual Accuracy | Terminological Rigor | Information Completeness | Linguistic Coherence | Logical Consistency
Llama-3.1-8B-Instruct | 3.6 | 3.8 | 3.8 | 4.0 | 4.2
Gemma-2-9b-it | 3.8 | 4.2 | 3.8 | 3.8 | 4.0
GLM-4-9B-Chat | 4.0 | 3.8 | 4.0 | 4.2 | 4.2
Qwen2.5-7B-Instruct | 4.4 | 4.2 | 4.6 | 4.2 | 4.4
Qwen2.5-7B-Instruct + GraphRAG | 6.4 | 6.2 | 6.6 | 6.0 | 6.2
Qwen2.5-LoRA | 7.6 | 7.8 | 8.0 | 8.2 | 8.2
Qwen2.5-LoRA + GraphRAG | 9.6 | 9.4 | 9.6 | 9.4 | 9.6
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
